intrepidcs / intrepid-socketcan-kernel-module

Kernel-mode SocketCAN module for Intrepid devices
17 stars 6 forks source link

Stack trace in kernel log #8

Closed tysonite closed 4 years ago

tysonite commented 4 years ago

The following error message logged into dmesg:

[   14.992744] WARNING: CPU: 1 PID: 1229 at /orangepi-setup/intrepid-socketcan-kernel-module/intrepid.c:437 intrepid_dev_ioctl+0x1ae/0x3a0 [intrepid]
[   14.992749] intrepid: Dropping message on ics0can0: backlog full
[   14.992752] Modules linked in: bnep btusb btrtl btbcm btintel bluetooth ecdh_generic evdev snd_soc_hdmi_codec sun8i_codec_analog sun4i_gpadc_iio sun4i_i2s sunxi phy_generic musb_hdrc snd_soc_simple_card snd_soc_simple_card_utils snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer snd soundcore uio_pdrv_genirq uio 8812au cfg80211 asix usbnet ftdi_sio rfkill mxu11x0(O) usbserial intrepid(O) can_dev can_raw can dw_hdmi_cec dw_hdmi_i2s_audio sun8i_dw_hdmi dw_hdmi cec sun4i_tcon sun8i_mixer sun4i_drm
[   14.992884] CPU: 1 PID: 1229 Comm: icsscand Tainted: G           O    4.13.15-sunxi #1
[   14.992888] Hardware name: Allwinner sun8i Family
[   14.992915] [<c010da59>] (unwind_backtrace) from [<c010a019>] (show_stack+0x11/0x14)
[   14.992926] [<c010a019>] (show_stack) from [<c0828021>] (dump_stack+0x69/0x78)
[   14.992938] [<c0828021>] (dump_stack) from [<c011a2bb>] (__warn+0xaf/0xc0)
[   14.992948] [<c011a2bb>] (__warn) from [<c011a2ff>] (warn_slowpath_fmt+0x33/0x48)
[   14.992964] [<c011a2ff>] (warn_slowpath_fmt) from [<bf86e5eb>] (intrepid_dev_ioctl+0x1ae/0x3a0 [intrepid])
[   14.992991] [<bf86e5eb>] (intrepid_dev_ioctl [intrepid]) from [<c0231615>] (do_vfs_ioctl+0x91/0x6f0)
[   14.993006] [<c0231615>] (do_vfs_ioctl) from [<c0231cbb>] (SyS_ioctl+0x47/0x50)
[   14.993020] [<c0231cbb>] (SyS_ioctl) from [<c0106601>] (ret_fast_syscall+0x1/0x4c)
[   14.993025] ---[ end trace 004f7b03558f60bb ]---

The kernel module revision is d8ea05895e3fc1ec16f0444401f763beb20b6fdd

hollinsky commented 4 years ago

Hi tysonite,

This message can occur for a few reasons. The most common is that the kernel will drop messages received while the interface is down. Another would be if we really were going too fast and filling the receive buffer, but this is unlikely and I've never seen it happen.

What has lead you to use this older version? In the version you are using, we were warning once about this and printing a stacktrace. You wouldn't be warned about it happening again. Now, we don't trigger a stacktrace as I don't believe it makes sense for this warning, and we also will warn again for each message dropped.

While starting up, if there's bus load and you're not immediately bringing the interface up, this is an expected message. The message is here so that, if we were to receive reports of performance issues, and we saw this message appearing while networks are up and we're receiving messages, we'd know where to start investigating. The equivalent message you'll see on v2.0.2 is "intrepid: Dropping message on can0, dropped by kernel".

tysonite commented 4 years ago

Hi @hollinsky,

Thanks for your feedback. Probably your are right about your guess that interface wasn't brought up at a time we see this message. However, after that stack trace / warning, the system under test don't come up - there could be several reasons of that (e.g. peripheral hardware issues, ...), but I would like to exclude CAN driver issue as system under test boots up via CAN. Can you please suggest a way to sniff CAN bus on OrangePI side? Any tools available for that?

hollinsky-intrepid commented 4 years ago

You'll want to install the can-utils package, which includes the candump program. Then candump ics0can0 or candump any will start dumping frames to the terminal. You can also use cansend to transmit.

tysonite commented 4 years ago

Thanks. Everything looks to be good from candump or cansend perspective.

One unrelated questions:

ics0can0  Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          UP RUNNING NOARP  MTU:16  Metric:1
          RX packets:96985 errors:0 dropped:76 overruns:0 frame:0
          TX packets:8 errors:2 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10
          RX bytes:484949 (484.9 KB)  TX bytes:8 (8.0 B)

ics0can1  Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          UP RUNNING NOARP  MTU:16  Metric:1
          RX packets:2451686 errors:0 dropped:311 overruns:0 frame:0
          TX packets:787624 errors:1727 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10
          RX bytes:13871916 (13.8 MB)  TX bytes:8 (8.0 B)

Both interfaces show some errors / dropped packets. Is it possible to figure out why this is happening? I suppose it just because of some buffers that are not big enough.

tysonite commented 4 years ago

@hollinsky-intrepid, I appreciate any suggestions or proposals...

hollinsky-intrepid commented 4 years ago

My apologies, I missed the previous comment. We increment the interface's TX error count any time the TEC changes (transmit error counter of the CAN controller). This can occur if there's electrical interference impacting the bus, collisions, etc. It will also occur when the device is attempting to transmit and no nodes are acknowledging it (i.e. if the device is not connected to a bus, or if it is the only node on the bus).

It will also be telling if the error counter only increases at startup, when a node is plugged/unplugged, or if errors are periodically being generated during runtime.

I would not expect that you are overrunning any buffers unless this is an extremely high traffic scenario, but it can happen. I would rule out a software bug, but it is harder for me to say since I can see you're running old deprecated versions of the software.

It would help to know more about what you're trying to do so we can get an idea of which error scenarios are the most likely. If it's something you'd prefer not to talk about in the open, you could also email support@intrepidcs.com.

tysonite commented 4 years ago

Does your answer apply also to dropped packets?

We have a plan to use fresh software earlier next week, I will return back if necessary. Thank you in advance.

hollinsky-intrepid commented 4 years ago

With the latest version, every increment of the drop counter also logs a message to dmesg with more information about the drop. I believe the older versions would only warn on the first one, this has since been fixed.

99% of the time, a packet is being dropped because the interface link state is set to down (right when it's created, before it's set to up by the user). You'd see "dropped by kernel" in the log message if that's the case. I have not seen dropped packets after the initial startup.