intrepidcs / intrepid-socketcan-kernel-module

Kernel-mode SocketCAN module for Intrepid devices

Exception occurs while booting on OrangePI One #5

Closed tysonite closed 5 years ago

tysonite commented 5 years ago

We observe the Linux system on our Orange Pi freezing (inaccessible via SSH, and even via the local keyboard) after several days of running. We suspect the CAN driver, which reports an exception (a kernel warning) during boot on the 4.19.50 kernel. On the 4.13.15 kernel, no such exception appears at boot.

The hardware platform is an Orange Pi One. We use the Armbian Linux distribution (Armbian_5.89_Orangepione_Ubuntu_xenial_next_4.19.50), built ourselves rather than downloaded as a binary from the official website, to work with ValueCAN3/4 devices.

Intrepid CAN software versions:

https://github.com/intrepidcs/intrepid-socketcan-kernel-module  7c2583c938c74f11e99f005a7cbab4802c9a3b46
https://github.com/intrepidcs/icsscand  318af6b4f395cb2ba97e7d20b57d0944c0c905f5
https://github.com/intrepidcs/icsneoapi  694bf94eeaed933aabc6ea417be5d1cfe32d06a1

Kernel call stack:

[   13.444139] ------------[ cut here ]------------
[   13.444168] WARNING: CPU: 2 PID: 869 at net/core/dev.c:1746 call_netdevice_notifiers_info+0x53/0x58
[   13.444172] RTNL: assertion failed at net/core/dev.c (1746)
[   13.444176] Modules linked in: snd_soc_simple_card sun8i_codec_analog snd_soc_simple_card_utils sun8i_adda_pr_regmap snd_soc_hdmi_codec sun4i_i2s snd_usb_audio snd_hwdep snd_usbmidi_lib snd_soc_core evdev snd_rawmidi snd_pcm_dmaengine sun4i_gpadc_iio snd_pcm ftdi_sio snd_timer industrialio cdc_acm snd asix soundcore usbnet sun8i_ths sunxi phy_generic musb_hdrc gpio_keys cpufreq_dt uio_pdrv_genirq thermal_sys uio mxu11x0(O) usbserial intrepid(O) can_dev can_raw can uas lima gpu_sched dw_hdmi_cec dw_hdmi_i2s_audio ttm
[   13.444305] CPU: 2 PID: 869 Comm: icsscand Tainted: G           O      4.19.50-sunxi #5.89
[   13.444309] Hardware name: Allwinner sun8i Family
[   13.444351] [<c010d745>] (unwind_backtrace) from [<c010a2f1>] (show_stack+0x11/0x14)
[   13.444369] [<c010a2f1>] (show_stack) from [<c08efb01>] (dump_stack+0x69/0x78)
[   13.444385] [<c08efb01>] (dump_stack) from [<c011b25d>] (__warn+0xa1/0xb4)
[   13.444397] [<c011b25d>] (__warn) from [<c011b2a3>] (warn_slowpath_fmt+0x33/0x48)
[   13.444411] [<c011b2a3>] (warn_slowpath_fmt) from [<c07d525b>] (call_netdevice_notifiers_info+0x53/0x58)
[   13.444426] [<c07d525b>] (call_netdevice_notifiers_info) from [<c07dd137>] (dev_set_mtu_ext+0x5f/0x110)
[   13.444438] [<c07dd137>] (dev_set_mtu_ext) from [<c07dd217>] (dev_set_mtu+0x2f/0x60)
[   13.444455] [<c07dd217>] (dev_set_mtu) from [<bf86e94f>] (intrepid_dev_ioctl+0x402/0x4d4 [intrepid])
[   13.444479] [<bf86e94f>] (intrepid_dev_ioctl [intrepid]) from [<c024d399>] (do_vfs_ioctl+0x8d/0x6b4)
[   13.444491] [<c024d399>] (do_vfs_ioctl) from [<c024da07>] (ksys_ioctl+0x47/0x50)
[   13.444503] [<c024da07>] (ksys_ioctl) from [<c0101001>] (ret_fast_syscall+0x1/0x62)
[   13.444507] Exception stack(0xd47c3fa8 to 0xd47c3ff0)
[   13.444516] 3fa0:                   b67e8c28 b67e8bd0 00000003 00003001 b67e8cf8 00076950
[   13.444526] 3fc0: b67e8c28 b67e8bd0 b5e3d3e0 00000036 b67e8bb8 000766c4 0000000c b67e8ba0
[   13.444532] 3fe0: 00076128 b67e89dc 00031519 b6d1be06
[   13.444539] ---[ end trace ef0b90aa2f23df40 ]---
[   13.444547] ------------[ cut here ]------------

$ uname -a
Linux arm_orangepi 4.19.50-sunxi #5.89 SMP Fri Jun 14 01:50:58 EDT 2019 armv7l armv7l armv7l GNU/Linux

Kernel configuration: the default config for the Armbian next branch, plus the following options:

# Module configuration
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y

# CAN protocol support
CONFIG_CAN=m
CONFIG_CAN_8DEV_USB=m
CONFIG_CAN_BCM=m
CONFIG_CAN_CALC_BITTIMING=y
CONFIG_CAN_DEV=m
CONFIG_CAN_EMS_USB=m
CONFIG_CAN_ESD_USB2=m
CONFIG_CAN_GS_USB=m
CONFIG_CAN_GW=m
CONFIG_CAN_HI311X=m
CONFIG_CAN_KVASER_USB=m
CONFIG_CAN_LEDS=y
CONFIG_CAN_MCBA_USB=m
CONFIG_CAN_MCP251X=m
CONFIG_CAN_PEAK_USB=m
CONFIG_CAN_RAW=m
CONFIG_CAN_SLCAN=m
CONFIG_CAN_SUN4I=m
CONFIG_CAN_VCAN=m
CONFIG_CAN_VXCAN=m
CONFIG_NET_EMATCH_CANID=m

# OTG gadget support
CONFIG_GADGET_UAC1=y
CONFIG_USB_AUDIO=m
CONFIG_USB_CDC_COMPOSITE=m
CONFIG_USB_CONFIGFS_ACM=y
CONFIG_USB_CONFIGFS_ECM_SUBSET=y
CONFIG_USB_CONFIGFS_ECM=y
CONFIG_USB_CONFIGFS_EEM=y
CONFIG_USB_CONFIGFS_F_FS=y
CONFIG_USB_CONFIGFS_F_HID=y
CONFIG_USB_CONFIGFS_F_LB_SS=y
CONFIG_USB_CONFIGFS_F_MIDI=y
CONFIG_USB_CONFIGFS_F_PRINTER=y
CONFIG_USB_CONFIGFS_F_TCM=y
CONFIG_USB_CONFIGFS_F_UAC1=y
CONFIG_USB_CONFIGFS_F_UAC2=y
CONFIG_USB_CONFIGFS_F_UVC=y
CONFIG_USB_CONFIGFS=m
CONFIG_USB_CONFIGFS_MASS_STORAGE=y
CONFIG_USB_CONFIGFS_NCM=y
CONFIG_USB_CONFIGFS_OBEX=y
CONFIG_USB_CONFIGFS_RNDIS=y
CONFIG_USB_CONFIGFS_SERIAL=y
CONFIG_USB_ETH_EEM=y
CONFIG_USB_ETH=m
CONFIG_USB_ETH_RNDIS=y
CONFIG_USB_F_ACM=m
CONFIG_USB_F_ECM=m
CONFIG_USB_F_EEM=m
CONFIG_USB_F_FS=m
CONFIG_USB_F_HID=m
CONFIG_USB_F_MASS_STORAGE=m
CONFIG_USB_F_MIDI=m
CONFIG_USB_F_NCM=m
CONFIG_USB_F_OBEX=m
CONFIG_USB_F_PRINTER=m
CONFIG_USB_F_RNDIS=m
CONFIG_USB_F_SERIAL=m
CONFIG_USB_F_SS_LB=m
CONFIG_USB_F_SUBSET=m
CONFIG_USB_F_TCM=m
CONFIG_USB_F_UAC1=m
CONFIG_USB_F_UAC2=m
CONFIG_USB_FUNCTIONFS_ETH=y
CONFIG_USB_FUNCTIONFS_GENERIC=y
CONFIG_USB_FUNCTIONFS=m
CONFIG_USB_FUNCTIONFS_RNDIS=y
CONFIG_USB_F_UVC=m
CONFIG_USB_G_ACM_MS=m
CONFIG_USB_GADGETFS=m
CONFIG_USB_GADGET_STORAGE_NUM_BUFFERS=2
CONFIG_USB_GADGET_TARGET=m
CONFIG_USB_GADGET_VBUS_DRAW=2
CONFIG_USB_GADGET=y
CONFIG_USB_G_HID=m
CONFIG_USB_G_MULTI_CDC=y
CONFIG_USB_G_MULTI=m
CONFIG_USB_G_MULTI_RNDIS=y
CONFIG_USB_G_NCM=m
CONFIG_USB_G_PRINTER=m
CONFIG_USB_G_SERIAL=m
CONFIG_USB_G_WEBCAM=m
CONFIG_USBIP_VUDC=m
CONFIG_USB_LIBCOMPOSITE=m
CONFIG_USB_MASS_STORAGE=m
CONFIG_USB_MIDI_GADGET=m
CONFIG_USB_SNP_CORE=m
CONFIG_USB_SNP_UDC_PLAT=m
CONFIG_USB_U_AUDIO=m
CONFIG_USB_U_ETHER=m
CONFIG_USB_U_SERIAL=m
CONFIG_USB_ZERO=m
CONFIG_U_SERIAL_CONSOLE=y

# OTG dual-role configuration
CONFIG_USB_OTG=y
CONFIG_USB_MUSB_HDRC=m
CONFIG_USB_MUSB_DUAL_ROLE=y
CONFIG_USB_MUSB_SUNXI=m
CONFIG_MUSB_PIO_ONLY=y
CONFIG_USB_PHY=y
CONFIG_NOP_USB_XCEIV=m

# Additional configuration
CONFIG_MEMTEST=y

The script used to build Armbian:

#!/bin/bash

_ARMBIAN_BRANCH=next
_ARMBIAN_BOARD=orangepione
_ARMBIAN_RELEASE=xenial
_ARMBIAN_BUILD_DESKTOP=no
_ARMBIAN_EXTERNAL=yes
_ARMBIAN_EXTERNAL_NEW=prebuilt
_ARMBIAN_INSTALL_HEADERS=yes
_ARMBIAN_KERNEL_ONLY=no
_ARMBIAN_KERNEL_KEEP_CONFIG=no
_ARMBIAN_KERNEL_CONFIGURE=no
_ARMBIAN_LINUXCONFIG=pasa_orangepi
# KERNELBRANCH="KERNELBRANCH='branch:linux-4.13.y'"

RUN_PATH="$(dirname "$(realpath "${BASH_SOURCE}")")"
SRC="$RUN_PATH/../build"
KERNEL_PATH="$SRC/config/kernel"
SCRIPT_PATH="$SRC/userpatches/customize-image.sh"
CONFIG_PATH="$SRC/userpatches/lib.config"

ROOT_PWD="vbu1"
SEARCH_STR="# your code here"
CODE_SHIFT="                        "
CODE_LINE0="(echo $ROOT_PWD;echo $ROOT_PWD;) | passwd root >/dev/null 2>\&1"
CODE_LINE1="chage -d 99999 root"
CODE_LINE2="touch /root/.no_rootfs_resize"
CODE_LINE3="rm /root/.not_logged_in_yet"
# CODE_LINE4=''
# CODE_LINE5='SCRIPT_PATH="/etc/ssh/sshd_config"'
# CODE_LINE6='SEARCH_STR="UsePAM"'
# CODE_LINE7='FOUND_STR="$(cat $SCRIPT_PATH | grep $SEARCH_STR)"'
# CODE_LINE8='CODE_LINE="$SEARCH_STR no"'
# CODE_LINE9='sed -i "s/$FOUND_STR/$CODE_LINE/" $SCRIPT_PATH'

rm -rf $SRC
git clone https://github.com/armbian/build $SRC

mkdir -p $SRC/userpatches/
cp $SRC/config/templates/customize-image.sh.template $SCRIPT_PATH
for (( i=3; i >= 0; i-- ))
do
    CODE_LINE="CODE_LINE$i"
    sed -i "s/$SEARCH_STR/$SEARCH_STR\n$CODE_SHIFT${!CODE_LINE//"/"/"\/"}/" $SCRIPT_PATH
done

cat $KERNEL_PATH/linux-sunxi-next.config $RUN_PATH/pasa_kernel.config > \
$KERNEL_PATH/pasa_orangepi.config
echo $KERNELBRANCH > $CONFIG_PATH

BUILD_VARS=$(set -o posix && set | grep _ARMBIAN_ | sed "s/_ARMBIAN_//")
source $SRC/compile.sh $BUILD_VARS

hollinsky-intrepid commented 5 years ago

Hi tysonite,

I've fixed the cause of the stack trace you posted in commit a9233ab3. Let us know whether this solves your stability issue!

--Paul

tysonite commented 5 years ago

Thanks, @hollinsky-intrepid. We will update to the latest revision and see if it helps. The instability usually takes around 1 to 2 weeks to appear.

hollinsky-intrepid commented 5 years ago

In that case, I'm closing this issue for the time being. Feel free to reopen it if the problem appears again.