ikwzm / FPGA-SoC-Linux

FPGA+SoC+Linux+Device Tree Overlay+FPGA Manager U-Boot&Linux Kernel&Debian11 Images (for Xilinx:Zynq-Zybo:PYNQ-Z1 Altera:de0-nano-soc:de10-nano)

Kernel Oops when loading altera-hps2fpga.rb #2

Closed FPtje closed 7 years ago

FPtje commented 7 years ago

Kernel:

Linux lumi-sign-fpgatest 4.8.17 #1-NixOS SMP Mon Jan 9 07:22:35 UTC 2017 armv7l GNU/Linux

Device: Altera de0-nano-soc

Revision: 17c09587b5cd797b314dcdcfc7a4c144e3b645e3

Immediately after issuing altera-hps2fpga.rb --install I get the following error:

[   33.571895] Unable to handle kernel paging request at virtual address 6572336c
[   33.579160] pgd = eca0c000
[   33.581875] [6572336c] *pgd=00000000
[   33.585486] Internal error: Oops: 5 [#1] SMP ARM
[   33.590090] Modules linked in: af_packet xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo cfg80211 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp ip6table_filter ip6_tables iptable_filter fpgacfg(O) dtbocfg(O) configfs udmabuf(O) zptty(O) snd_pcm_oss snd_pcm snd_timer snd soundcore nf_conntrack_ftp nf_conntrack ip_tables x_tables ipv6
[   33.635470] CPU: 1 PID: 939 Comm: ruby Tainted: G           O    4.8.17 #1-NixOS
[   33.642839] Hardware name: Altera SOCFPGA
[   33.646836] task: ecb585c0 task.stack: ecb9c000
[   33.651360] PC is at __kmalloc_track_caller+0x94/0x28c
[   33.656481] LR is at 0xecb9dd00
[   33.659614] pc : [<c04a2bb0>]    lr : [<ecb9dd00>]    psr: 200f0013
[   33.659614] sp : ecb9dd00  ip : ecb9dd00  fp : ecb9dd44
[   33.671046] r10: ee801e40  r9 : c17be4b8  r8 : c0c0abcc
[   33.676253] r7 : 024000c0  r6 : ee801e40  r5 : 00000004  r4 : 6572336c
[   33.682761] r3 : 00000000  r2 : c156dbf0  r1 : 2e173000  r0 : ee801e40
[   33.689266] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   33.696385] Control: 10c5387d  Table: 2ca0c04a  DAC: 00000051
[   33.702117] Process ruby (pid: 939, stack limit = 0xecb9c220)
[   33.707847] Stack: (0xecb9dd00 to 0xecb9e000)
[   33.712200] dd00: c05277ec c06b7bc0 ee8b0510 0000a1af ecb9dd44 ecb9dd20 c052849c 024000c0
[   33.720362] dd20: 00000004 eca94840 eef8862c eef8862c ee9b7000 eca928c0 ecb9dd64 ecb9dd48
[   33.728524] dd40: c0469494 c04a2b28 eca94840 00000001 eca94840 eef8862c ecb9dd84 ecb9dd68
[   33.736686] dd60: c0c0abcc c0469464 eca94840 eef88600 eca94800 ffffffff ecb9ddb4 ecb9dd88
[   33.744848] dd80: c0c0c790 c0c0ab44 00000001 c0c0c89c 00000000 eca94800 eef88600 eca98680
[   33.753010] dda0: eca928d8 00000018 ecb9ddd4 ecb9ddb8 c0c0c8bc c0c0c720 eca94980 800f0013
[   33.761172] ddc0: 00000000 eca928d8 ecb9ddfc ecb9ddd8 c0c0f8b8 c0c0c810 ecb9de04 eca93a00
[   33.769333] dde0: c0c0b358 eca94980 eca92814 00000000 ecb9de34 ecb9de00 c0c0fc2c c0c0f6c0
[   33.777495] de00: c0c14cf4 c0c0b318 eca928c0 00000004 eca92800 eca928d8 eca928d8 00000004
[   33.785657] de20: eca92800 eca928d8 ecb9de84 ecb9de38 c0c15040 c0c0fc08 024000c0 c0c0edb8
[   33.793819] de40: 00000000 eca92804 00000001 eca92814 edea0348 ecb9de60 c0c10658 00000001
[   33.801981] de60: eef88000 0029ff88 eca927d8 eef00600 ecb9df78 00000051 ecb9dea4 ecb9de88
[   33.810144] de80: bf13f210 c0c14df4 ecb9dea4 00000001 00000001 eca927c0 ecb9ded4 ecb9dea8
[   33.818305] dea0: bf12f974 bf13f168 ecb9df78 eef00600 bf12f884 ecb9df78 00000001 0029ff88
[   33.826466] dec0: 00000001 00000000 ecb9df44 ecb9ded8 c04b41d4 bf12f890 00000817 c0d85068
[   33.834628] dee0: 002a1f8c ecb9dfb0 00002710 000001ff ecb9dfac ecb9df00 c0301230 c0d85074
[   33.842789] df00: eef00608 ecb9df10 c04d5084 ee9b7000 eef00600 c04b5070 c2546e10 c04b86f8
[   33.850951] df20: 00000001 eef00600 0029ff88 ecb9df78 0029ff88 00000001 ecb9df74 ecb9df48
[   33.859112] df40: c04b50b0 c04b41a4 eef00603 b6e22a1c ecb9df74 eef00603 eef00600 00000000
[   33.867274] df60: 00000000 0029ff88 ecb9dfa4 ecb9df78 c04b5ec8 c04b5008 00000000 00000000
[   33.875435] df80: 0029ff08 b6e22a1c 0029ff08 00000004 c0309764 ecb9c000 00000000 ecb9dfa8
[   33.883597] dfa0: c03095a0 c04b5e88 0029ff08 b6e22a1c 00000007 0029ff88 00000001 00000000
[   33.891758] dfc0: 0029ff08 b6e22a1c 0029ff08 00000004 00000001 00022c78 00000882 bec20fb4
[   33.899919] dfe0: 00000000 bec20d00 b6ff7880 b6d83338 800f0010 00000007 00000000 00000000
[   33.908101] [<c04a2bb0>] (__kmalloc_track_caller) from [<c0469494>] (kstrdup+0x3c/0x68)
[   33.916106] [<c0469494>] (kstrdup) from [<c0c0abcc>] (safe_name+0x94/0xb8)
[   33.922974] [<c0c0abcc>] (safe_name) from [<c0c0c790>] (__of_add_property_sysfs+0x7c/0xf0)
[   33.931228] [<c0c0c790>] (__of_add_property_sysfs) from [<c0c0c8bc>] (__of_attach_node_sysfs+0xb8/0xf8)
[   33.940607] [<c0c0c8bc>] (__of_attach_node_sysfs) from [<c0c0f8b8>] (__of_changeset_entry_apply+0x204/0x290)
[   33.950416] [<c0c0f8b8>] (__of_changeset_entry_apply) from [<c0c0fc2c>] (__of_changeset_apply+0x30/0xd0)
[   33.959882] [<c0c0fc2c>] (__of_changeset_apply) from [<c0c15040>] (of_overlay_create+0x258/0x318)
[   33.968744] [<c0c15040>] (of_overlay_create) from [<bf13f210>] (dtbocfg_overlay_item_status_store+0xb4/0x110 [dtbocfg])
[   33.979522] [<bf13f210>] (dtbocfg_overlay_item_status_store [dtbocfg]) from [<bf12f974>] (configfs_write_file+0xf0/0x19c [configfs])
[   33.991424] [<bf12f974>] (configfs_write_file [configfs]) from [<c04b41d4>] (__vfs_write+0x3c/0x128)
[   34.000545] [<c04b41d4>] (__vfs_write) from [<c04b50b0>] (vfs_write+0xb4/0x190)
[   34.007846] [<c04b50b0>] (vfs_write) from [<c04b5ec8>] (SyS_write+0x4c/0xa0)
[   34.014889] [<c04b5ec8>] (SyS_write) from [<c03095a0>] (ret_fast_syscall+0x0/0x3c)
[   34.022449] Code: e7924001 e3540000 0a000067 e5963014 (e7943003)
[   34.028634] ---[ end trace 5fb0fc71c464fffb ]---

I don't know what to do with this. I'll try different versions, but otherwise I have no idea.

Edit: It also happens for 7b166af5be65da271d90530fa95bec5edc948fd8, which I reckon is the earliest commit that I can switch to, considering I'm running 4.8.17.

Also:

# zcat /proc/config.gz | grep -i OVERLAY
CONFIG_OF_OVERLAY=y
# lsmod | grep -P "udmabuf|dtbo|fpgacfg|zptty"
fpgacfg                20480  0
udmabuf                20480  0
dtbocfg                16384  4
configfs               36864  2 dtbocfg
zptty                  16384  0

# find /config
/config
/config/device-tree
/config/device-tree/overlays
/config/device-tree/overlays/fpgacfg0
/config/device-tree/overlays/fpgacfg0/dtbo
/config/device-tree/overlays/fpgacfg0/status

# find /sys/kernel/config/device-tree
/sys/kernel/config/device-tree
/sys/kernel/config/device-tree/overlays
/sys/kernel/config/device-tree/overlays/udmabuf4
/sys/kernel/config/device-tree/overlays/udmabuf4/dtbo
/sys/kernel/config/device-tree/overlays/udmabuf4/status
/sys/kernel/config/device-tree/overlays/fpgacfg0
/sys/kernel/config/device-tree/overlays/fpgacfg0/dtbo
/sys/kernel/config/device-tree/overlays/fpgacfg0/status

# echo 1 > /sys/kernel/config/device-tree/overlays/udmabuf4/status
[ 1359.453555] dtbocfg_overlay_item_create: failed to unflatten tree
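
The "failed to unflatten tree" message above is consistent with setting status to 1 before any compiled overlay has been written to the dtbo file. A minimal sketch of the usual sequence follows, assuming the dtbocfg layout shown in the directory listing above. The function name load_overlay, the overlay name, and the .dtbo path are hypothetical.

```shell
# Hypothetical helper: load a compiled overlay through the dtbocfg configfs interface.
# $1 = overlay name, $2 = path to a compiled .dtbo,
# $3 = overlays directory (assumed default: /config/device-tree/overlays).
load_overlay() {
    name=$1
    dtbo=$2
    root=${3:-/config/device-tree/overlays}

    [ -r "$dtbo" ] || { echo "cannot read $dtbo" >&2; return 1; }

    # On configfs, creating the directory exposes the dtbo and status files.
    mkdir -p "$root/$name" || return 1
    # Write the compiled blob first, then enable the overlay.
    cat "$dtbo" > "$root/$name/dtbo" || return 1
    echo 1 > "$root/$name/status"
}
```

Run as root, a call such as `load_overlay udmabuf4 udmabuf4.dtbo` would write the blob and then enable it. The third argument exists only so the function can be pointed at a scratch directory when trying it out.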

ikwzm commented 7 years ago

Thanks for the issue. I'm not good at English. Please forgive me if it is difficult to read.

I have a question for you.

Did this happen with a Linux kernel that you built yourself? I could not reproduce the problem with the Linux kernel provided by this repository (target/de0-nano-soc/boot/zImage-4.8.17-armv7-fpga).

If you built the kernel yourself, please run the following command and tell me what it prints.

zcat /proc/config.gz | grep -i CONFIG_OF

For reference, the kernel built from this repository prints the following.

zcat /proc/config.gz | grep -i CONFIG_OF
CONFIG_OF=y
# CONFIG_OF_UNITTEST is not set
CONFIG_OF_FLATTREE=y
CONFIG_OF_EARLY_FLATTREE=y
CONFIG_OF_DYNAMIC=y
CONFIG_OF_ADDRESS=y
CONFIG_OF_ADDRESS_PCI=y
CONFIG_OF_IRQ=y
CONFIG_OF_NET=y
CONFIG_OF_MDIO=y
CONFIG_OF_PCI=y
CONFIG_OF_PCI_IRQ=y
CONFIG_OF_MTD=y
CONFIG_OF_RESERVED_MEM=y
CONFIG_OF_RESOLVE=y
CONFIG_OF_OVERLAY=y
CONFIG_OF_GPIO=y

Perhaps there is something wrong with the configuration.

FPtje commented 7 years ago

Thanks for the reply!

Yes, it happens with a kernel I built. Sadly, I cannot use the Linux kernel from this repository.

The configuration is sadly the same, except for CONFIG_OF_IOMMU:

# zcat /proc/config.gz | grep -i CONFIG_OF
CONFIG_OF=y
# CONFIG_OF_UNITTEST is not set
CONFIG_OF_FLATTREE=y
CONFIG_OF_EARLY_FLATTREE=y
CONFIG_OF_DYNAMIC=y
CONFIG_OF_ADDRESS=y
CONFIG_OF_ADDRESS_PCI=y
CONFIG_OF_IRQ=y
CONFIG_OF_NET=y
CONFIG_OF_MDIO=y
CONFIG_OF_PCI=y
CONFIG_OF_PCI_IRQ=y
CONFIG_OF_MTD=y
CONFIG_OF_RESERVED_MEM=y
CONFIG_OF_RESOLVE=y
CONFIG_OF_OVERLAY=y
CONFIG_OF_GPIO=y
CONFIG_OF_IOMMU=y

Also, the code in the stack trace refers to CONFIG_SYSFS, which is also turned on.

# zcat /proc/config.gz | grep -i CONFIG_SYSFS
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_SYSFS_SYSCALL=y
CONFIG_SYSFS=y

ikwzm commented 7 years ago

Hmm...

I have posted the configuration of the Linux kernel used in this repository at the following location.

If you compare with the configuration used there, you may find something.
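
One way to do that comparison is sketched below. It assumes both configurations have been saved as plain-text files, and the function name config_diff and the file names are hypothetical.

```shell
# Hypothetical helper: print CONFIG_ options that differ between two kernel configs.
# Lines prefixed "<" come from the first file, ">" from the second.
config_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # Keep set options and "is not set" comments, sorted so ordering does not matter.
    grep -E '^(CONFIG_|# CONFIG_)' "$1" | sort > "$a"
    grep -E '^(CONFIG_|# CONFIG_)' "$2" | sort > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    # 0 = identical, 1 = differences found (same convention as diff).
    return $status
}
```

With the running kernel's configuration dumped via `zcat /proc/config.gz > your_config`, a call such as `config_diff my_config your_config` lists only the options that differ.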

The other possible difference is the device tree. The device tree for de0-nano-soc provided in this repository is target/de0-nano-soc/boot/devicetree-4.8.17-socfpga.dts. Does yours differ from it?

FPtje commented 7 years ago

The dtb loaded by NixOS is from Linux-4.8.17, which I think is this one.

I downloaded devicetree-4.8.17-socfpga.dtb and told U-Boot to boot using that dtb. It boots fine, but the exact same kernel oops happens.

The kernel is being recompiled with some options found in your gist, most notably CONFIG_CONFIGFS_FS. Here's hoping it'll work.

FPtje commented 7 years ago

I've tried turning on pretty much all the options you have on, but it doesn't work. Still the same error. I still think it has something to do with the /config mountpoint. Somehow configfs is mounted at both /sys/kernel/config and /config, as the mount command shows:

configfs on /sys/kernel/config type configfs (rw,relatime)
none on /config type configfs (rw,relatime)
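
A quick way to list every configfs mount point is sketched below. It reads /proc/mounts, whose whitespace-separated fields are device, mount point, filesystem type, and options. The function name configfs_mounts is hypothetical.

```shell
# Hypothetical helper: list every mount point of type configfs.
# Reads /proc/mounts by default; an alternative file can be passed as $1.
configfs_mounts() {
    awk '$3 == "configfs" { print $2 }' "${1:-/proc/mounts}"
}
```

On a system set up like the one above, this would be expected to print both /sys/kernel/config and /config.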

ikwzm commented 7 years ago

In the Debian8-rootfs provided by this repository, fstab has the following settings.

debian8-rootfs# cat <<EOT > /etc/fstab
/dev/mmcblk0p1  /boot   auto        defaults    0   0
none        /config configfs    defaults    0   0
EOT

FPtje commented 7 years ago

Huh, that looks the same.

I've tried unmounting /sys/kernel/config, but that doesn't fix the error. I'm also currently compiling 4.10 to see if the added stuff for Altera works natively.

ikwzm commented 7 years ago

I understand you are very busy.

For debugging, I would like to see the dts (device tree source) generated by altera-hps2fpga.

Could you replace your altera-hps2fpga.rb with this one and try it out?

If you execute this script with --install, /var/log/altera-hps2fpga.dts is generated.

Please tell me the contents of /var/log/altera-hps2fpga.dts.

FPtje commented 7 years ago

Sure!

https://gist.github.com/FPtje/497ac37203507cc110f7a8e4540d2e1c

ikwzm commented 7 years ago

Thanks! Unfortunately, it was the same as my dts.

ikwzm commented 7 years ago

Please tell me about your .config.

# zcat /proc/config.gz > your_config

FPtje commented 7 years ago

https://gist.github.com/FPtje/338e59941fe49cbdb5d589f32e84167c

FPtje commented 7 years ago

Thanks a lot for the help, by the way!

ikwzm commented 7 years ago

my .config

CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set

your .config

# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set

The kernel internal error seems to be occurring in __kmalloc_track_caller() in mm/slub.c. This may be a clue to the solution.
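
To check which allocator a given kernel was built with, something like the following sketch could be used. It assumes the configuration text is supplied on standard input, and the function name slab_allocator is hypothetical.

```shell
# Hypothetical helper: report which slab allocator a kernel config selects.
# Reads config text on stdin and prints SLAB, SLUB, or SLOB.
slab_allocator() {
    # Match only the allocator choice itself, not options such as CONFIG_SLAB_FREELIST_RANDOM.
    sed -n 's/^CONFIG_\(SL[AOU]B\)=y$/\1/p'
}
```

On the running system this would be invoked as `zcat /proc/config.gz | slab_allocator`.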

ikwzm commented 7 years ago

Please replace the "altera-hps2fpga.rb" with the following one and try it.

https://gist.github.com/ikwzm/c274c8d13c32a13e78a94eec83b3b3d9

FPtje commented 7 years ago

Recompiling the kernel with CONFIG_SLAB=y has solved the kernel panic problem. Woohoo! Thanks for the help!

I still get the following error, though:

[   86.743488] fpga bridge driver
[   86.751289] altera_hps2fpga_bridge soc:fpgabridge@0: fpga bridge [hps2fpga] registered as device hps2fpga
[   86.751303] altera_hps2fpga_bridge soc:fpgabridge@0: init-val not specified
[   86.751938] altera_hps2fpga_bridge soc:fpgabridge@1: fpga bridge [lwhps2fpga] registered as device lwhps2fpga
[   86.751951] altera_hps2fpga_bridge soc:fpgabridge@1: init-val not specified
[   86.752533] altera_hps2fpga_bridge soc:fpgabridge@2: fpga bridge [fpga2hps] registered as device fpga2hps
[   86.752546] altera_hps2fpga_bridge soc:fpgabridge@2: init-val not specified

With some debugging I found that the error (info rather) is thrown in altera-hps2fpga.c, line 162 (not altera-fpga2sdram.c).

FPtje commented 7 years ago

Your replacement altera-hps2fpga.rb does the same. Also, I've updated to 0a372b4870b70591916cdc28e0a8cc649dcfb4ff

ikwzm commented 7 years ago

Oh, good.

I also built a kernel with CONFIG_SLUB=y, and it panicked just like yours.

The root cause is still unknown, but it appears that when property names in the device tree overlay are duplicated, a memory allocation fails and the kernel panics.

Therefore, the new altera-hps2fpga.rb removes the duplicate #address-cells = <0x1>; and #size-cells = <0x1>; properties, and with that change the kernel no longer panics.

FPtje commented 7 years ago

After some searches on Google I found that the init-val error must come from using a bad device tree. Somehow, I'm using a dtb where "init-val" is not set, while your Debian does have init-val set, somewhere. The strange thing is that target/de0-nano-soc/boot/devicetree-4.8.17-socfpga.dts has no mention of init-val, so it must be set somewhere else.

I'm still researching this issue. I found some documentation on init-val here

ikwzm commented 7 years ago

Since init-val is optional, it does not matter if you do not set it.

If the message bothers you, please use the following altera-hps2fpga.rb.

https://gist.github.com/ikwzm/ffb190ad6824505fd8febf38063b28ed

This script sets init-val = <0>; in the device tree.

FPtje commented 7 years ago

You are awesome. It totally works. Thanks so much!