raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

kernel BUG at arch/arm64/kernel/fpsimd.c:282! AND other issues ---- Pi 3B+ #2575

Closed ShapeShifter499 closed 6 years ago

ShapeShifter499 commented 6 years ago

I'm using an AArch64 build of this kernel through https://github.com/sakaki-/bcmrpi3-kernel with Arch Linux ARM. I've been seeing some strange messages in my dmesg output and I'm worried they could lead to data corruption. Does anyone know what they mean and how this could be fixed if it's serious?

These messages appear when there is high I/O, such as when I'm moving data from one USB hard drive to another.

All of my devices are encrypted, including the rootfs. I'm able to remotely decrypt it using an initramfs hook and Dropbear SSH. My rootfs is stored on a class 10 PNY 128GB SD card. This is running on a Raspberry Pi 3B+.

The following log snippets show the messages: https://gist.github.com/ShapeShifter499/df184265fbf0672140d396d89510d184 https://gist.github.com/ShapeShifter499/3af51b6c3d87c07e300110f81e162df8 https://gist.github.com/ShapeShifter499/8c43fca9b6404126c6971564d695be4b

This issue appears possibly related to the following issues: https://github.com/raspberrypi/linux/issues/2482, https://github.com/raspberrypi/linux/issues/2557, https://github.com/raspberrypi/linux/issues/2564, https://github.com/raspberrypi/linux/issues/2555, and https://github.com/raspberrypi/linux/issues/2551. But I'm not enough of an expert to be 100% sure.

JamesH65 commented 6 years ago

Generally we do not support 64 bit kernels, especially ones that are not even ours. That said, a quick look at the logs indicates that the errors are something to do with the encrypted nature of the devices. None of that stuff would be different from upstream.

Does the same issue happen with the standard 32-bit Pi-supplied kernel? If so we could spend a bit more time on it, but it's an unusual use case so it won't be top of the list.

ShapeShifter499 commented 6 years ago

@JamesH65 Should I be fine downloading a copy of Raspbian to test this issue with? Is there anything extra I should do when testing with the 32-bit kernel, or were the logs above sufficient? I'll do some tests on that OS and report back later.

ShapeShifter499 commented 6 years ago

@JamesH65 I'm probably not the best person to debug this issue because of my lack of experience. I put a recent copy of Raspbian (2018-04-18-raspbian-stretch-lite) on a 16GB class 10 Kingston card, but I didn't encrypt the SD card. I mounted the encrypted hard drives the same way I did on the 64-bit kernel. Now I'm not seeing any apparent crashes or errors in the log, and rsync is working fine between two encrypted hard drives. Could this be a specific issue with 64-bit and dm_crypt? Maybe it's some I/O issue with a fully encrypted SD card?

Does anyone have any ideas how I should handle this situation?

The only reason I wanted AArch64 was issues with 32-bit PHP that I didn't know how to work around. I plan on working with all sorts of file sizes that may go beyond 4GB, and from what I know this won't work with Nextcloud and 32-bit PHP. If there were a way to make this work with 32-bit PHP, I wouldn't need to use an AArch64 kernel.

JamesH65 commented 6 years ago

So only fails when the SD card is encrypted. Hmm. Dunno. Can you work with the system like that?

I cannot comment on the problems with the 32bit versions of the apps you have mentioned - not my field!

oniongarlic commented 6 years ago

Looks like some issue with using NEON in the 64-bit kernel.

arch/arm64/kernel/fpsimd.c, line 282:
BUG_ON(!may_use_simd());

Try disabling NEON-accelerated AES, CRYPTO_AES_ARM64_BS (and possibly CRYPTO_AES_ARM64_NEON_BLK), or even all of ARM64_CRYPTO, as a workaround.

ShapeShifter499 commented 6 years ago

@oniongarlic how is that going to affect the speed of file transfers when everything is encrypted?

EDIT: or is it? I read conflicting reports online on whether or not the Raspberry Pi 3B+ CPU even had ARM64 supported crypto extensions for speeding up encryption.

JamesH65 commented 6 years ago

@ShapeShifter499 I suspect that the speed of transfers is likely to be bound more by the network/USB speed than the encryption speed, but that is really a guess. It would have to be measured.

I have no idea if the Armv8 on the SoC has 64bit crypto extensions.

pelwell commented 6 years ago

bcmrpi3_defconfig in arm64 sets CONFIG_ARM64_CRYPTO=y, so one would hope so.

asavah commented 6 years ago

@JamesH65 @pelwell

It is really sad that RPF engineers do not have any info about ARMv8 crypto extensions on the SoC they sell... And that this is not documented ....

No, cortex-a53 on the BCM283X SoCs does not have optional crypto extensions.

Output from cat /proc/cpuinfo of rpi3 in aarch64 mode

Features        : fp asimd evtstrm crc32 cpuid

Output from allwinner H5 SoC (cortex-a53) running in aarch64 mode

Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

aes pmull sha1 sha2 being the ARMv8 crypto extensions.
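
The comparison above can also be done programmatically. The following is a minimal sketch, assuming the Features line format shown in the two outputs; the helper name missing_crypto_flags and the two sample constants are made up for illustration, and the flag list is the one given in this comment.

```python
# Flags named above as the ARMv8 crypto extensions.
CRYPTO_FLAGS = ("aes", "pmull", "sha1", "sha2")

# Sample "Features" lines copied from the two outputs in this comment.
RPI3_FEATURES = "Features        : fp asimd evtstrm crc32 cpuid"
H5_FEATURES = "Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid"


def missing_crypto_flags(cpuinfo_text):
    """Return the crypto flags absent from the first 'Features' line.

    Raises ValueError if the text contains no 'Features' line.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            present = set(value.split())
            return [flag for flag in CRYPTO_FLAGS if flag not in present]
    raise ValueError("no 'Features' line found")


# All four flags are reported missing for the rpi3 sample, none for the H5.
rpi3_missing = missing_crypto_flags(RPI3_FEATURES)
h5_missing = missing_crypto_flags(H5_FEATURES)
```

On a running aarch64 system, the same function could be given the full contents of /proc/cpuinfo instead of the sample strings.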

JamesH65 commented 6 years ago

Not sad, just not in our (or most of us?) realm of expertise - you cannot know everything.

Of course, now we do know. All I have to do now is try and remember the fact.

ShapeShifter499 commented 6 years ago

@JamesH65 @pelwell @asavah @oniongarlic does this mean that some of the crypto modules, extensions, and code in the kernel build should not even be enabled for the Raspberry Pi 3B and 3B+?

If so, can this be reflected in the defconfig please?

ShapeShifter499 commented 6 years ago

@oniongarlic I'm building a kernel right now with the following changed from the defconfig here

CONFIG_CRYPTO_AES=y
CONFIG_ARM64_CRYPTO=n
CONFIG_CRYPTO_AES_ARM64=n
CONFIG_CRYPTO_AES_ARM64_NEON_BLK=n
CONFIG_CRYPTO_AES_ARM64_BS=n

ED6E0F17 commented 6 years ago

This appears to be a known problem upstream. It was assumed that "may_use_simd()" would always be true, but that was changed to "!in_interrupt()" and then "!in_irq() && !irqs_disabled() && !in_nmi() && !raw_cpu_read(kernel_neon_busy)".... which I think is consistent with only seeing the BUG() during high I/O, and means that you cannot use aarch64 NEON acceleration for disk encryption.

https://github.com/raspberrypi/linux/blob/rpi-4.14.y/arch/arm64/include/asm/simd.h#L29

I assume that most ARMv8 systems will be using Crypto instructions instead of NEON, so this is a low priority upstream.
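
To make the quoted condition easier to reason about, here is a toy model in Python. It is only a sketch, not kernel code: the CpuContext class and the neon_begin function are hypothetical stand-ins, the four fields simply mirror the four terms of the quoted expression, and marking the context busy on entry is an assumption about how the guarded section behaves.

```python
from dataclasses import dataclass


@dataclass
class CpuContext:
    """Hypothetical snapshot of the state the quoted condition inspects."""
    in_irq: bool = False
    irqs_disabled: bool = False
    in_nmi: bool = False
    kernel_neon_busy: bool = False


def may_use_simd(ctx):
    """Toy version of the quoted condition: every term must be false."""
    return not (ctx.in_irq or ctx.irqs_disabled
                or ctx.in_nmi or ctx.kernel_neon_busy)


def neon_begin(ctx):
    """Stand-in for a NEON entry point guarded by BUG_ON(!may_use_simd())."""
    if not may_use_simd(ctx):
        # The real kernel would hit BUG() here; the model just raises.
        raise RuntimeError("BUG: SIMD not usable in this context")
    # Assumption: the section marks itself busy so nested use is rejected.
    ctx.kernel_neon_busy = True
```

In this model, a caller running with interrupts disabled, or one that re-enters while the busy flag is set, takes the failing branch. That is the kind of context that becomes more likely under heavy I/O.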

ED6E0F17 commented 6 years ago

@ShapeShifter499 - you are going to need CONFIG_CRYPTO_AES, CONFIG_ARM64_CRYPTO, and CONFIG_CRYPTO_AES_ARM64; it is only the NEON instructions that are broken:

CONFIG_CRYPTO_AES_ARM64_NEON_BLK=n
CONFIG_CRYPTO_AES_ARM64_BS=n

There will be a large performance hit without those options, but it is necessary to remove them on 4.14
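
A quick way to confirm that a given kernel config really has those two options off is sketched below. It assumes the usual .config text format, where an enabled option appears as NAME=y or NAME=m and a disabled one appears as a "# NAME is not set" comment; the helper name enabled_options is made up for illustration.

```python
import re

# The two NEON-based AES options named above.
NEON_AES_OPTIONS = (
    "CONFIG_CRYPTO_AES_ARM64_NEON_BLK",
    "CONFIG_CRYPTO_AES_ARM64_BS",
)


def enabled_options(config_text, options=NEON_AES_OPTIONS):
    """Return the options that are set to y or m in the given .config text."""
    enabled = []
    for name in options:
        # Anchor on the full option name so CONFIG_CRYPTO_AES_ARM64=y
        # does not count as a match for the longer names.
        pattern = r"^" + re.escape(name) + r"=(y|m)\s*$"
        if re.search(pattern, config_text, flags=re.MULTILINE):
            enabled.append(name)
    return enabled
```

The text to check could come from the .config in the build tree, or from /proc/config.gz on a running kernel if that kernel was built with CONFIG_IKCONFIG_PROC. An empty result means neither option is enabled.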

Upstream are making good progress with 4.17/4.18, but there are still patches pending before it would be safe to test the NEON accelerations on a mainline kernel.

JamesH65 commented 6 years ago

@ShapeShifter499 Have you had a chance to try out any of the suggestions so far?

ShapeShifter499 commented 6 years ago

I recently got a Rock64 board with 4GB of RAM in the mail from China. From what I could tell, the CPU on that board fully supports all the ARM NEON crypto stuff that seemed to be crashing on the Raspberry Pi. I have since fully moved my setup to that board and I no longer see any crashes. It has just the right amount of RAM and speed, plus the USB I/O is separate from the Ethernet. All in all it's pretty good for my use case.

My setup on my Rock64 is as follows: Arch Linux ARM LAMP (Linux, Apache, MariaDB, PHP) Nextcloud Syncthing ZNC

I no longer see myself using my Raspberry Pi 3B+ for any intensive file management or backup. I don't know if I should close this, since it probably affects some people who want to use their Raspberry Pis as a NAS or for other server-like services.

JamesH65 commented 6 years ago

I think this is probably all going to be fixed in a later kernel (4.14.18 or so). So I'll close it for the moment, if necessary it can be reopened at a later date.

asavah commented 6 years ago

@ShapeShifter499 rock64 uses a pretty old 4.4 kernel with a lot of hacks. I have that board too ... It's possible that it's not affected because the kernel is old. On a side note: try building rock64 kernel with bpf syscall enabled, boot the board, enjoy the fireworks.

ShapeShifter499 commented 6 years ago

@asavah I'm currently on Arch Linux ARM which has a far more up to date kernel build. I've installed the "linux-aarch64-rc" package which puts me at 4.18. There is currently an issue with USB 3.0 so everything I have set up is off the USB 2.0 ports. Other than that things work well so far.

Linux kumo 4.18.0-rc2-1-ARCH #1 SMP Mon Jun 25 18:38:37 MDT 2018 aarch64 GNU/Linux

Since I'm not that familiar with the Linux kernel what happens with "bpf syscall" enabled?

JamesH65 commented 6 years ago

Just a reminder: this is a Raspberry Pi GitHub issue tracker, not Rock64. I'm sure they have their own.

ShapeShifter499 commented 6 years ago

@JamesH65 Sorry, one last thing I want to say.

@asavah CONFIG_BPF_SYSCALL is enabled for the Arch Linux ARM kernel I'm running and I see no issues that I know of here.

ED6E0F17 commented 6 years ago

It should be possible to hit this bug on Arm v7 if you enable CRYPTO_AES_ARM_BS in the Pi3 config. (I don't think this option is enabled in Raspbian, but it would be 25% faster.) It should be easy to test by setting up encrypted RAM disks until you run out of free memory.

The upstream fixes for Arm v8 Neon have not even been applied to Arm v7 Neon, so I do not know how long it will take to backport them to 4.14 LTS. It would be best to revert https://github.com/raspberrypi/linux/commit/a2092141807514666a27397 for 4.14

ShapeShifter499 commented 6 years ago

@ED6E0F17 Possible or impossible?

"It should be possible to hit this bug on Arm v7 if you enable CRYPTO_AES_ARM_BS in the Pi3 config."

ED6E0F17 commented 6 years ago

"Possible or impossible?"

Both? Arm v7 has the code that could trigger sleeping in the non-preemptible Neon section, but it did not get the extra code that would detect failure.