kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

Illegal instruction: Upgrading from btrfs-progs-6.3.3 to btrfs-progs-6.5 broke "send/receive" and "check" functionality #676

Closed: vvxvv closed this issue 10 months ago

vvxvv commented 10 months ago

btrfs filesystem show /dev/sda1

Illegal instruction

btrfs check /dev/sda1

Opening filesystem to check... Illegal instruction

Journal:

Sep 10 13:20:24 pc kernel: traps: btrfs[1491] trap invalid opcode ip:55a7d4989eb0 sp:7ffd40934f60 error:0 in btrfs[55a7d48b5000+d6000]
Sep 10 13:20:24 pc systemd[1]: Created slice Slice /system/systemd-coredump.
Sep 10 13:20:24 pc systemd[1]: Started Process Core Dump (PID 1492/UID 0).
Sep 10 13:20:24 pc systemd-coredump[1493]: Resource limits disable core dumping for process 1491 (btrfs).
Sep 10 13:20:24 pc systemd-coredump[1493]: [🡕] Process 1491 (btrfs) of user 0 dumped core.
Sep 10 13:20:24 pc sudo[1489]: pam_unix(sudo:session): session closed for user root
Sep 10 13:20:24 pc systemd[1]: systemd-coredump@0-1492-0.service: Deactivated successfully.

uname -a

Linux pc 6.1.52-1-lts #1 SMP PREEMPT_DYNAMIC Thu, 07 Sep 2023 05:17:41 +0000 x86_64 GNU/Linux Arch Linux

PS: Downgrading back to 6.3.3 solves the problem.

kdave commented 10 months ago

This must be because of the updated accelerated crc32c, but it uses the same CPU instructions, so there should not be any change. What CPU family/model is it?

vvxvv commented 10 months ago

My CPU is Intel Core i7-860

Core dump is here

Thanks

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 36 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
CPU family: 6
Model: 30
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 5
BogoMIPS: 8029.78
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm flush_l1d

kdave commented 10 months ago

Thanks, that's some old hardware but the SSE 4.2 support is there. Please try to build the hash-speedtest and run it:

make hash-speedtest
./hash-speedtest

It'll probably crash at some point. The core dump does not show enough details. I think the problem is in the crc32q instruction, but that should be available in the same SSE version. Is the Core i7-860 different in this regard?
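For comparison, CRC32C can also be computed without any special CPU instructions, which is what a fallback path would do on hardware that lacks the accelerated ones. The following is only a minimal bitwise sketch of CRC-32C (Castagnoli, reflected polynomial 0x82F63B78); the name `crc32c_sw` and its seed convention are assumptions made for illustration and are not taken from btrfs-progs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the btrfs-progs implementation.
 * Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
 * Pass 0 as the initial crc; the result of one call can be fed
 * back in as crc to continue over a following buffer.
 */
static uint32_t crc32c_sw(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	assert(buf != NULL || len == 0);

	crc = ~crc;			/* undo the final inversion of a previous call */
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			/* shift right; xor the polynomial in if the low bit was set */
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}
```

With this convention, `crc32c_sw(0, "123456789", 9)` yields the standard CRC-32C check value 0xE3069283, which is a convenient way to compare an accelerated implementation against a slow but portable one.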

vvxvv commented 10 months ago

└─⮞ ./hash-speedtest

CPU flags: 0x1f
CPU features: SSE2 SSSE3 SSE41 SSE42
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: CPU cycles

NULL-NOP: cycles: 43087904, cycles/i 430
NULL-MEMCPY: cycles: 82190620, cycles/i 821, 19042.311 MiB/s
CRC32C-ref: cycles: 3740497124, cycles/i 37404, 418.434 MiB/s
CRC32C-NI: Illegal instruction (core dumped)

loqs commented 10 months ago

Core was generated by `btrfs receive -q /run/media/sdb2/backup/2023-09-11--16:53:14'.
Program terminated with signal SIGILL, Illegal instruction.
#0 0x000055a6c4dd8eb0 in crc_pcl ()

Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ep

Does not contain pclmulqdq.

There is no check for pclmulqdq in cpu_detect_flags(). Is that intentional? https://github.com/kdave/btrfs-progs/blob/f7ecc34555b4793573c9e3fc5f77cc8aab63fcc1/common/cpu-utils.c#L67-L92
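Checking a flags line like the one above by eye is error-prone, because several flag names are prefixes or substrings of others (tm and tm2, sse and sse2). A small sketch of a whole-word lookup over such a space-separated string follows; `has_cpu_flag` is a hypothetical helper written for this illustration and is not part of btrfs-progs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `flag` occurs in `flags` as a
 * whole whitespace-separated word, so "tm" does not match "tm2".
 */
static bool has_cpu_flag(const char *flags, const char *flag)
{
	size_t n = strlen(flag);
	const char *p = flags;

	assert(n > 0);

	while ((p = strstr(p, flag)) != NULL) {
		/* the match must start at the beginning or right after whitespace */
		bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
		/* and must end at the end of the string or right before whitespace */
		bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';

		if (starts && ends)
			return true;
		p++;	/* keep searching after this partial match */
	}
	return false;
}
```

Applied to the flags quoted above, a lookup for sse4_2 succeeds while a lookup for pclmulqdq does not, which matches the observation in this comment.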

kdave commented 10 months ago

> Does not contain pclmulqdq.

Right, thanks. That's it: PCLMUL support needs to be tested separately from the SSE flags.
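To illustrate what testing it separately could look like: on x86, CPUID leaf 1 reports PCLMULQDQ in ECX bit 1 and SSE4.2 in ECX bit 20, so the two are independent bits. The sketch below assumes GCC or Clang (`__get_cpuid` from `<cpuid.h>`); the `FEAT_*` constants, the enum, and all function names are invented for this example and do not reflect the actual fix in btrfs-progs.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical feature bits for this sketch only. */
#define FEAT_SSE42	(1u << 0)
#define FEAT_PCLMUL	(1u << 1)

enum crc32c_impl { CRC32C_REF, CRC32C_SSE42, CRC32C_SSE42_PCLMUL };

/* CPUID leaf 1, ECX: bit 1 is PCLMULQDQ, bit 20 is SSE4.2. */
static unsigned int decode_leaf1_ecx(uint32_t ecx)
{
	unsigned int feat = 0;

	if (ecx & (1u << 20))
		feat |= FEAT_SSE42;
	if (ecx & (1u << 1))
		feat |= FEAT_PCLMUL;
	return feat;
}

/* Query the running CPU; reports no features on non-x86 builds. */
static unsigned int detect_features(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return decode_leaf1_ecx(ecx);
#endif
	return 0;
}

/* Only pick the PCLMUL-based variant when both bits are present. */
static enum crc32c_impl pick_crc32c_impl(unsigned int feat)
{
	const unsigned int both = FEAT_SSE42 | FEAT_PCLMUL;

	assert((feat & ~both) == 0);	/* only bits known to this sketch */

	if ((feat & both) == both)
		return CRC32C_SSE42_PCLMUL;
	if (feat & FEAT_SSE42)
		return CRC32C_SSE42;
	return CRC32C_REF;
}
```

A CPU like the one in this report, with SSE4.2 set and PCLMULQDQ clear, would then land on the plain SSE4.2 variant instead of the one that executes the missing instruction.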

vvxvv commented 10 months ago

loqs, kdave thanks

kdave commented 10 months ago

Can you please test whether v6.5.x fixes the problem on your side? I have verified the build but don't have a proper 32-bit environment without PCLMUL support.

vvxvv commented 10 months ago

kdave, yes, the problem is fixed after installing the package kindly provided by loqs here.

kdave commented 10 months ago

Thanks.