veracrypt / VeraCrypt

Disk encryption with strong security based on TrueCrypt
https://www.veracrypt.fr

[Linux] Veracrypt freeze at the end of big volume creation #474

Closed Schweineschwarte closed 4 years ago

Schweineschwarte commented 5 years ago

Hello, if I create a "small" volume with VeraCrypt 1.23 for Linux 64-bit, it works without problems. Now I have a new external 2 TB Seagate HDD, which I tested with SeaChest without any errors. If I try to encrypt this HDD as a partition/drive, the VeraCrypt GUI freezes at the end of the volume creation (after the bar reaches 100% - see image). The console version of VeraCrypt 1.23 has the same problem (I let it run overnight, which should have been plenty of time to finish). I can create a normal partition with fdisk and a filesystem with mkfs.ext4 without errors. Afterwards, I created a 1.6 TB container in this partition, but the VeraCrypt GUI froze at the end again. I am not sure whether VeraCrypt freezes completely or only the GUI/console output does. If I unplug the external HDD, some kworkers use a lot of CPU. I find no error reports in dmesg, so I think VeraCrypt has some trouble with big volumes.

My system: openSUSE 15.0 64 bit with KDE Hardware: https://pastebin.com/3G47NSzm

Freeze image:

Screenshot_20190722_203049

Schweineschwarte commented 5 years ago

I have tested it on a new computer with an AMD Ryzen 5 2600 processor and openSUSE 15.0 64-bit with KDE. During the encryption I logged the temperature with "sensors" and the CPU load with "ps". Shortly before the freeze, the speaker beeped and Konsole showed the message `Message from syslogd@linux-9ilm at Aug 25 15:36:42 ... kernel:[15264.337904] NMI watchdog: BUG soft lockup - CPU#2 stuck for 23s! [ksoftirqd/2:22]`. This happened 4 times. Each time the computer got stuck and lagged heavily. Afterwards, the computer stopped responding (see the image below).

At the beginning, the temperatures had the following values:

    Temperature
    acpitz-acpi-0
    Adapter: ACPI interface
    temp1: +16.8°C (crit = +20.8°C)

    amdgpu-pci-0900
    Adapter: PCI adapter
    fan1: 995 RPM
    temp1: +47.0°C (crit = +0.0°C, hyst = +0.0°C)

    k10temp-pci-00c3
    Adapter: PCI adapter
    Tdie: +43.0°C (high = +70.0°C)
    Tctl: +43.0°C

At the end, the temperatures were:

    Temperature
    acpitz-acpi-0
    Adapter: ACPI interface
    temp1: +16.8°C (crit = +20.8°C)

    amdgpu-pci-0900
    Adapter: PCI adapter
    fan1: 1001 RPM
    temp1: +43.0°C (crit = +0.0°C, hyst = +0.0°C)

    k10temp-pci-00c3
    Adapter: PCI adapter
    Tdie: +60.1°C (high = +70.0°C)
    Tctl: +60.1°C

I don't think this is too high. AMD states a maximum temperature of 95°C. https://www.amd.com/de/products/cpu/amd-ryzen-5-2600

The CPU load at the beginning:

    %CPU %MEM ARGS
    So 25. Aug 13:36:36 CEST 2019
     0.5  0.0 [kswapd0]
     0.7  0.0 [dmcrypt_write]
     1.2  1.4 /usr/bin/plasmashell
     1.3  0.0 [kworker/6:1]
     1.6  0.7 /usr/bin/kwin_x11
     1.8  0.0 [wlan0]
     2.5  0.5 /usr/bin/X
     2.8  0.0 [ksoftirqd/2]
    56.6  0.0 [kworker/u64:1]
    72.7  0.2 /usr/bin/veracrypt

The CPU load at the end:

    %CPU %MEM ARGS
    So 25. Aug 15:42:37 CEST 2019
     1.3  1.4 /usr/bin/plasmashell
     1.5  0.0 [ksoftirqd/7]
     1.6  0.8 /usr/bin/kwin_x11
     2.6  0.5 /usr/bin/X
     3.0  0.0 [dmcrypt_write]
     4.0  0.0 [ksoftirqd/2]
     4.0  0.0 [kworker/6:2]
    11.0  0.0 [wlan0]
    26.1  0.2 /usr/bin/veracrypt
    84.3  0.0 [kworker/u64:1]

The loads of veracrypt and kworker have swapped: at the beginning VeraCrypt was at 72.7 and kworker at 56.6, and at the end VeraCrypt was at 26.1 and kworker at 84.3.
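The comparison of the two `ps` snapshots can be sketched in a few lines of Python. This is only an illustration, assuming the snapshots have one `%CPU %MEM ARGS` row per line as in the listings in this comment; the function names `parse_ps_snapshot` and `load_changes` are made up for this example and are not part of any tool mentioned in the thread.

```python
# Values copied from the two snapshots posted in this comment.
BEFORE = """\
%CPU %MEM ARGS
So 25. Aug 13:36:36 CEST 2019
0.5 0.0 [kswapd0]
0.7 0.0 [dmcrypt_write]
1.2 1.4 /usr/bin/plasmashell
1.3 0.0 [kworker/6:1]
1.6 0.7 /usr/bin/kwin_x11
1.8 0.0 [wlan0]
2.5 0.5 /usr/bin/X
2.8 0.0 [ksoftirqd/2]
56.6 0.0 [kworker/u64:1]
72.7 0.2 /usr/bin/veracrypt
"""

AFTER = """\
%CPU %MEM ARGS
So 25. Aug 15:42:37 CEST 2019
1.3 1.4 /usr/bin/plasmashell
1.5 0.0 [ksoftirqd/7]
1.6 0.8 /usr/bin/kwin_x11
2.6 0.5 /usr/bin/X
3.0 0.0 [dmcrypt_write]
4.0 0.0 [ksoftirqd/2]
4.0 0.0 [kworker/6:2]
11.0 0.0 [wlan0]
26.1 0.2 /usr/bin/veracrypt
84.3 0.0 [kworker/u64:1]
"""


def parse_ps_snapshot(text):
    """Return {ARGS: %CPU} for every row that starts with two numbers."""
    loads = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        try:
            cpu = float(parts[0])
            float(parts[1])  # %MEM, only checked to confirm this is a data row
        except ValueError:
            continue  # header or date line
        loads[parts[2].strip()] = cpu
    return loads


def load_changes(before, after):
    """List (name, before, after, delta) for processes present in both
    snapshots, largest absolute change first."""
    common = before.keys() & after.keys()
    changes = [
        (name, before[name], after[name], round(after[name] - before[name], 1))
        for name in common
    ]
    return sorted(changes, key=lambda c: abs(c[3]), reverse=True)


if __name__ == "__main__":
    for name, b, a, delta in load_changes(parse_ps_snapshot(BEFORE),
                                          parse_ps_snapshot(AFTER)):
        print(f"{name:25s} {b:6.1f} -> {a:6.1f} ({delta:+.1f})")
```

Sorting by absolute change puts `/usr/bin/veracrypt` and `[kworker/u64:1]` at the top, which matches the swap described above.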

The new system: openSUSE 15.0 64 bit with KDE Hardware: https://pastebin.com/KvsqqnjQ

Veracrypt_Watchdog_klein

Schweineschwarte commented 5 years ago

Both times I wanted to encrypt with AES(Twofish) and SHA-512. The benchmark reports a much higher speed than VeraCrypt actually reaches during encryption. The encryption speed is high only at the beginning and drops very quickly.

Benchmark: Screenshot_20190825_163832

alt3r-3go commented 5 years ago

@Schweineschwarte, thanks for providing details of the problem you observe and doing additional exploration. That low performance bug is unlikely to be related to the one you're observing, on the face of it anyway. That one doesn't cause any stalls, it's just a benchmark producing an unexpectedly low result.

So let's look into this one here in more detail. Those soft lockup messages should also be accompanied by stack traces - could you please post either your syslog excerpts for those (full stack traces together with the soft lockup messages) or [preferred, as it will provide a better picture] the full log output starting from machine boot, through VeraCrypt starting and performing the operation that gets stuck for you and ends in a soft lockup. That should provide additional information for troubleshooting.

The temperature doesn't look like a problem in this one - the values are in the "okay-ish" zone and overheating wouldn't cause soft lockups anyway, that must be a purely SW-level problem. The temperature increase per se is also expected - your CPU is doing the additional work of encryption after all.
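For anyone collecting the requested excerpts, a small Python sketch along these lines could pull the soft lockup lines out of a saved log so the surrounding stack traces are easier to locate. It is an assumption-laden illustration: the regular expression is modelled only on the message quoted earlier in this thread, the helper name `find_soft_lockups` is invented here, and other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the message quoted in this thread:
#   ... soft lockup - CPU#2 stuck for 23s! [ksoftirqd/2:22]
# The task name may itself contain ':' (e.g. kworker/u64:1), so the
# greedy task group backtracks to the last ':' before the PID.
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup line found in log_text."""
    hits = []
    for line in log_text.splitlines():
        m = SOFT_LOCKUP.search(line)
        if m:
            hits.append({
                "cpu": int(m.group("cpu")),
                "seconds": int(m.group("secs")),
                "task": m.group("task"),
                "pid": int(m.group("pid")),
            })
    return hits


if __name__ == "__main__":
    sample = ("kernel:[15264.337904] NMI watchdog: BUG soft lockup - "
              "CPU#2 stuck for 23s! [ksoftirqd/2:22]")
    print(find_soft_lockups(sample))
```

In practice the text would come from a saved file such as /var/log/warn rather than a string literal.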

Schweineschwarte commented 5 years ago

@alt3r-3go Here are some log files. The external HDD that should be encrypted is sdc. I aborted this run at the end because the computer was very, very slow, but it didn't crash. The mouse didn't work, but I could toggle the Num Lock light, so I don't think the computer had crashed at that moment. It was impossible to work with the machine, though. The volume creation speed dropped to 11 MB/s, so I think the computer reached the state we want to observe. No watchdog message appeared this time, but I see some error messages in the log files that could be of interest to you.

dmesg before volume creation starts: https://pastebin.com/na2PiaDW

journalctl before volume creation starts: https://pastebin.com/dgnMJgT1

dmesg log active on volume creation (starts with equal values): https://pastebin.com/CkjYhsHV

journalctl log active on volume creation: https://pastebin.com/MUj4adPE

/var/log/warn: https://pastebin.com/z1QRFx7x

/var/log/messages (too big for pastebin): https://gist.github.com/Schweineschwarte/96c463d67ab4d7b2ff5d1ee690a059e7

Schweineschwarte commented 5 years ago

Here, you can see the /var/log/warn of 25th August 2019, with the soft lockup messages. https://gist.github.com/Schweineschwarte/f1a6a1ff385fd3cd77b478d968eba3cd

alt3r-3go commented 5 years ago

Thanks, that helps a lot. I don't have time to look in all the details this week, but what I can see at the first scan of the dmesg and the warn log - this actually doesn't look like VeraCrypt driver at all, but reminds me of a bad sector (or a set thereof) on the disk drive.

The USB and SCSI drivers scream errors when writing and they are both "below" VeraCrypt driver. Plus, soft lockup looks like a natural consequence in this case, because the drives, especially "spinning rust"-type as you seem to have here, tend to stall the I/O operation trying to read (write) the sector again and again, instead of just returning an error. That in turn leads to the driver getting stuck in the IO wait and then it gets noticed by the scheduler eventually, manifested as a soft lockup error. And Linux used to be rather allergic to prolonged IO waits (in my experience, anyway, and that's from a while ago), so general OS stalls and all sorts of glitches are expected.
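To make the "errors below VeraCrypt" reasoning concrete, here is a rough Python sketch that counts log lines per lower layer (USB, SCSI, block). Everything in it is an assumption for illustration: the patterns are based on message shapes commonly seen in Linux kernel logs and vary between kernel versions, the sample lines are made up and are not taken from the reporter's logs, and `classify_storage_errors` is a hypothetical helper name.

```python
import re

# Assumed message shapes; real kernels differ in wording across versions.
STORAGE_ERROR_PATTERNS = {
    "block": re.compile(r"I/O error, dev (?P<dev>\w+)"),
    "scsi": re.compile(r"\bsd \S+ \[(?P<dev>\w+)\].*FAILED Result"),
    "usb": re.compile(r"\busb \S+: (?:reset|USB disconnect)"),
}


def classify_storage_errors(log_text, device="sdc"):
    """Count matching lines per layer. Lines that name a block device are
    counted only if it is `device`; USB lines carry no device name."""
    counts = {layer: 0 for layer in STORAGE_ERROR_PATTERNS}
    for line in log_text.splitlines():
        for layer, pattern in STORAGE_ERROR_PATTERNS.items():
            m = pattern.search(line)
            if not m:
                continue
            dev = m.groupdict().get("dev")
            if dev is None or dev == device:
                counts[layer] += 1
    return counts


if __name__ == "__main__":
    # Illustrative lines only, NOT from the logs linked in this issue.
    sample = "\n".join([
        "[ 1234.000001] usb 2-1: reset SuperSpeed USB device number 3 using xhci_hcd",
        "[ 1234.100000] sd 6:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK",
        "[ 1234.100100] blk_update_request: I/O error, dev sdc, sector 2048",
        "[ 1234.200000] blk_update_request: I/O error, dev sdb, sector 4096",
    ])
    print(classify_storage_errors(sample))
```

If most hits land in the USB and SCSI layers for the affected device, that points at the transport or the drive rather than at anything running on top of them.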

So please run a full bad sector check on your drive - there's usually a vendor utility for that, Linux also has some, Windows disk tools also can do that - but it would be best to do it on a physical host, not the virtual machine as you seem to have here (there are Virtual Box drivers trying to load themselves anyway, so this is a guess) to prevent any additional complications from the VMM middleman.

Schweineschwarte commented 5 years ago

I have tested the external Seagate HDD again (with my Linux "host system", not in a virtual machine) but SeaChest can't find any errors.

Available devices: https://pastebin.com/k3eeMnai

Device information: https://pastebin.com/xVjscpfH

SMART check (unsupported): https://pastebin.com/TESafZTN

SMART error log (unsupported) https://pastebin.com/P0ETyJcS

Long generic test: https://pastebin.com/dmhP0Qss

I have saved the full log of the long generic test, but the log file is 1.9 GiB in size (a bit too much for pastebin :smile: ). If you want to see this file, I can upload it to a file hoster. But you will only see "Reading LBA: 0" through "Reading LBA: 3907029120".
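As a quick sanity check that the last LBA in that log corresponds to the end of a 2 TB disk, the arithmetic can be sketched in Python. The 512-byte logical sector size is an assumption (it is not stated in this thread), and `lba_offset_bytes` is a name invented for this example.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def lba_offset_bytes(lba, sector_size=SECTOR_SIZE):
    """Byte offset at which the given logical block address starts."""
    return lba * sector_size


if __name__ == "__main__":
    last_lba = 3907029120  # last "Reading LBA" value mentioned above
    offset = lba_offset_bytes(last_lba)
    print(f"{offset} bytes")
    print(f"{offset / 10**12:.3f} TB (decimal)")
    print(f"{offset / 2**40:.3f} TiB (binary)")
```

Under that assumption the last reported LBA sits at roughly the 2 TB mark, so the read test appears to have covered the whole drive.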

alt3r-3go commented 5 years ago

Thanks. That's interesting then. One other reason, though IMHO much less likely, is a power brownout during more intensive operations, but that would be harder to test. Is this drive powered directly from USB, or does it have a separate power adapter?

git70 commented 4 years ago

I thought it might be related to your problems: https://github.com/keepassxreboot/keepassxc/issues/3569 https://github.com/keepassxreboot/keepassxc/issues/3415 Common features: AMD Ryzen + OpenSUSE Maybe it's worth checking ...

Schweineschwarte commented 4 years ago

@alt3r-3go Thank you for your efforts :) Yes, this device is connected directly to my front USB port, without any adapters. I observed that the HDD connection is lost if I check the HDD with the program "badblocks" or try to compute a sha512sum over a very big file (2 TB). I have done some more tests, but I am not finished yet and will have no time for this in the next two weeks. When I am done, I will post the results here. ;)

@git70 I will take a look to see if it helps. Thanks!
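A checksum run over a very large file can also be used as a crude read stress test that reports where it failed. The following Python sketch shows one way to do that with the standard `hashlib` module; it is an illustration, not the procedure used in this thread, and the function name `sha512_with_offset` and the 4 MiB chunk size are arbitrary choices.

```python
import hashlib


def sha512_with_offset(path, chunk_size=4 * 1024 * 1024):
    """Hash a file in chunks.

    Returns (hexdigest, bytes_read, None) on success, or
    (None, bytes_read, error) if opening or reading fails, so the
    caller can see how far the read got before the device dropped out.
    """
    digest = hashlib.sha512()
    offset = 0
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)
                offset += len(chunk)
    except OSError as exc:
        return None, offset, exc
    return digest.hexdigest(), offset, None


if __name__ == "__main__":
    import sys

    hexdigest, bytes_read, error = sha512_with_offset(sys.argv[1])
    if error is None:
        print(f"{hexdigest}  ({bytes_read} bytes)")
    else:
        print(f"read failed after {bytes_read} bytes: {error}")
```

If the failure offset differs from run to run, a connection or power problem seems more plausible than a fixed bad region on the disk.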

alt3r-3go commented 4 years ago

Thanks and sure, take your time. This indeed sounds like insufficient power (errors or malfunction under load), something that happens frequently with those external drives that are powered only from the USB, despite the manufacturer's advertising. I, for one, always buy those with additional external power adapter, because of that - not that it's convenient, oh well.

Schweineschwarte commented 4 years ago

After a long time I want to report back. I did some more tests and I think I had multiple problems. First, my front USB ports are not very stable. I noticed problems with my WiFi stick when I copied huge amounts of files with another hard disk (one with an external power supply). That other hard disk worked fine, but the WiFi stick had connection trouble during the copy. So I connected the problem HDD to the ports on the back, but it had trouble there too (encryption didn't work, checksums of huge files aborted, etc.). Then I wanted to check whether the source of these problems was the HDD controller or the enclosure's controller. I removed the HDD from its enclosure, bought a USB Y-cable (1x power, 1x data) to SATA adapter (DeLOCK Konverter SATA-22-Pin zu USB-3.0-/2.0, Adapter) and connected the extracted HDD to my USB ports on the back. I was surprised to see in SeaChest that the HDD "now" has SMART support… Now I can encrypt my HDD, compute checksums of huge files, etc. So I think the second problem was the enclosure's controller (the low power available from a single USB port may have been a third cause). Thank you very much for your help!

alt3r-3go commented 4 years ago

No worries, glad you've got it working now and thanks for reporting back, that's going to help other people in similar situations.