Closed: bimonsubio1984 closed this issue 5 years ago
As I already wrote, the hardware SMART ECC counter is frozen at a specific value:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail Always       -       0
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail Offline      -       0
  3 Spin_Up_Time            0x0007   119   119   033    Pre-fail Always       -       2
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail Offline      -       0
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail Always       -       0
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age  Always       -       0
192 Power-Off_Retract_Count 0x0032   089   089   000    Old_age  Always       -       2200
193 Load_Cycle_Count        0x0012   095   095   000    Old_age  Always       -       5xxxx
194 Temperature_Celsius     0x0002   125   125   000    Old_age  Always       -       48 (Min/Max 3/65)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age  Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age  Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age  Always       -       1241
223 Load_Retry_Count        0x000a   100   100   000    Old_age  Always       -       0
As you can see, the disk is in relatively good condition according to SMART. But each zpool scrub reports thousands of software checksum (CRC) errors. Earlier, such software CRC errors always disappeared after a zpool scrub followed by a zpool clear; now that no longer helps. What does it mean? Does it mean the disk got some physical damage not indicated in SMART? Maybe the HDD's track servo marks have been damaged and the disk now has bad head positioning?
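For clarity, the routine that used to clear these errors was roughly the following (the pool name "tank" is a placeholder):

zpool scrub tank        # re-read and verify every block in the pool
zpool status -v tank    # wait for the scrub to finish, check the CKSUM column and any listed files
zpool clear tank        # reset the error counters once the scrub reports 0 errors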
Sure, the attackers read this message too; most likely it is interesting for them to know how reliable ZFS is against their attacks.
What I cannot understand is why another ZFS pool on the same physical disk, which was inactive during the attack, does not indicate any CRC errors at all, even after a full scrub. If the disk were partially physically damaged, would that not affect both pools, especially considering that both pools have some fragmentation? Do ZFS allocation algorithms prevent a single HDD track from storing data belonging to different pools? Otherwise, how is it possible to have a pool-specific physical problem on a single disk (a completely healthy pool and a relatively bad pool on the same disk at the same time)?
But for the pool that was active during the attack, the checksum error count keeps increasing constantly, both during normal use and during scrubs. ZFS replication via send to the Backup pool passes fine. I will run an rsync --dry-run comparison soon; all files seem consistent so far after a rollback to the day of the attack.
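A minimal sketch of the verification described here, with placeholder pool, dataset, and mount-point names:

zfs snapshot pool1/data@verify
zfs send -i @previous pool1/data@verify | zfs receive Backup/data                  # incremental replication to the Backup pool
rsync -a --checksum --dry-run --itemize-changes /pool1/data/ /Backup/data/         # print which files differ, without copying anything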
A copy of this question:
https://web.archive.org/web/20191027095649/https://github.com/zfsonlinux/zfs/issues/9518
just for better information consistency.
The pool instance finally died after reaching about 1 million CRC errors, though only 4 files were affected among the many files in the pool.
After replicating a new instance back from a backup there are no CRC errors in the pool anymore, therefore it is surely a bug, which perhaps can be reproduced by an EMI attack on a pool.
09:54:05 workstation kernel: [ 591.595892] NET: Registered protocol family 38
09:54:05 workstation kernel: [ 591.645353] cryptd: max_cpu_qlen set to 1000
09:54:21 workstation kernel: [ 607.410846] zd96: p1
09:54:21 workstation kernel: [ 607.473011] zd112: p1
09:54:36 workstation kernel: [ 622.176676] PANIC: zfs: allocating allocated segment(offset=313493458944 size=131072)
09:54:36 workstation kernel: [ 622.176676]
09:54:36 workstation kernel: [ 622.176681] Showing stack for process 5930
09:54:36 workstation kernel: [ 622.176684] CPU: 0 PID: 5930 Comm: z_wr_iss Tainted: P OE 4.19.36-gnu #1.0
09:54:36 workstation kernel: [ 622.176685] Hardware name: Gigabyte Technology Co., Ltd. XXX, BIOS XX
09:54:36 workstation kernel: [ 622.176686] Call Trace:
09:54:36 workstation kernel: [ 622.176694] dump_stack+0x63/0x8a
09:54:36 workstation kernel: [ 622.176703] spl_dumpstack+0x42/0x50 [spl]
09:54:36 workstation kernel: [ 622.176708] vcmn_err+0x6a/0x100 [spl]
09:54:36 workstation kernel: [ 622.176712] ? spl_kmem_cache_alloc+0x72/0x8c0 [spl]
09:54:36 workstation kernel: [ 622.176716] ? _cond_resched+0x19/0x30
09:54:36 workstation kernel: [ 622.176719] ? kmem_cache_alloc+0x16a/0x1d0
09:54:36 workstation kernel: [ 622.176723] ? spl_kmem_cache_alloc+0x72/0x8c0 [spl]
09:54:36 workstation kernel: [ 622.176728] ? spl_kmem_cache_alloc+0x72/0x8c0 [spl]
09:54:36 workstation kernel: [ 622.176797] zfs_panic_recover+0x69/0x90 [zfs]
09:54:36 workstation kernel: [ 622.176800] ? avl_find+0x5f/0xa0 [zavl]
09:54:36 workstation kernel: [ 622.176840] range_tree_add+0x19a/0x2f0 [zfs]
09:54:36 workstation kernel: [ 622.176876] ? dnode_rele+0x39/0x40 [zfs]
09:54:36 workstation kernel: [ 622.176917] space_map_load+0x3a3/0x500 [zfs]
09:54:36 workstation kernel: [ 622.176956] metaslab_load+0x37/0xf0 [zfs]
09:54:36 workstation kernel: [ 622.176995] metaslab_activate+0x90/0xb0 [zfs]
09:54:36 workstation kernel: [ 622.176996] ? _cond_resched+0x19/0x30
09:54:36 workstation kernel: [ 622.177035] metaslab_alloc+0x6f0/0x10d0 [zfs]
09:54:36 workstation kernel: [ 622.177075] zio_dva_allocate+0xaa/0x570 [zfs]
09:54:36 workstation kernel: [ 622.177080] ? tsd_hash_search.isra.3+0x46/0xa0 [spl]
09:54:36 workstation kernel: [ 622.177085] ? tsd_get_by_thread+0x2e/0x40 [spl]
09:54:36 workstation kernel: [ 622.177090] ? taskq_member+0x18/0x30 [spl]
09:54:36 workstation kernel: [ 622.177129] zio_execute+0x95/0xf0 [zfs]
09:54:36 workstation kernel: [ 622.177134] taskq_thread+0x2ae/0x4d0 [spl]
09:54:36 workstation kernel: [ 622.177136] ? __switch_to_asm+0x40/0x70
09:54:36 workstation kernel: [ 622.177139] ? wake_up_q+0x80/0x80
09:54:36 workstation kernel: [ 622.177178] ? zio_reexecute+0x390/0x390 [zfs]
09:54:36 workstation kernel: [ 622.177181] kthread+0x120/0x140
09:54:36 workstation kernel: [ 622.177186] ? taskq_thread_should_stop+0x70/0x70 [spl]
09:54:36 workstation kernel: [ 622.177187] ? kthread_bind+0x40/0x40
09:54:36 workstation kernel: [ 622.177189] ret_from_fork+0x35/0x40
I still have another pool, pool2, suffering the same CRC problems as pool1, which died.
pool2 was also actively used for IO while under attack, just as pool1 was.
I can wipe the data at the logical level of the ZFS dataset and then make a binary copy of the vdev; are you interested in studying what type of bug this is?
my disks clicked many times and the system froze for 10-30 seconds several times
That's a drive dying, not external interference.
my disks clicked many times and the system froze for 10-30 seconds several times
That's a drive dying, not external interference.
Are you joking? Two drives failed at once within a single day? And the SMART hardware CRC errors on both were fixed by shorter SATA cables? After the pool was recreated on the same physical disk, no ZFS software CRC errors are detected anymore. It is the drive, really?
Earlier they also tried to attack via the AC power line many times, but I have complex multilevel voltage protection.
Actually, the attack was initially carried out by many means, such as browser plugins generating very heavy disk IO traffic of up to 200-400 Mb/sec, power surges, and light radio EMI attacks. They tried this many times whenever I posted to forums they find undesirable; the worst they could do, though, was to hang my disks, and therefore the whole system, for about 10-30 seconds.
But finally they realized it was not effective, and when I began to collect materials for a lawsuit against their offline agents (related to another, non-computer topic), they tried to break my pools with a relatively intensive EMI attack.
FYI: Their attacks can be partially mitigated by:
1) Shorter SATA and other cables.
2) Older hardware like Core2 Duo/Quad with Intel ME deactivated, preferably by Libreboot.
3) Isolation of the proprietary software they have trojaned, like Chrome, Skype, etc., into KVM cached by a fast SSD. Full hardware isolation (additional motherboards) is even better. Attacks via plugins in the Ungoogled browser were present too, but less noticeable, most likely because of the smaller number of plugins.
4) A libre kernel free from BLOBs.
5) A good distro like Devuan or Gentoo, free from systemd; reproducible builds as in GUIX are very welcome.
6) A protected and encrypted boot sequence, completely open source, like Libreboot + GRUB2 both flashed into the BIOS chip, plus boot disk encryption and GPG signatures of files.
I already tried; it does not help. Please do not try to make us think of the incident as not real. There are hundreds of messages on the Internet about different gangstalkers' attacks.
And the SMART hardware CRC errors on both were fixed by shorter SATA cables?
sounds like bad cables.
hardware rots over time, you know! and it sounds like your equipment is quite old.
I tried two different computers, including a relatively modern one; the same issue occurred, and it was even worse on the newer one. These cables had worked for years without any problems. It is definitely an EMI attack, with power and software attacks all happening simultaneously.
that is known as anecdotal evidence and holds no value.
Yet there are many screenshots of moderators' messages where they write even scarier things to silence someone. Please don't try to lie to us. Thank you for understanding.
why would anyone put any resources into performing PowerHammer on your premises?
I am not sure about PowerHammer, but they somehow generated significant voltage surges in the AC power line. My stabilizer and the UPS AVR clicked constantly because of this. The voltage stabilizer also has a voltage meter; the voltage often jumped between 210-230V.
Even if this is happening (it isn't, please please please see a doctor), how are the maintainers of ZFS going to help? Your initial subject and opening comment already state everything that has nothing to do with ZFS. What could ZFS possibly do to help?
I would suggest that you visit a doctor instead of me, since you did not notice the described problem with ZFS, which cannot recover from CRC errors after several scrubs; the pool even died completely a week after the attack.
- use the same username in multiple places (if you are actually being targeted then you would NOT do this)
Why not? I use several names; I do not hide yet. After switching to multilevel I2P, Freenet, etc. I will change nicknames once again.
- describe part of your FDE setup, disclose that you use an Intel CPU, the specific BIOS you use, that you use Plex, the brand of your motherboard, that you don't use systemd, AND EVEN YOUR KERNEL VERSION
What is bad about disclosing the above info?
- and you disclose that you are preparing to sue and are gathering evidence towards "agents" in a GitHub issue
I indicated that the EMI attackers tried to help an offline person avoid my lawsuit by destroying my disks while I was in a hurry not to be late sending the documents via EMS. The claim was stored on the pool that died, with a daily backup copy of course.
You are almost certainly seeking attention with made-up scenarios, simply pretending, or extremely mentally ill and acting so irrationally that you believe agents are after you; and then you disclose key things about your hardware that nobody asked for in a public forum, the fact that you're going to sue them, and how you mitigate fucking PowerHammer!!! Those two things do NOT add up.
It does not matter how much you write about them, since they are capable of reading minds remotely; the info in my reports is of no value to them, as they can read it directly from the head of a target.
Please, seek help or just stop pretending because you're actively harming people with genuine psychosis or other issues.
Please stop your lies and read this instead: https://web.archive.org/web/20190624163342/https://www.rlighthouse.com/targeted-individuals.html
What is bad about disclosing the above info?
Nobody here needs to know you use Plex; however, I'm sure it's useful, as it tells the attackers another piece of software you run. Nothing else you disclosed helps anybody here except the attackers. I guess I'll give you the benefit of the doubt, as you didn't realize those were bad ideas.
Why would an attacker exfiltrate data using just your disk if your PC has an internet connection? Plex requires an internet connection to work properly...
Plex is a pool name. I am behind several firewalls, including pfSense on my local router.
I would suggest that you visit a doctor instead of me, since you did not notice the described problem with ZFS, which cannot recover from CRC errors after several scrubs; the pool even died completely a week after the attack.
No, you didn't describe an issue with ZFS. Disks have a limited life span. You described a failing disk. Not a ZFS problem.
I have already recreated both pools from scratch on the same physical disks, and there are no new CRC errors on them, so you are wrong about a failing disk. What is actually failing are the ZFS software algorithms, which need to be fixed in the area of checksum error correction.
It does not matter how much you write about them, since they are capable of reading minds remotely; the info in my reports is of no value to them, as they can read it directly from the head of a target.
Please provide evidence of this claim.
Sure, it is hard to provide evidence beyond my empirical observations of how they read thoughts even in a completely semiconductor-free environment. They then hinted at those thoughts through various "coincidences". It is like a combination of thought reading and a high capability of predicting the future, even for a target who is not using electronic devices.
See a doctor.
This phrase is generally said by agents and moderators trying to silence the truth on Internet forums. Usually their words are something like: "take your meds and be silent". They are trying to portray such targets as silly idiots.
A mere ZFS user here, but this issue should be closed as it's not a ZFS issue at all.
Most likely I will upload a binary image of my vdev soon for further research by developers.
I have. I think the US government has better things to spend money on than people who believe they are worth targeting, but can never say why.
I would not argue that it is exactly the US government. Most likely it is some kind of global psi-op department that sits above all official governments. I am not in the USA, by the way. This problem exists in all countries across the whole planet, specifically including Russia, Canada and the USA.
root@ceres:/download# zpool status -v
  pool: system_raw_copy
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 0 days 00:00:05 with 0 errors on Sun Nov 3 15:30:48 2019
remove: Removal of vdev 2 copied 88K in 0h0m, completed on Sun Nov 3 15:34:49 2019
        720 memory used for removed device mappings
config:
NAME STATE READ WRITE CKSUM
system_raw_copy DEGRADED 0 0 0
vdb DEGRADED 0 0 684 too many errors
loop2 ONLINE 0 0 0
errors: Permanent errors have been detected in the following files:
<0xffffffffffffffff>:<0x1>
root@ceres:/download# zpool remove system_raw_copy vdb
cannot remove vdb: out of space
root@ceres:/download# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop2 7:2 0 1000M 0 loop
sr0 11:0 1 1024M 0 rom
vda 252:0 0 40G 0 disk /
vdb 252:16 0 300G 0 disk
I have installed ZFS v0.8.2 in a KVM virtual machine using the Devuan Ceres distribution:
root@ceres:/download# uname -a
Linux ceres 5.3.8-gnu #1.0 SMP Tue Sep 27 12:35:59 EST 1983 x86_64 GNU/Linux
root@ceres:/download# dpkg -al | grep zfs
ii libzfs2linux 0.8.2-2 amd64 OpenZFS filesystem library for Linux
ii zfs-dkms 0.8.2-2 all OpenZFS filesystem kernel modules for Linux
ii zfs-initramfs 0.8.2-2 all OpenZFS root filesystem capabilities for Linux - initramfs
ii zfs-zed 0.8.2-2 amd64 OpenZFS Event Daemon
ii zfsutils-linux 0.8.2-2 amd64 command-line tools to manage OpenZFS filesystems
I passed the vdev of the problematic pool through into this VM as /dev/vdb, added a new loop2 device, and tried to remove vdb, but it reports there is not enough space.
Please suggest how to remove vdb from the pool.
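For reference, the sequence attempted was roughly the following (the loop backing file path is an assumption). Presumably device removal needs enough free space on the remaining vdevs to hold the evacuated data, and the 1000M loop device is far smaller than the 300G vdb:

truncate -s 1000M /download/loop_backing.img     # hypothetical backing file for the loop device
losetup /dev/loop2 /download/loop_backing.img
zpool add system_raw_copy /dev/loop2             # add a second top-level vdev
zpool remove system_raw_copy vdb                 # fails: cannot remove vdb: out of space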
Status of the host device holding the virtual vdb:
20:44 root@workstation / > zpool status -v WD1
  pool: WD1
 state: ONLINE
  scan: none requested
config:
NAME STATE READ WRITE CKSUM
WD1 ONLINE 0 0 0
WD1 ONLINE 0 0 0
errors: No known data errors
20:44 root@workstation / > zfs list WD1 -r
NAME               USED  AVAIL  REFER  MOUNTPOINT
WD1                238G  8.40G    96K  none
WD1/raw_copy_vol   238G  8.40G   237G  -
As you can see, a dd copy of the problematic vdev has transferred the CRC problem to a new, error-free device (a virtual zvol on a new physical disk that had not previously been used in this computer).
dd if=/dev/sdX_with_CRC_errors_in_zpool_status | pv | dd of=/dev/zvol/TestPool/raw_copy_vol
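Spelled out, the whole reproduction was roughly the following (the zvol size and the import step are assumptions based on the listings above):

zfs create -V 300G TestPool/raw_copy_vol                    # zvol large enough to hold the raw image of the vdev
dd if=/dev/sdX_with_CRC_errors_in_zpool_status | pv | dd of=/dev/zvol/TestPool/raw_copy_vol
zpool import -d /dev/zvol/TestPool system system_raw_copy   # import the copy under a new name (or pass the zvol into a VM, as above)
zpool scrub system_raw_copy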
And after importing the pool "system" we still see CRC errors on top of the vdev TestPool/raw_copy_vol, even though the status of TestPool/raw_copy_vol itself reports no CRC errors.
The underlying zpool status of TestPool is clear of any CRC errors, while zpool status of "system" still shows CRC errors even after a scrub, although its vdev sits on top of a zvol in a clean pool.
Looks like a ZFS bug, doesn't it?
How can I wipe my sensitive information from the vdev with CRC errors so that I can upload it to you?
I know I can run dd if=/dev/zero of=/dev/mapper/luks_vol, where luks_vol sits on top of a zvol in the problematic pool, such as system/dummy_wiping_vol.
That would fill the pool "system" and the underlying vdev TestPool/raw_copy_vol with what amounts to random data; most likely it would inflate the CRC counters a lot and could even kill the pool again.
How can I then zero the vdev to avoid uploading 256GB of random data?
Do you know an easier and faster method to wipe a vdev?
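A rough, untested sketch of one option (all names, including the mount point /system/wipe, are placeholders, and with copy-on-write there is no guarantee that every old block actually gets overwritten): fill the free space with plain zeros at the dataset level instead of going through LUKS, so the image stays compressible, then compress the raw vdev before uploading:

zfs create -o compression=off system/wipe
dd if=/dev/zero of=/system/wipe/zeros bs=1M || true                    # fill the remaining free space with zeros
rm /system/wipe/zeros
zpool export system
dd if=/dev/zvol/TestPool/raw_copy_vol bs=1M | gzip > vdev_image.gz     # zeroed regions compress to almost nothing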
what exactly do you expect ZFS to do when you have no redundancy?
It worked fine for me for years before the attack.
where do you want it to salvage the lost data from?
I cannot understand at all what these CRC errors mean, since the rsync comparison passes fine. These CRC errors do not produce any broken files; how is that possible?
if you have CRC errors in non-redundant pool, that's on you. that's your fault.
I have a redundant backup for replication. 99% of desktop users around the world do not have redundant disks in their workstations, so why should I? Do all of them have to visit a doctor? They often do not even have backups, by the way, let alone automatic daily replication like mine.
and I have to agree with others who suggest you see a doctor. but oh no! I must be an agent.
Sure you are an agent, since you attribute your words to the opinions of "others", trying to speak as if on behalf of other people, while in fact you have been saying false and rubbish things about my attitude since the very beginning of this thread/issue. And that is because you are as devious and deceitful as a fox in trying to hide the truth about gangstalkers.
Shouldn't this be taken to the mailing list?
non-redundant pools can not self repair when you scrub them
As you may have noticed already, these checksum errors barely affect files, except the few reported as unrecoverable in zpool status -v.
Earlier, perhaps once every 3-6 months, I saw a few checksum errors on the whole pool, something like 1-5 of them. And they were always fixed by a scrub on a non-redundant pool. I do not know what that means or why it happens.
I have also noticed that a bad file listed in status -v on a single-disk mirror half could be fixed when its second, Backup mirror device was brought online. That is most likely what you are talking about, and I have seen it too.
Because those errors might refer to metadata and not just to files. Since multiple copies of metadata are stored by default even on a singleton vdev, ZFS can sometimes repair those errors, but it cannot repair data errors.
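For example, you can check how many copies of data and metadata a dataset keeps with something like the following (the dataset name is a placeholder); by default copies=1, but metadata is still stored with extra redundancy:

zfs get copies,redundant_metadata pool/dataset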
Can you please explain why I got mostly metadata errors? How do they matter if the errors continuously bubble up from the vdev level to the zpool level? Should I see broken directories, missing files, etc.? Do such damages propagate to older snapshots taken before the attack?
zfs diff did not indicate any significant changes between snapshots; should it, in the case of metadata corruption?
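For reference, the comparison was along these lines (the snapshot names are placeholders):

zfs diff pool1/data@before_attack pool1/data@after_attack    # lists files added, removed, modified or renamed between the two snapshots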
you keep saying "do not influence" and "worked fine" but the pool in question stopped importing entirely after one week. bad hardware.
The pool was created in 2012 and worked fine until October 22, 2019, yet you say it is bad hardware. On that same day I replaced the long cables with short 10 cm SATA ones. CRC errors were still inflating (most likely metadata), but the disks no longer hung and unrecoverable files no longer appeared in the zpool status -v output. And the hardware became good again once the pool was recreated on October 29, 2019. Very smart, is it not?
Maybe a bad controller?
Maybe a bad controller?
I have tried the problematic (in terms of CRC errors) pool with different SATA controllers.
I am sure that if you download a binary copy of my vdev raw_copy_vol, dd it onto any good hardware of yours, and then import the pool system_raw_copy, you will see the same CRC errors that I do.
@bimonsubio1984 It sounds like your hardware has failed and ZFS has correctly reported the resulting checksum errors. If you encounter a bug in ZFS, please open a new issue. If you have questions about how ZFS works, the mailing list may be a better place to find answers. (https://zfsonlinux.topicbox.com/groups/zfs-discuss)
yeah, a dd copy of a failed device will contain checksum errors because the failing device was unable to provide a good copy of the data, and you had no redundancy, so ZFS couldn't serve a good copy from the remaining disks. please close the issue.
How do you identify a device as a failed one? By a temporary increase of the SMART CRC counter until shorter SATA cables are connected?
Then why, on a new GOOD device in ANOTHER computer, even after a dd of the pool, do CRC errors still keep inflating (increasing) rapidly during pool usage and/or scrubs? Is it software or hardware related? I can agree that dd copies the metadata errors along with the pool, but why does the number of CRC errors still keep increasing even after a full scrub completed after the dd?
Each scrub added many new CRC errors on hardware that is already good; how is that possible? Is it the expected behavior of ZoL?
If a new pool is created from scratch on the same hardware, not even a single CRC error is detected, and there is no inflation of them.
How can you explain the CRC inflation in the screenshot below?
On the left side is a KVM virtual machine with a ZFS pool whose CRC errors are inflating.
On the right side is the host with a perfectly good pool; the hardware is good? hah
For a full-size screenshot please use the following link (you need to click on the picture for full size): https://cdn-14.anonfile.com/z5Bce1A7n5/68ccbfad-1573102686/Zpool.jpg
And I have already let the pool complete a scrub in full several times; each pass added new CRC errors. On the pool that died it was even more striking: over 1 million CRC errors accumulated before the pool died, already on a good device (after the SATA cables were shortened).
cat /etc/devuan_version
ascii
It has all the latest Devuan Ascii packages after apt update and dist-upgrade
Recently, a few days after partially disabling Intel ME from my Devuan Linux just by removing its kernel modules, I experienced a radio-emission attack on my disks.
One of the disks held two ZFS pools: one active pool and another idle, though imported, pool.
During the attack my disks clicked many times and the system froze for 10-30 seconds several times. SMART indicated a constantly increasing CRC rate. Finally the system hung completely about twice, so that I had to reboot each time. I also noticed a significant headache that day, most likely from the EMI.
I had to replace the SATA cables with shorter ones, only about 10 cm long, and that stopped the attack. Returning a long cable later did not show that the attack was still present; most likely it ended once they realized it had no effect. The CRC counter stopped at its current value in the HDD's SMART.
Some files were destroyed and marked by ZFS as bad IO when read, primarily browser profiles, which most likely were the very reason for the attack while I was posting on some Internet forum undesirable to the gangstalkers. A rollback to a recent snapshot fixed the files, of course.
Several full scrubs resulted in the following zpool status:
errors: Permanent errors have been detected in the following files: