Closed Jamessall closed 9 years ago
Hi Jim, thank you for tracing this problem.
I've pushed a possible fix to the branch ata_fallback_usb
(f367dce7402e073cf78808f633fe17695892c8d8). If you get a chance, could you give it a try? The new code tries to probe the disk with stat_ata
, and falls back to stat_usb
if it fails.
Thanks!
Thanks, Jude, but it doesn't work. The reason is that stat_ata on the usb drive takes a very long time to return it's error status...so long that disk.sh gets invoked for every partition before the first invocation finishes. Maybe need to run the daemonlet synchronously (ugh). I'll have a look at stat_ata to see if I can suss out the reason.
Another curious thing: I notice by going through this that disk.sh gets executed twice for the drive (sdb) and twice again for each of the partitions. Bug?
Jim
so long that disk.sh gets invoked for every partition before the first invocation finishes.
That's the expected behavior. With the default configuration, disk.sh
handles any device whose DEVTYPE is disk
or partition
.
Another curious thing: I notice by going through this that disk.sh gets executed twice for the drive (sdb) and twice again for each of the partitions. Bug?
Not sure. Can you provide a listing from the logfile, with vdevd running with debug mode enabled? To do so, either set loglevel=debug
in /etc/vdev/vdevd.conf, or pass -v2
on the command-line. Disk devices usually have multiple entries in /sys, but I haven't yet seen a case where a single disk or partition generates multiple uevents with DEVTYPE=disk
.
Sorry, I didn't explain that very well. What I meant is that the parent metadata has not yet been created when disk.sh is run for the partitions, resulting in
[helpers] ERROR: partition not found: no metadata on parent device of /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.3/2-1.3:1.0/host17/target17:0:0/17:0:0:0/block/sdb/sdb4 (at /dev/metadata//sysfs/\x2fsys\x2f\x2fdevices\x2fpci0000:00\x2f0000:00:1d.0\x2fusb2\x2f2-1\x2f2-1.3\x2f2-1.3:1.0\x2fhost17\x2ftarget17:0:0\x2f17:0:0:0\x2fblock\x2fsdb)
Anyway here's the log, snipped to just before plugging the drive. Also some of my own debug messages in there vdev.log.txt
And to confirm that it is a race condition, I ran it with daemonlet=false for disk.sh. It works: vdevd.log.nodaemonlet.txt
Jim
Thank you for sending me the logs. This looks related to #71 (similar situation, but with a USB mouse)--abnormal behavior is triggered by a race condition in vdevd's daemonlet logic. I'm still trying to reproduce #71 on my system, but your report gives me additional information to work with.
OK, I get it now...the use of stat_ata is slowing me down enough to expose the race condition I don't otherwise see.
About the referenced error:
[helpers] ERROR: partition not found: no metadata on parent device of /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.3/2-1.3:1.0/host17/target17:0:0/17:0:0:0/block/sdb/sdb4 (at /dev/metadata//sysfs/\x2fsys\x2f\x2fdevices\x2fpci0000:00\x2f0000:00:1d.0\x2fusb2\x2f2-1\x2f2-1.3\x2f2-1.3:1.0\x2fhost17\x2ftarget17:0:0\x2f17:0:0:0\x2fblock\x2fsdb
I just wanted to add I got two users with the same error and default vdev configuration. Both of them had extended partitions on their HDD but I don't know if that is relevant?
I tinkered a bit with vdev meself and when running disk.act with async=true I get this error. I don't get it with only daemonlet=true though.
@suedi | Both of them had extended partitions on their HDD but I don't know if that is relevant? Don't think so; my partitons are all primary
I found the cause of the race condition.
When a daemonlet helper (in my case stat_ata, but it will also apply to any other) writes to stderr, it somehow causes vdevd to falsely receive a success status, and carry on its way. So changing this line (disk.sh):
HELPER_DATA="$($VDEV_HELPERS/stat_ata "$VDEV_MOUNTPOINT/$VDEV_PATH")"
to
HELPER_DATA="$($VDEV_HELPERS/stat_ata "$VDEV_MOUNTPOINT/$VDEV_PATH" 2>/dev/null)"
makes it all work as expected
Jim.
will try your solution this evening and see if its the same issue with usb devices #71 input.sh
good job hunting this down
Thanks...that line from disk.sh is from the one in the ata_usb_fallback branch. Will you run naked around my block?
Jim
Will you run naked around my block?
Naah odds are not in my favor here, your solution seems to credible.
Anyway I am on the road and forgot that I didn't bring USB mouse so test results will have to wait a little longer.
action.c lines 940-946:
// reroute output back to vdevd. rc = dup2( daemonlet_pipe_stdout[1], STDERR_FILENO ); if( rc < 0 ) { vdev_error_async_safe("dup2 stderr to stdout failed\n"); _exit(1); }
Removing these lines fixes the problem, but maybe that breaks something else? Seems too deliberate.
Naah odds are not in my favor here, your solution seems to credible. Ah well, the life of a nerd...back to my parent's basement then.
Jim
I am on Branch: master
I tried to add 2>/dev/null
to every occurance of HELPER_DATA="$($VDEV_HELPERS/stat_*
in input.sh
Found them on line 94, 115 and 210
This did not seem to help. Maybe I missed some other code-line that possibly could output on stderr???
Then I tried to recompile vdevd with @jimyyz suggestion above about action.c lines 940-946 With those lines put in a comment, plus a reboot, with daemonlet=true in input.act did give me a functional USB mouse
This looks very promising :+1: Would be good to hear what Mr Nelson thinks?
Now I am very glad I didn't commit to run naked around your block. You probably live at the North Pole or something, right?
EDIT With this some new entries cropped up in vdevd.log Nothing really strange but I include for reference
Removing these lines fixes the problem, but maybe that breaks something else? Seems too deliberate.
I think this is precisely the source of the problem. I had originally re-directed a subprocess's stderr back to vdevd so vdevd could capture error messages and write them to the log automatically. This was added well before I added the daemonlet feature. But as you both discovered, this setup also means that an error message from a daemonlet could trick vdevd into erroneously thinking that the daemonlet was finished processing a device event, causing it to send it to the next daemonlet while it was still being processed by the previous one. This in turn created a race condition that both of you observed, but I didn't see it since I was not getting any error messages.
Since vdevd exports the path to the logfile to daemonlets now (it hadn't before), it should be safe to remove action.c lines 940-946. I'll update the helper scripts to explicitly re-direct the stderr stream for programs like stat_ata
to the logfile, at the ERROR
loglevel.
Thank you both for working through this bug! I've been trying to reproduce it for months, but was getting nowhere since none of my hardware was triggering errors.
Yay. This also makes logs much more consistent and nice to read.
@suedi From your log, it looks like either the pvs program has to be installed on your system, or else it is located elsewhere (ie not on /sbin). On debian/devuan, it comes with.
Yeah The pvs problem is known.
Not installed by default on my system.
I have two sata drives in sata-to-usb enclosures for which symlinks are not created in /dev/disk. Tracked this down to disk.sh:
DISK_TYPE is set to 'ata' here, but stat_ata then chokes. When I set DISK_TYPE to 'usb', everything works hunky-dory. Maybe there's no such thing as a non-removable usb drive?
Jim