jcnelson / vdev

A device-file manager for *nix
GNU General Public License v3.0
101 stars 13 forks source link

symlinks not created for usb hard drives #81

Closed Jamessall closed 8 years ago

Jamessall commented 8 years ago

I have two sata drives in sata-to-usb enclosures for which symlinks are not created in /dev/disk. Tracked this down to disk.sh:

    # maybe a usb device instead?
    if [ -z "$DISK_TYPE" ] && [ -n "$(vdev_subsystems "$SYSFS_PATH" | /bin/grep "usb")" ]; then 

       # part of a USB device
      REMOVABLE_FILE="$SYSFS_PATH/removable"

       if ! [ -f "$REMOVABLE_FILE" ] || [ -n "$(/bin/fgrep "0" "$REMOVABLE_FILE")" ]; then 

          # non-removable USB mass storage in a (S)ATA enclosure 
          DISK_TYPE="ata"

       else

          # usb mass-removable storage disk 
          DISK_TYPE="usb"

       fi
    fi

DISK_TYPE is set to 'ata' here, but stat_ata then chokes. When I set DISK_TYPE to 'usb', everything works hunky-dory. Maybe there's no such thing as a non-removable usb drive?

Jim

jcnelson commented 8 years ago

Hi Jim, thank you for tracing this problem.

I've pushed a possible fix to the branch ata_fallback_usb (f367dce7402e073cf78808f633fe17695892c8d8). If you get a chance, could you give it a try? The new code tries to probe the disk with stat_ata, and falls back to stat_usb if it fails.

Thanks!

Jamessall commented 8 years ago

Thanks, Jude, but it doesn't work. The reason is that stat_ata on the usb drive takes a very long time to return it's error status...so long that disk.sh gets invoked for every partition before the first invocation finishes. Maybe need to run the daemonlet synchronously (ugh). I'll have a look at stat_ata to see if I can suss out the reason.

Another curious thing: I notice by going through this that disk.sh gets executed twice for the drive (sdb) and twice again for each of the partitions. Bug?

Jim

jcnelson commented 8 years ago

so long that disk.sh gets invoked for every partition before the first invocation finishes.

That's the expected behavior. With the default configuration, disk.sh handles any device whose DEVTYPE is disk or partition.

Another curious thing: I notice by going through this that disk.sh gets executed twice for the drive (sdb) and twice again for each of the partitions. Bug?

Not sure. Can you provide a listing from the logfile, with vdevd running with debug mode enabled? To do so, either set loglevel=debug in /etc/vdev/vdevd.conf, or pass -v2 on the command-line. Disk devices usually have multiple entries in /sys, but I haven't yet seen a case where a single disk or partition generates multiple uevents with DEVTYPE=disk.

Jamessall commented 8 years ago

Sorry, I didn't explain that very well. What I meant is that the parent metadata has not yet been created when disk.sh is run for the partitions, resulting in

[helpers] ERROR: partition not found: no metadata on parent device of /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.3/2-1.3:1.0/host17/target17:0:0/17:0:0:0/block/sdb/sdb4 (at /dev/metadata//sysfs/\x2fsys\x2f\x2fdevices\x2fpci0000:00\x2f0000:00:1d.0\x2fusb2\x2f2-1\x2f2-1.3\x2f2-1.3:1.0\x2fhost17\x2ftarget17:0:0\x2f17:0:0:0\x2fblock\x2fsdb)

Anyway here's the log, snipped to just before plugging the drive. Also some of my own debug messages in there vdev.log.txt

And to confirm that it is a race condition, I ran it with daemonlet=false for disk.sh. It works: vdevd.log.nodaemonlet.txt

Jim

jcnelson commented 8 years ago

Thank you for sending me the logs. This looks related to #71 (similar situation, but with a USB mouse)--abnormal behavior is triggered by a race condition in vdevd's daemonlet logic. I'm still trying to reproduce #71 on my system, but your report gives me additional information to work with.

Jamessall commented 8 years ago

OK, I get it now...the use of stat_ata is slowing me down enough to expose the race condition I don't otherwise see.

suedi commented 8 years ago

About the referenced error:

[helpers] ERROR: partition not found: no metadata on parent device of /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.3/2-1.3:1.0/host17/target17:0:0/17:0:0:0/block/sdb/sdb4 (at /dev/metadata//sysfs/\x2fsys\x2f\x2fdevices\x2fpci0000:00\x2f0000:00:1d.0\x2fusb2\x2f2-1\x2f2-1.3\x2f2-1.3:1.0\x2fhost17\x2ftarget17:0:0\x2f17:0:0:0\x2fblock\x2fsdb

I just wanted to add I got two users with the same error and default vdev configuration. Both of them had extended partitions on their HDD but I don't know if that is relevant?

I tinkered a bit with vdev meself and when running disk.act with async=true I get this error. I don't get it with only daemonlet=true though.

Jamessall commented 8 years ago

@suedi | Both of them had extended partitions on their HDD but I don't know if that is relevant? Don't think so; my partitons are all primary

Jamessall commented 8 years ago

I found the cause of the race condition.

When a daemonlet helper (in my case stat_ata, but it will also apply to any other) writes to stderr, it somehow causes vdevd to falsely receive a success status, and carry on its way. So changing this line (disk.sh):

HELPER_DATA="$($VDEV_HELPERS/stat_ata "$VDEV_MOUNTPOINT/$VDEV_PATH")"

to

HELPER_DATA="$($VDEV_HELPERS/stat_ata "$VDEV_MOUNTPOINT/$VDEV_PATH" 2>/dev/null)"

makes it all work as expected

Jim.

suedi commented 8 years ago

will try your solution this evening and see if its the same issue with usb devices #71 input.sh

good job hunting this down

Jamessall commented 8 years ago

Thanks...that line from disk.sh is from the one in the ata_usb_fallback branch. Will you run naked around my block?

Jim

suedi commented 8 years ago

Will you run naked around my block?

Naah odds are not in my favor here, your solution seems to credible.

Anyway I am on the road and forgot that I didn't bring USB mouse so test results will have to wait a little longer.

Jamessall commented 8 years ago

action.c lines 940-946:

 // reroute output back to vdevd.
 rc = dup2( daemonlet_pipe_stdout[1], STDERR_FILENO );
 if( rc < 0 ) {

    vdev_error_async_safe("dup2 stderr to stdout failed\n");
    _exit(1);
 }

Removing these lines fixes the problem, but maybe that breaks something else? Seems too deliberate.

Naah odds are not in my favor here, your solution seems to credible. Ah well, the life of a nerd...back to my parent's basement then.

Jim

suedi commented 8 years ago

I am on Branch: master

I tried to add 2>/dev/null to every occurance of HELPER_DATA="$($VDEV_HELPERS/stat_* in input.sh

Found them on line 94, 115 and 210

This did not seem to help. Maybe I missed some other code-line that possibly could output on stderr???

Then I tried to recompile vdevd with @jimyyz suggestion above about action.c lines 940-946 With those lines put in a comment, plus a reboot, with daemonlet=true in input.act did give me a functional USB mouse

This looks very promising :+1: Would be good to hear what Mr Nelson thinks?

Now I am very glad I didn't commit to run naked around your block. You probably live at the North Pole or something, right?

EDIT With this some new entries cropped up in vdevd.log Nothing really strange but I include for reference

https://gist.github.com/suedi/026411e33ee1b6397ddb

jcnelson commented 8 years ago

Removing these lines fixes the problem, but maybe that breaks something else? Seems too deliberate.

I think this is precisely the source of the problem. I had originally re-directed a subprocess's stderr back to vdevd so vdevd could capture error messages and write them to the log automatically. This was added well before I added the daemonlet feature. But as you both discovered, this setup also means that an error message from a daemonlet could trick vdevd into erroneously thinking that the daemonlet was finished processing a device event, causing it to send it to the next daemonlet while it was still being processed by the previous one. This in turn created a race condition that both of you observed, but I didn't see it since I was not getting any error messages.

Since vdevd exports the path to the logfile to daemonlets now (it hadn't before), it should be safe to remove action.c lines 940-946. I'll update the helper scripts to explicitly re-direct the stderr stream for programs like stat_ata to the logfile, at the ERROR loglevel.

Thank you both for working through this bug! I've been trying to reproduce it for months, but was getting nowhere since none of my hardware was triggering errors.

fbt commented 8 years ago

Yay. This also makes logs much more consistent and nice to read.

Jamessall commented 8 years ago

@suedi From your log, it looks like either the pvs program has to be installed on your system, or else it is located elsewhere (ie not on /sbin). On debian/devuan, it comes with.

suedi commented 8 years ago

Yeah The pvs problem is known.

Not installed by default on my system.