magnusja / libaums

Open source library to access USB Mass Storage devices on Android without rooting your device
Apache License 2.0
1.26k stars 270 forks source link

Implemented ScsiRequestSense error handling #280

Closed Derpalus closed 3 years ago

Derpalus commented 3 years ago

My attempt at fixing various errors in the current implementation.

Currently, all errors are handled by resetting the device. However, the very act of resetting the device will in pretty much all cases make sure any communication is well and truly destroyed and can never be recovered. This approach meant that the only time communication worked with a device was when there were absolutely no errors what so ever.

This fix does a couple of things:

  1. Implements a request sense call whenever a call has a controlled failure. Then depending on what the error was an exception is thrown which can then be handled in an appropriate way at an appropriate level of the code to resume correct communication.
  2. Removed the automatic reset on failure and instead just re-send the command (if appropriate) or send the exception further up the chain where it can be handled.
  3. Retries a command that has a read or write error, i.e. when bulkTransfer returns -1 (typically it works the second time).

At least two issues that I know of are fixed by this approach; intermittent read/write errors and SD card readers with more than one card slot where one of the slots isn't occupied.

It should also be easy to handle more issues that might arise by adding more specified cases in the SenseParser.kt file and thereby being able to catch and handle problems I haven't found yet.

Fixes #278

codecov[bot] commented 3 years ago

Codecov Report

Merging #280 (367d2ed) into develop (584804f) will decrease coverage by 4.65%. The diff coverage is 0.00%.

Impacted file tree graph

@@              Coverage Diff              @@
##             develop     #280      +/-   ##
=============================================
- Coverage      62.70%   58.04%   -4.66%     
  Complexity       365      365              
=============================================
  Files             49       51       +2     
  Lines           1582     1709     +127     
  Branches         217      227      +10     
=============================================
  Hits             992      992              
- Misses           525      652     +127     
  Partials          65       65              
Impacted Files Coverage Δ Complexity Δ
...a/com/github/mjdev/libaums/UsbMassStorageDevice.kt 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
...ithub/mjdev/libaums/driver/scsi/ScsiBlockDevice.kt 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
...ibaums/driver/scsi/commands/CommandBlockWrapper.kt 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
...ums/driver/scsi/commands/sense/ScsiRequestSense.kt 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...er/scsi/commands/sense/ScsiRequestSenseResponse.kt 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...baums/driver/scsi/commands/sense/SenseException.kt 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...hub/mjdev/libaums/usb/JellyBeanMr2Communication.kt 0.00% <0.00%> (ø) 0.00 <0.00> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d829aee...367d2ed. Read the comment docs.

magnusja commented 3 years ago

Hey @Derpalus

first of all, thanks a lot for this amazing contribution! I like that you commented everything thoroughly, definitely helped me understand everything.

I hope you dont mind that I refactored some parts to my personal preference. Please have a look and let me know what you think.

  1. I got rid of the SenseParser and integrated that into the response
  2. When sense command indicates success I am not throwing a Recovered exception anymore. I would consider this as usual behavior and no need for raising an exception.
  3. Removed the write command flag you introduced. I found the terminology confusing in this case. Renamed this to command "without a dataphase".
  4. Some exception which were not sense exceptions strictly speaking, I converted to IOExceptions.

@Depau Do you want to quickly have a look over this? I guess the biggest change is the removal of the reset procedure, we implemented (inspired by TotalCommander usb plugin). But as far as I remember this procedure was not really successful in reestablishing communication with the USB device anyways?

Derpalus commented 3 years ago

Seems like reasonable changes. The only thing I can think of is that in the modified retry-loop in ScsiBlockDevice the continue instruction makes the execution path skip the sleep command. But then again I just added that to be on the safe side, I have no idea if it's needed.

Getting rid of that extra write flag was nice as well.

As for the old retry mechanic that I removed, I must say that I doubt it could ever have done anything useful. It was actually quite harmful in a number of cases for me. For example in the case there was a csw error it would just reset the device which will likely just produce the same csw error the next call. A sense is needed to understand why it happens in the first place. And because the sense wasn't used it was impossible to find out if for example a LUN did not contain a storage device. At that point it doesn't matter how many times you restart, the LUN will continue to be empty and return a csw error, trying to get noticed. Also, in case it was a read or write error a simple retry would have worked fine as well. Instead the device was reset, introducing a more severe error.

magnusja commented 3 years ago

Oh good catch, the continue expression is pretty much useless there anyways, thanks!

Yeah, I am happy to take the resetting logic out for now, we can still add it if it is useful in some case or people start to see problems because of that. Thanks again :)

Derpalus commented 3 years ago

You will have to stay on top of which sense exceptions are getting thrown as well. I have only had my devices to see what they throw and sparse documentation on the internet to guess what might be common and what to do.

If you start seeing issues when other devices are using the library, it should be easy to determine which ASC and ASCQ codes they throw (they are added to the exception message string) and implement reasonable handling of those from the description in the SCSI documentation. No more errno = -1 for every type of error ;)

depau commented 3 years ago

@Depau Do you want to quickly have a look over this? I guess the biggest change is the removal of the reset procedure, we implemented (inspired by TotalCommander usb plugin).

Yes sorry, I've got quite a bit going on and I didn't get the chance to read through the big diff. I will check it now though.

But as far as I remember this procedure was not really successful in reestablishing communication with the USB device anyways?

As far as I can remember (but I don't really trust my memory so it may be inaccurate) the reset is required in case of EPIPE. It does not help in the other cases (in my experience) except for some rare circumstances in which it did recover. Definitely I'd say it won't help if libinput returns -1.

I am going to temporarily merge this into the ci-instrumented-tests (edit: results) branch to see what we get from the CI. Speaking of that, I will try to finish it within the next few months. It does work for the most part which is why I'd like to see how it does with these changes.

EDIT:

The CI tests worked on Oreo and Pie, on Marshmallow they failed but it's the usual race condition. https://objstor.depau.eu/minio/libaums-screenrecs/travis/970/

Derpalus commented 3 years ago

The sense exceptions that get "leaked" up to the user should only be the kind of stuff that can't really be handled by the library, such as "write protected media", "storage full", "hardware error", "power cycle required" and stuff like that. That really must be handled by the user to be resolved. Most of it should reasonably happen during initialization however. There is also the "media not inserted" that IS handled a bit higher up in the library.

It's also possible, as we learn more about the sense results from different devices, that we can handle more issues automatically in the library.

As for handling NotReadyTryAgain and InitRequired in the read() and write() functions I'm not sure that works. I kind of try to think about what would cause an InitRequired to pop up during a read operation. It seems like if something like that happens something more severe has happened and that it's unlikely the library can fix that by just trying to read again (maybe the user disconnected the device for example). You would probably have to re-init the device, but after that you can't re-start the read again as that might have been in the middle of a file or something, we have no idea what that was about so we can't transparently resume. Therefore it's better to send it up to the user to handle. Alternatively have the library be much more advanced; taking care of its own lifecycle in some way so it can re-init itself and restart a file download or a string of commands.

It's also possible the commands higher up in the library that use the read()and write() functions are the ones that should handle this. They after all know more about the context than the functions themselves.

But these are things you sort of have to have happen first to figure out what to do or even if it CAN happen during a read() or write(). Anything else is just speculation and adding error handling that you don't know if it does the right thing might break more than it helps, better to get the exception noticed and reported if it happens.

Derpalus commented 3 years ago

The EPIPE case is interesting though as you say @Depau. I wasn't involved in that so I don't know what that is about. If you have access to a device that has that issue it would be interesting to see how this change affects that and if something has to be adjusted to handle that.

depau commented 3 years ago

The EPIPE case is interesting though as you say @Depau. I wasn't involved in that so I don't know what that is about. If you have access to a device that has that issue it would be interesting to see how this change affects that and if something has to be adjusted to handle that.

This is where I learned about it: https://elixir.bootlin.com/linux/latest/source/drivers/usb/storage/transport.c#L274

The action itself is still being performed. IDK if we should restrict it to exactly EPIPE or perform it for all errors "just in case". I don't think it can do any damage other than reducing performance, but the error will reduce performance regardless, so whatever.

Derpalus commented 3 years ago

Which function in this code is it that can return EPIPE?

magnusja commented 3 years ago

Sorry for leaving this open that long. I addressed @Depau's comments and catched the NotReadyTryAgain and the InitRequired exceptions. In addition, I added back the reset procedure for EPIPE.

I will release a new version soon!

Derpalus commented 3 years ago

I was just about to start on the app that is using this library again as some higher prio work got finished, so the timing is very good.

depau commented 3 years ago

I noticed you released libaums after merging this but you didn't release libusbcommunication. Is that intended or you forgot? :)

Btw, happy new year!

magnusja commented 3 years ago

Fair point, just did that :)

Happy new year to you too!