ni / rad

NI Replication and Deployment Utility
MIT License

Error when creating image from RT Linux cRIO with full hard drive #8

Open clendon-gibson opened 5 years ago

clendon-gibson commented 5 years ago

If you try to retrieve an image from a 9038 or 9049 when the / file system is 100% full, you will get a pop-up with an unhelpful error message.

[Image: cRIO Hard Drive Full Error Message]

I think this happens if you deploy to a full cRIO as well.

Maybe if this error is detected, a file system check can be done and, if the file system is full, a more informative error message provided.

This error happens in the Create System Image (Folder).vi that is called from rad_Retrieve Image Wrapper.vi.

SparkingSpork commented 5 years ago

So, the image failing is expected with a full Linux Real-Time target. As part of the get/set image processes, a compressed copy of the files is either created on the system or transferred to the system. That is, the error is returned when the compressed copy cannot be created or extracted due to disk space limitations. This is the actual description for that error code:

[image]

Would you say that error message is clearer about the problem? If so, I would say the issue is with the error handling in that VI rather than the error itself. Either way, this could be easily modified in the rad_Retrieve Application Image From Target.vi in the "Cleanup" state. There are two options I can think of off the top of my head:

  1. Modify the error case structure in the Cleanup state to have a specific error message instead of just the error code and source.
  2. Modify that VI to use the General Error Handler with the "no dialog" option to generate the message for the Default case.

clendon-gibson commented 5 years ago

That error message is better, but you would need experience to know what might cause that. For one thing, which file system is causing the problem, the cRIO or the PC where the image is going?

A 3rd option is to set up a precondition for deploy and retrieve. If the cRIO filesystem is full, don't deploy or retrieve, but do put up a pop-up about why the action is halted.

I think a precondition would be the cleanest solution. You add a state to perform the check. Failed checks go to the cleanup state.
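As a rough illustration of the precondition idea (not RAD's actual LabVIEW code), the check could look something like the Python sketch below. It assumes the check runs somewhere that can see the target's file system, for example on the cRIO itself; how the host would query the target remotely is not shown. The names `check_free_space`, `next_state`, `Usage`, the "Retrieve Image" state name, and the `min_free_bytes` threshold are all hypothetical; only the "Cleanup" state comes from this thread.

```python
import shutil
from typing import Callable, NamedTuple, Tuple


class Usage(NamedTuple):
    """Same field layout as the result of shutil.disk_usage (handy for fakes)."""
    total: int
    used: int
    free: int


def check_free_space(
    path: str = "/",
    min_free_bytes: int = 1,  # hypothetical threshold, not a value RAD defines
    disk_usage: Callable[[str], Usage] = shutil.disk_usage,
) -> Tuple[bool, str]:
    """Return (ok, message); message names the file system that is short on space."""
    usage = disk_usage(path)
    if usage.free >= min_free_bytes:
        return True, ""
    percent_used = 100.0 * usage.used / usage.total if usage.total else 100.0
    message = (
        f"File system '{path}' is {percent_used:.0f}% full "
        f"({usage.free} bytes free, {min_free_bytes} required). "
        "Free some space on the target before retrieving or deploying an image."
    )
    return False, message


def next_state(ok: bool) -> str:
    """Failed checks go straight to Cleanup; 'Retrieve Image' is a made-up state name."""
    return "Retrieve Image" if ok else "Cleanup"
```

Because the message states which file system failed the check, it also answers the "cRIO or PC?" question that the raw error code leaves open.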

After that I would pick 2 because I prefer to handle errors close to where they occur. Cleanup error handling should be the unhandled error case. It makes sure we don't completely drop the ball, but doesn't have a lot of code to do specific things.

SparkingSpork commented 5 years ago

The 3rd option is actually pretty non-trivial in my opinion. There's not a good way to define a consistent threshold for when to show that pop-up and halt, and simply limiting it to "full" systems means others will still see the error without context. The percentage of disk capacity at which one device can still take an image might not work for others, and blacklisting of files and directories further complicates it.

I think the better 3rd option would be a warning that allows the user to continue.

Either way, tagging this as a bug for future review.

marshallmassengill commented 4 years ago

I ran into this with the RoboRIO as well a couple years back: https://forums.ni.com/t5/FIRST-Robotics-Competition/Using-RAD-on-the-RoboRIO/td-p/3799329?profile.language=en

I'm adding a link to the NI forum post purely for reference.

If I were to make a suggestion, finding a way to use something like netcat to open a stream and send the file data over the network, instead of trying to create the image locally, might help with resolving this bug. From a system imaging architecture standpoint, I find it a bit perplexing that the image for an embedded system is being created (and compressed) locally on that system prior to shipping it over the network, but maybe there is a reason for it.
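To make the streaming suggestion concrete, here is a minimal Python sketch of the general technique: the archive is compressed on the fly and written straight to a socket, so no temporary copy lands on the sender's disk. This is not how RAD or the NI imaging VIs work, and whether a target could run something like it is an assumption. `stream_directory` and `receive_archive` are hypothetical names.

```python
import shutil
import socket
import tarfile
from typing import BinaryIO


def stream_directory(src_dir: str, sock: socket.socket) -> None:
    """Send src_dir as a gzip-compressed tar over sock without a temp file."""
    with sock.makefile("wb") as out:
        # "w|gz" is tarfile's streaming mode: it only calls write() on the
        # file object, so a non-seekable socket file works.
        with tarfile.open(fileobj=out, mode="w|gz") as tar:
            tar.add(src_dir, arcname=".")
    # Signal end-of-stream so the receiver's read loop terminates.
    sock.shutdown(socket.SHUT_WR)


def receive_archive(sock: socket.socket, dest: BinaryIO) -> None:
    """Copy everything arriving on sock into dest (e.g. a file on the host PC)."""
    with sock.makefile("rb") as incoming:
        shutil.copyfileobj(incoming, dest)
```

Compression still costs CPU time on the sender, but the only disk space needed is on the receiving side.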