Closed carandraug closed 6 years ago
A side effect of this expansion is that files that would be small may suddenly become very large.
What softworx does:
On the OMX v2, the data is always acquired in the order that it should be in the file so this is not a problem there. If an experiment is aborted, the file is truncated.
On the OMX v3 the data is acquired in a different order than it appears in the file. The V3 solves this by first creating a large file of the expected final size and then, as the images are acquired, writing each image plane to its correct location in the file. If the experiment is aborted, the file is already of the right size, with the missing planes already full of zeros. There is no note or flag that the experiment has been aborted and that some of the image planes were not actually acquired.
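To make the V3 strategy concrete, here is a rough sketch of the idea, not softworx's actual code. The function names, the `header_nbytes` parameter, and the flat "header followed by planes" layout are all assumptions made for illustration.

```python
def preallocate(path, n_planes, plane_nbytes, header_nbytes=0):
    """Create a file of the expected final size before any plane is acquired."""
    with open(path, "wb") as fh:
        # Extending a file with truncate() zero-fills it on most platforms.
        fh.truncate(header_nbytes + n_planes * plane_nbytes)


def write_plane(path, file_index, plane_bytes, header_nbytes=0):
    """Write one acquired plane at its final position in the file.

    file_index is the position of the plane in file order, which may
    differ from the order in which planes are acquired.
    """
    with open(path, "r+b") as fh:
        fh.seek(header_nbytes + file_index * len(plane_bytes))
        fh.write(plane_bytes)
```

If the acquisition stops early, any plane that was never written simply stays as zeros, which matches the behaviour described above, including the lack of any marker for missing planes.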
Back to our problem:
Having to pad the image with "blanks" seems a bit mad. However, if an experiment does not conclude, we still want to make any collected data useful, which means writing the data in the correct file format. In the case of dv files, writing data in the correct format means reordering the z, angle, and phase dimensions. Since data in dv files are hyperrectangles, if an experiment did not conclude, we need blank planes for padding. Short of using another format, it doesn't look like we have any other option. I see two issues remaining:
1. The dv format supports different orders for the W, Z, and T dimensions, some of which do not have time as the slowest changing dimension. Theoretically, this should mean that we would also have to pad along the time dimension. However, cockpit only saves data in the WZT order so we won't actually have to. I have spoken with both Mick and Ian, so we will pad only up to the last time point that has any data.
2. It is possible to have cockpit acquire data in the same order that it should appear in the file. In that case, there would be no need to do any padding (this is what happens in the OMXv2). However, I think it's weird that cockpit would pad the file or not depending on that, so I'm thinking that we should always pad. Another way to think about it is not whether to pad the image, but to do the least amount of padding required, which may be much less than what is required for a complete time point. Doing the least amount of padding is a bit more work for us, and the user ends up with a file that is less useful (or the reconstruction software will have to handle the missing padding). Because of that, I'm thinking we should always pad for the complete time point.
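As a rough illustration of "always pad for the complete time point", something like the following would do. This is only a sketch, not the code in the commit: `pad_to_complete_timepoint` and its parameters are made-up names, and it assumes the planes are held as an `(n, y, x)` array with time as the slowest changing dimension.

```python
import math

import numpy as np


def pad_to_complete_timepoint(planes, planes_per_timepoint, fill_value=0):
    """Append blank planes until the last time point with data is complete.

    planes is an (n, y, x) array in file order; planes_per_timepoint is
    the number of planes in one complete time point (hypothetical names).
    """
    n_acquired = planes.shape[0]
    # Always keep at least one time point, even if nothing was acquired.
    n_timepoints = max(1, math.ceil(n_acquired / planes_per_timepoint))
    n_missing = n_timepoints * planes_per_timepoint - n_acquired
    padding = np.full((n_missing,) + planes.shape[1:], fill_value,
                      dtype=planes.dtype)
    return np.concatenate([planes, padding], axis=0)
```

With 7 acquired planes and 5 planes per time point, this returns 10 planes, the last 3 of which are blank.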
I think this is fixed with 876cfee0ac. Please review and merge:
## to try it out
git checkout -b carandraug-reorder-truncatedz master
git pull git@github.com:carandraug/cockpit.git padding-data
python -m unittest discover
## to merge
git checkout master
git merge --ff-only carandraug-reorder-truncatedz
git push upstream master
I tracked this issue further down into the Mrc module. In case of truncated data, that module would only make available valid data for "rectangles" of the slowest changing dimension. For example, if there was meant to be (t=5, z=10, w=2, y=512, x=512) worth of data and there was one full time point and part of a second time point, then only the first time point would be available. However, if there were only some z slices from the first time point, 3 for example, then since there was no complete time point (the slowest dimension), it would return an empty array instead of (1, 3, 2, 512, 512). This changes the Mrc module to pad with 0 or NaN as required to make all the data available on the Mrc object. It inserts as little padding as is required to make all the data in the file available.
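For illustration, one way to work out the smallest shape that holds all the planes is sketched below. This is my own simplified version and not the code in the Mrc module: `minimal_padded_shape` and `pad_planes` are hypothetical names, and the dimensions are assumed to be listed slowest first, excluding y and x.

```python
import math

import numpy as np


def minimal_padded_shape(n_planes, full_shape):
    """Smallest hyperrectangle (slowest dimension first) holding n_planes."""
    if not 1 <= n_planes <= math.prod(full_shape):
        raise ValueError("n_planes does not fit in full_shape")
    shape = [1] * len(full_shape)
    block = 1  # planes in one complete unit of the faster dimensions
    for axis in reversed(range(len(full_shape))):
        needed = math.ceil(n_planes / block)
        if needed <= full_shape[axis]:
            shape[axis] = needed
            break
        shape[axis] = full_shape[axis]
        block *= full_shape[axis]
    return tuple(shape)


def pad_planes(planes, full_shape, fill_value=0):
    """Pad an (n, y, x) array of planes and reshape it to the minimal shape."""
    shape = minimal_padded_shape(planes.shape[0], full_shape)
    padded = np.full((math.prod(shape),) + planes.shape[1:], fill_value,
                     dtype=planes.dtype)
    padded[: planes.shape[0]] = planes
    return padded.reshape(shape + planes.shape[1:])
```

For the example above, 6 planes with an expected `(t=5, z=10, w=2)` give `(1, 3, 2)` with no padding, 7 planes give `(1, 4, 2)` with one blank plane, and 25 planes give `(2, 10, 2)`.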
This may require loading the data into memory, since it is not possible to pad values onto a memory-mapped file. Since numpy.memmap does not behave like a true subclass of numpy.ndarray, the two behave differently and assigning to __class__ directly no longer worked. Changed to use views instead.
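For anyone not familiar with views, this is the general pattern. `MrcArray` here is a hypothetical stand-in, not the actual class in the Mrc module.

```python
import numpy as np


class MrcArray(np.ndarray):
    """Hypothetical ndarray subclass standing in for the Mrc array class."""


data = np.zeros((2, 3), dtype=np.uint16)

# Instead of `data.__class__ = MrcArray`, ask numpy for a view of the
# same buffer as the subclass. No data is copied.
wrapped = data.view(MrcArray)
wrapped[0, 0] = 42  # also visible through `data`
```

The view shares memory with the original array, so this works the same way whether the data came from a memory map or had to be loaded and padded in memory.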
Finally, if there was never a full Z stack in the experiment, more padding is required, so that is also done in the structuredIllumination module, although it reuses the logic previously added in Mrc.
Also started adding some unit tests.
Carnë Draug notifications@github.com writes:
> 2 it is possible to have cockpit acquire data in the same order that it should appear in the file. In that case, there would be no need to do any padding (this is what happens in the OMXv2). However, I think it's
This is possible, but not sensible. The V2 image order was selected because the angle rotation was very slow so it is the obvious order to collect data. However, Z is our slowest dimension in all our other systems, and it is also massively beneficial to collect all the data in a single Z plane as close together as possible to reduce motion or drift artifacts.
> weird that cockpit would pad the file or not depending on that so I'm thinking that we should always pad. However, another way to think about it is not whether to pad the image, but about doing the least amount of padding required which may be much less than what is required for a complete time point. Doing the least amount of padding is a bit more work for us and the user ends up with a file that is less useful (or the reconstruction software will have to handle the missing padding). Because of that, I'm thinking we should always pad for the complete time point.
Yes I agree, padding for a complete time point simplifies the code and the extra chunk of zeros is going to be insignificant in the grand scheme of things.
Ian
Pushed 7c2b287f1f and 876cfee0a after speaking with Mick. Closing as fixed.
At the end of a SIM experiment, the plane images inside the DV file are reordered. If an experiment is aborted, then the reordering fails because the number of plane images does not match the expected number.
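A small sketch of why the reordering fails on truncated data. This is not cockpit's actual code: the function name and the assumed orders (acquired as z, angle, phase and stored as angle, z, phase) are only for illustration.

```python
import numpy as np


def reorder_planes(planes, n_z, n_angles, n_phases):
    """Reorder an (n, y, x) array of planes.

    Assumes, for illustration only, that planes were acquired in
    (z, angle, phase) order and must be stored in (angle, z, phase) order.
    """
    ny, nx = planes.shape[1:]
    # reshape raises ValueError when the number of planes is not exactly
    # n_z * n_angles * n_phases, which is what happens after an abort.
    stack = planes.reshape(n_z, n_angles, n_phases, ny, nx)
    return stack.transpose(1, 0, 2, 3, 4).reshape(-1, ny, nx)
```

With all 12 planes of a `n_z=2, n_angles=3, n_phases=2` experiment this works, but passing only the first 10 planes raises `ValueError` in the reshape.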
We still want to have the file, even if the experiment is aborted. What we want may be case dependent. For example, according to Ian, if it's a time series experiment and it's aborted half way through a late time point, we would want all the complete time points and discard the incomplete time point. If it's a single time point, we would want the complete Z stacks.
What to keep in such cases may be very case specific, so I'm thinking it may be better to fill the missing values and generate a dv file of the originally expected size (also, what does softworx do in this case?). The fill value would be zero for integer images, or NaN for floating point images.
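Choosing the fill value could look something like this. It is a minimal sketch and `fill_value_for` is a made-up name. Note that anything that is not a real floating point type, complex included, falls back to zero here.

```python
import numpy as np


def fill_value_for(dtype):
    """Return NaN for floating point image data, zero for everything else."""
    if np.issubdtype(dtype, np.floating):
        return np.nan
    return 0


# Example: a blank 512x512 plane for a uint16 image.
blank = np.full((512, 512), fill_value_for(np.uint16), dtype=np.uint16)
```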