Closed by constantinpape 11 months ago.
I have worked on things like this. As long as (i) all the information can be parsed from the file and folder naming scheme and (ii) the individual TIFF files can be opened in Fiji, it should be no problem.
Can you please share a minimal example data set with me?
> I have put some example data in this format on the EMBL S3: `i2k-2020/incu-test-data/2207/19` (contains two timepoints with a few wells, 25 positions per well and 2 channels).
This is the minimal example data. On the EMBL cluster you can copy it via `mc cp -r embl/i2k-2020/incu-test-data/2207/19 .`
But I will also share an ownCloud link with you to make it easier.
I sent a download link via EMBL Chat, @tischi.
This appears to be multi-resolution data:
I am not sure whether we can handle this from a TIFF, but I will have a look.
@constantinpape, would it also help you if that data could be conveniently converted to an OME-Zarr plate?
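This may work for opening a TIFF from S3 using our current mobie-io code base:
```java
// IOHelper comes from mobie-io; ImagePlus and Opener are ij.ImagePlus and ij.io.Opener from ImageJ1
InputStream inputStream = IOHelper.getInputStream( s3address );
ImagePlus imagePlus = new Opener().openTiff( inputStream, "name" );
```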
I found that we already have something in MoBIE for Incucyte, but it seems to be a different variant:
```java
/*
example:
MiaPaCa2-PhaseOriginal_A2_1_03d06h40m.tif
well = A2, site = 1, frame = 03d06h40m
*/
private static final String INCUCYTE = ".*_(?<"+WELL+">[A-Z]{1}[0-9]{1,2})_(?<"+SITE+">[0-9]{1,2})_(?<"+ T +">[0-9]{2}d[0-9]{2}h[0-9]{2}m).tif$";
```
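To see what the existing pattern extracts, one can run it against the example name from its own comment. The following is a minimal, self-contained sketch, not MoBIE code: the group names `well`, `site` and `t` stand in for MoBIE's `WELL`, `SITE` and `T` constants (their real values are an assumption), the dot before `tif` is escaped here, and `frameToMinutes` is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: applies a copy of MoBIE's INCUCYTE file name pattern to a file name. */
class IncucyteNameParser {
    // Hypothetical stand-ins for MoBIE's WELL, SITE and T group-name constants.
    static final String WELL = "well";
    static final String SITE = "site";
    static final String T = "t";

    // Same structure as the INCUCYTE pattern above, except that the dot before "tif" is escaped.
    static final Pattern INCUCYTE = Pattern.compile(
            ".*_(?<" + WELL + ">[A-Z]{1}[0-9]{1,2})_(?<" + SITE + ">[0-9]{1,2})_(?<"
            + T + ">[0-9]{2}d[0-9]{2}h[0-9]{2}m)\\.tif$");

    // Frame token such as "03d06h40m": two-digit days, hours and minutes.
    static final Pattern FRAME = Pattern.compile("(\\d{2})d(\\d{2})h(\\d{2})m");

    /** Returns {well, site, frame}, or null if the name does not match. */
    static String[] parse(String fileName) {
        Matcher m = INCUCYTE.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(WELL), m.group(SITE), m.group(T) };
    }

    /** Hypothetical helper: converts a frame token such as "03d06h40m" to elapsed minutes. */
    static int frameToMinutes(String frame) {
        Matcher m = FRAME.matcher(frame);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected frame token: " + frame);
        }
        int days = Integer.parseInt(m.group(1));
        int hours = Integer.parseInt(m.group(2));
        int minutes = Integer.parseInt(m.group(3));
        return (days * 24 + hours) * 60 + minutes;
    }
}
```

For example, `IncucyteNameParser.parse("MiaPaCa2-PhaseOriginal_A2_1_03d06h40m.tif")` yields `A2`, `1` and `03d06h40m`, and `frameToMinutes("03d06h40m")` gives 4720. In this variant the time is encoded in the file name, whereas in the storage format discussed here each folder holds one timepoint.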
Do you have any insights here?
Hi @tischi, thanks for looking into this so fast. Regarding the questions:
> would it also help you if that data could be conveniently converted to an OME-Zarr plate?
That would be nice, but not a high priority. We have too much data to convert everything, and to keep things compatible with other software we (for now) have to keep a copy in the original format.
> This may work to open TIFF from S3, using our current mobie-io code base
Ok, good to know! (But I suggest we first figure out how to parse the format in principle and then how to also load it from S3.)
> I found that we have something in MoBIE already for Incucyte, but it seems to be a different variant
Yes, the data is exported from the microscope in a different format (the one you have) from the one in which it is stored. We now have a lot of data and cannot export all of it (this would mean duplicating the data, and there is also no good programmatic way to do it). That's why we want to access the 'storage Incucyte format'.
Does that mean that `IncuCyteRaw` would be a good name for this?
Yes, that would be good!
Are these multiple plates (the example data)?
I am assuming that `incu-test-data/2207/19/1110/262` is one plate?
It's a single plate, but imaged at multiple timepoints:
`incu-test-data/2207/19/1110/262` is one timepoint (imaged on 19.07.2022 at 11:10; 262 is the experiment id).
`incu-test-data/2207/19/1120/262` is another timepoint (imaged on 19.07.2022 at 11:20).
Ok man... Ok, I guess I can parse this such that the timepoints are correct.
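Given the folder layout explained above (`yyMM/dd/HHmm/<experiment id>`), one way to get the timepoint order right is to parse each folder path into a timestamp and sort. A minimal sketch, assuming exactly this four-level layout at the end of the path and two-digit years in 2000 to 2099; `TimepointFolder` is a hypothetical name, not MoBIE code.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper for one Incucyte timepoint folder, e.g. "incu-test-data/2207/19/1110/262". */
class TimepointFolder {
    final String path;
    final String experimentId;
    final LocalDateTime acquired;

    TimepointFolder(String path) {
        // ASSUMPTION: the last four path components are yyMM/dd/HHmm/<experiment id>.
        String[] parts = path.split("/");
        if (parts.length < 4) {
            throw new IllegalArgumentException("Unexpected folder layout: " + path);
        }
        String yyMM = parts[parts.length - 4];
        String dd = parts[parts.length - 3];
        String hhmm = parts[parts.length - 2];
        this.path = path;
        this.experimentId = parts[parts.length - 1];
        this.acquired = LocalDateTime.of(
                2000 + Integer.parseInt(yyMM.substring(0, 2)), // ASSUMPTION: years 2000-2099
                Integer.parseInt(yyMM.substring(2, 4)),        // month
                Integer.parseInt(dd),                          // day of month
                Integer.parseInt(hhmm.substring(0, 2)),        // hour
                Integer.parseInt(hhmm.substring(2, 4)));       // minute
    }

    /** Returns the folders sorted by acquisition time; the list index is the timepoint index. */
    static List<TimepointFolder> sortChronologically(List<String> paths) {
        List<TimepointFolder> folders = new ArrayList<>();
        for (String p : paths) {
            folders.add(new TimepointFolder(p));
        }
        folders.sort(Comparator.comparing((TimepointFolder f) -> f.acquired));
        return folders;
    }
}
```

With the two example folders, `.../1110/262` becomes timepoint 0 and `.../1120/262` timepoint 1, acquired 10 minutes later.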
Does that look OK?
Looks correct at first glance. (To make sure, I would need to load it myself and compare with individual images loaded in napari or Fiji.)
It turns out that the distribution of time points across multiple files is a challenge here.
I have code for this, but it currently uses the `VirtualStack` from ImageJ to both concatenate and lazily load the time points from different files. The issue is that this only works if the files can be opened with the ImageJ1 `Opener`, because this is what `VirtualStack` uses to load data. This does not work here because we need Bio-Formats. In addition, with this approach we will not be able to make use of the resolution pyramid.
If we do not care about the resolution pyramid, we could implement a modified version of the `VirtualStack` that uses Bio-Formats to open the files instead of the ImageJ1 `Opener` class.
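To illustrate the idea of such a modified `VirtualStack` independently of ImageJ, here is a minimal plain-Java sketch of a stack that concatenates one file per timepoint and only opens a file when one of its planes is first requested. The opener is a pluggable function, which is where a Bio-Formats based reader would go instead of the ImageJ1 `Opener`. The class name, the fixed `planesPerFile` layout and the LRU caching policy are assumptions; this is neither ImageJ's `VirtualStack` API nor the actual MoBIE implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of a lazily loading stack that concatenates one file per timepoint.
 * Plane n (1-based, as in ImageJ stacks) maps to file (n - 1) / planesPerFile and to
 * plane (n - 1) % planesPerFile within that file.
 */
class LazyConcatenatedStack<P> {
    private final List<String> files;                // one file per timepoint, in time order
    private final int planesPerFile;                 // ASSUMPTION: same plane count in every file
    private final Function<String, List<P>> opener;  // e.g. a Bio-Formats based reader
    private final Map<String, List<P>> cache;        // LRU cache of opened files

    LazyConcatenatedStack(List<String> files, int planesPerFile, int maxCachedFiles,
                          Function<String, List<P>> opener) {
        this.files = files;
        this.planesPerFile = planesPerFile;
        this.opener = opener;
        // Access-ordered LinkedHashMap that evicts the least recently used file.
        this.cache = new LinkedHashMap<String, List<P>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<P>> eldest) {
                return size() > maxCachedFiles;
            }
        };
    }

    int getSize() {
        return files.size() * planesPerFile;
    }

    /** Returns plane n (1-based); the backing file is only opened when it is not cached. */
    P getPlane(int n) {
        if (n < 1 || n > getSize()) {
            throw new IndexOutOfBoundsException("Plane " + n + " of " + getSize());
        }
        int index = n - 1;
        String file = files.get(index / planesPerFile);
        List<P> planes = cache.computeIfAbsent(file, opener);
        return planes.get(index % planesPerFile);
    }
}
```

Nothing is read when the stack is created; `getSize()` only needs the file list. Bounding the cache keeps memory in check when browsing many timepoints, at the cost of re-opening files that were evicted.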
Another potential avenue that would preserve the resolution pyramid: https://github.com/BIOP/bigdataviewer-image-loaders/issues/22
@constantinpape, using the above `VirtualStack` approach this now also works for the timepoints (without the pyramid). The current `main` branch should work; you can use this function for testing. Can you test this from the branch (preferred), or shall I release it to Fiji?
It would be good to know whether this is usable enough for a whole plate on disk, because I think from S3 it will only be worse.
Thanks @tischi , I will test it from the branch on Sunday or Monday.
@constantinpape,
I also managed to implement it for S3 🥳.
You can try it with the same function that I linked to above.
Notes:
Some of the above limitations could probably be improved, but this would likely require upstream contributions from @nicokiaru in bigdataviewer-image-loaders; that is, we would need to see whether the memory-mapping trick that lets Bio-Formats load objects from S3 (see discussion here and implementation here) could be implemented in bigdataviewer-image-loaders. This would let us make use of the resolution pyramid and probably also give us better memory management.
Hi @tischi ,
I tested it now, and it works really well! The loading speeds are good both when loading the data from local files and from S3. I didn't try to zoom out to the full plate level with S3, but hopping between wells worked quite well (with some loading delays, but still usable).
I think on the technical level that is all we need for now. For better performance we may need to convert selected data to OME-Zarr, but it's already great to have the functionality as it is to quickly check the uploaded data.
There are two more things that would be helpful:
I would suggest that we first go ahead and merge the current changes; I can then lay this out in more detail in a separate issue.
I released it.
Hi @tischi, we would like to read HCS data from Incucyte microscopes.
The corresponding data is stored over multiple folders, where each folder contains the data for a given timepoint. E.g.
Here, `B2` is the well id, `1` the position in the well, and `Ph` (phase contrast) and `Ch1` (a fluorescence channel) are the channel names. `vessel_id` is a unique identifier for each experiment stored on the microscope.
I have put some example data in this format on the EMBL S3: `i2k-2020/incu-test-data/2207/19` (contains two timepoints with a few wells, 25 positions per well and 2 channels).
What do you think would be the best way to open this data via the MoBIE HCS loader?
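To connect this layout to an HCS plate, one could parse each file into well, position and channel plus the timepoint from its folder, and then group the files so that each well/position/channel combination becomes one time series. A minimal sketch; the file name pattern `<well>-<position>-<channel>.tif` (e.g. `B2-1-Ph.tif`) is an assumption for illustration, as is the class name, and this does not use any MoBIE HCS loader API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical grouping of IncuCyteRaw-style files into per-well time series. */
class IncucyteRawLayout {
    // ASSUMED layout: [...]/yyMM/dd/HHmm/<vessel_id>/<well>-<position>-<channel>.tif
    private static final Pattern FILE = Pattern.compile(
            "(?:.*/)?(?<yymm>\\d{4})/(?<dd>\\d{2})/(?<hhmm>\\d{4})/(?<vessel>[^/]+)/"
            + "(?<well>[A-Z]\\d{1,2})-(?<position>\\d+)-(?<channel>[^/.]+)\\.tif");

    /**
     * Groups file paths by "vessel/well/position/channel". Within each group, paths are
     * ordered by acquisition time: the fixed-width yyMMddHHmm key sorts chronologically.
     */
    static Map<String, List<String>> groupIntoTimeSeries(List<String> paths) {
        Map<String, TreeMap<String, String>> byKey = new TreeMap<>();
        for (String path : paths) {
            Matcher m = FILE.matcher(path);
            if (!m.matches()) {
                continue; // skip files that do not follow the assumed naming scheme
            }
            String key = m.group("vessel") + "/" + m.group("well") + "/"
                    + m.group("position") + "/" + m.group("channel");
            String time = m.group("yymm") + m.group("dd") + m.group("hhmm");
            byKey.computeIfAbsent(key, k -> new TreeMap<>()).put(time, path);
        }
        Map<String, List<String>> series = new TreeMap<>();
        byKey.forEach((key, timeToPath) -> series.put(key, new ArrayList<>(timeToPath.values())));
        return series;
    }
}
```

Each resulting list is ordered by time, so its index can serve as the timepoint index when assembling the plate.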