mobie / mobie-viewer-fiji

BSD 2-Clause "Simplified" License
29 stars 12 forks source link

Optimize loading from S3 #677

Closed tischi closed 2 years ago

tischi commented 2 years ago

Bottlenecks:

MergedGridSource

MergedGridSource: this.type = Util.getTypeFromInterval( referenceSource.getSource( 0, 0 ) );

image

getMask()

MoBIEHelper: getMask():

image
tischi commented 2 years ago

Ideas:

  1. Add field for allSourceSameDimensions for the GridSources, and then do not load all of them.
  2. Already touch the sources during the loading such that the information is cached in a multi-threaded way.
tischi commented 2 years ago

Adding sourceAndConverter.getSpimSource().getSource( 0,0 ) ) to the initial loading seems to help a lot later and does not seem to harm too much while doing it.

Adding however Util.getTypeFromInterval( sourceAndConverter.getSpimSource().getSource( 0,0 ) ); to the initial opening also seems to help later, but also takes a lot longer during the initial loading. I think here the point is that we do not need to do this for all sources.

Question to @KateMoreva : Can we get possibly get rid of the "doesObjectExist()" call, which appears to take halve of the time?

image
tischi commented 2 years ago

See here: https://github.com/mobie/mobie-io/issues/91

constantinpape commented 2 years ago

Adding however Util.getTypeFromInterval( sourceAndConverter.getSpimSource().getSource( 0,0 ) ); to the initial opening also seems to help later, but also takes a lot longer during the initial loading. I think here the point is that we do not need to do this for all sources.

Is this to get the data type? If so, I think we probably only need to do this once, data types shouldn't be mixed in the same grid...

Question to @KateMoreva : Can we get possibly get rid of the "doesObjectExist()" call, which appears to take halve of the time?

Yeah, I think it's better to do EAFP (easier to ask for permission) rather than LBYL (look before you leap) here to improve performance when accessing things on s3.

tischi commented 2 years ago

Is this to get the data type? If so, I think we probably only need to do this once, data types shouldn't be mixed in the same grid...

I think I do it only once, but already that is very slow. This is a recurring issue and maybe something will change here at some point: https://github.com/imglib/imglib2/pull/312. I'll keep on debugging here as much as I can.

I am making very good progress on other fronts, though. Things should get significantly faster now.

constantinpape commented 2 years ago

I think I do it only once, but already that is very slow. This is a recurring issue and maybe something will change here at some point: imglib/imglib2#312. I'll keep on debugging here as much as I can.

If you just want to get the datatype you can also get it from the metadata, e.g. the .zattrs. That is probably much faster than doing it the imglib way.

tischi commented 2 years ago

If you just want to get the datatype you can also get it from the metadata, e.g. the .zattrs

Very good point... The issue with imglib2 is that it actually needs an access to the image data, which loads a whole bunch of pixels, which are not needed.

But I am not sure how to implement your idea, because I need the imglib2 type variable in the end! I will have a think...

constantinpape commented 2 years ago

But I am not sure how to implement your idea, because I need the imglib2 type variable in the end! I will have a think...

You'll probably need to write a function that translate the type you get from the metadata (which will be a string?!) to the corresponding imglib2 type.

tischi commented 2 years ago

Turns out that

Util.getTypeFromInterval( sourceAndConverter.getSpimSource().getSource( 0,0 ) );

was nonsense, because there is:

sourceAndConverter.getSpimSource().getType()

which apparently is already initialized and thus returns almost instantaneous instead of taking 500 ms 🥳 .

tischi commented 2 years ago

To better see what is happening, I worked a lot on the logging as well.

I also introduced a variable (currently 100ms), and I am only logging what takes longer than this, in order not to clutter the log window, makes sense?

Based on the few measurements below (this is from home, for my Mac, which has 4 CPU) I think I will keep 16 IO threads for now as the default.

Clearly, multi-threading during IO helps.

Any ideas how to do this more systematically?

Should we expose the numIOThreads in the UI?

1 IO threads

# MoBIE

Opening project: https://github.com/mobie/covid-if-project

Opening view: default

Opening (1/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0007.ome.zarr
Opening (3/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE07_0000.ome.zarr
Opening (5/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0001.ome.zarr
Opening (9/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0005.ome.zarr
Opening (13/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE07_0005.ome.zarr
Opening (17/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE07_0003.ome.zarr
Opening (21/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE07_0001.ome.zarr
Opening (25/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE07_0004.ome.zarr
Opening (29/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE07_0001.ome.zarr
Opening (33/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE06_0001.ome.zarr
Opening (37/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE06_0005.ome.zarr
Opening (41/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE07_0007.ome.zarr
Opening (45/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE07_0005.ome.zarr
Opening (49/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE07_0003.ome.zarr
Opening (53/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nuclei_WellE07_0006.ome.zarr
Opening (57/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE06_0004.ome.zarr
Opening (61/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE06_0008.ome.zarr
Opening (65/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nuclei_WellE06_0000.ome.zarr
Opening (69/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE06_0000.ome.zarr
Opening (73/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE06_0004.ome.zarr
Opening (77/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE07_0005.ome.zarr
Opening (81/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE07_0001.ome.zarr
Opening (85/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nuclei_WellE06_0003.ome.zarr
Opening (89/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE07_0007.ome.zarr
Opened 90 image(s)  in 108946 ms, using 1 thread(s).

Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0000/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0004/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0001/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0002/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0006/default.tsv
Fetched 18 tables(s)  in 7114 ms, using 1 thread(s).

Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0002/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0006/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0003/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0001/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0000/default.tsv
Fetched 18 tables(s)  in 6821 ms, using 1 thread(s).

Opened view: default, in 124170 ms.

4 IO threads

# MoBIE

Opening project: https://github.com/mobie/covid-if-project

Opening view: default

Opening (1/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0007.ome.zarr
Opening (3/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE07_0000.ome.zarr
Opening (6/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0001.ome.zarr
Opening (9/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0005.ome.zarr
Opening (17/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE07_0003.ome.zarr
Opening (33/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE06_0001.ome.zarr
Opening (50/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE07_0003.ome.zarr
Opening (65/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nuclei_WellE06_0000.ome.zarr
Opening (81/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE07_0001.ome.zarr
Opened 90 image(s)  in 30796 ms, using 4 thread(s).

Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0001/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0007/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0008/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0000/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0000/default.tsv
Fetched 18 tables(s)  in 958 ms, using 4 thread(s).

Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0001/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0000/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0007/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0007/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0008/default.tsv
Fetched 18 tables(s)  in 683 ms, using 4 thread(s).

Opened view: default, in 33467 ms.

16 IO threads

# MoBIE

Opening project: https://github.com/mobie/covid-if-project

Opening view: default

Opening (1/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0007.ome.zarr
Opening (9/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0005.ome.zarr
Opening (5/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0001.ome.zarr
Opening (3/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE07_0000.ome.zarr
Opening (18/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE07_0003.ome.zarr
Opening (33/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE06_0001.ome.zarr
Opening (65/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nuclei_WellE06_0000.ome.zarr
Opened 90 image(s)  in 12171 ms, using 16 thread(s).

Opening (2/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0008/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0005/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0000/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0006/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0000/default.tsv
Fetched 18 tables(s)  in 1143 ms, using 16 thread(s).

Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0003/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0002/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0005/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0001/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0000/default.tsv
Fetched 18 tables(s)  in 900 ms, using 16 thread(s).

Opened view: default, in 15195 ms.

32 IO threads

# MoBIE

Opening project: https://github.com/mobie/covid-if-project

Opening view: default

Opening (1/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0007.ome.zarr
Opening (17/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE07_0003.ome.zarr
Opening (9/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0005.ome.zarr
Opening (5/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0001.ome.zarr
Opening (3/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE07_0000.ome.zarr
Opening (34/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE06_0001.ome.zarr
Opening (49/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE07_0003.ome.zarr
Opening (65/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nuclei_WellE06_0000.ome.zarr
Opened 90 image(s)  in 10863 ms, using 32 thread(s).

Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0006/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0001/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0002/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0008/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0008/default.tsv
Fetched 18 tables(s)  in 916 ms, using 32 thread(s).

Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0005/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0001/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0008/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0001/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0004/default.tsv
Fetched 18 tables(s)  in 1316 ms, using 32 thread(s).

Read in 117 ms: https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/sites/default.tsv

Opened view: default, in 14683 ms.
constantinpape commented 2 years ago

I also introduced a variable (currently 100ms), and I am only logging what takes longer than this, in order not to clutter the log window, makes sense?

Yeah, I think that makes sense.

Based on the few measurements below (this is from home, for my Mac, which has 4 CPU) I think I will keep 16 IO threads for now as the default.

16 threads sounds reasonable; in any case, I think it makes sense to have more threads than cores here since these things are usually not compute bound. I am not sure if there's a better way to determine the number of threads, so I think 16 sounds ok for now.

constantinpape commented 2 years ago

I checked with the new MoBIE-beta now and it works much better already. I can load the full grid from s3 now in a reasonable amount of time.

tischi commented 2 years ago

I am closing this as I think we exhausted the options discussed in this issue. There is probably more we can do but I would rather open a new issue.