Closed pakiessling closed 7 months ago
Hello @pakiessling,
Indeed, writing the image can take a long time; this is why we start writing it very early in the pipeline: this way, we perform the other steps in parallel and finally update the remaining Explorer files.
In 12 hours, did it at least write the first image scale?
Note that we have two writing modes: a lazy mode that is memory-efficient but slower, and an in-memory mode that is faster because we load the image into memory. Depending on your RAM capacity, we automatically decide which mode to use. Therefore, asking for more RAM will load the image in memory only when possible, speeding up the writing process for the small subscales. But it will probably not speed up the writing of the highest-resolution image, which probably can't be loaded in memory even if you set a high RAM limit.
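The mode selection described above can be sketched as follows; the function name and exact decision rule are assumptions for illustration, not sopa's actual API:

```python
def choose_writing_mode(image_size_gb: float, ram_threshold_gb: float = 4.0) -> str:
    """Pick a writing mode: load the image in memory only when it fits in RAM."""
    if image_size_gb <= ram_threshold_gb:
        return "in-memory"  # faster, but requires the whole image in RAM
    return "lazy"  # memory-efficient, but slower

# Small subscales fit under the threshold; the full-resolution image does not:
print(choose_writing_mode(2.0))    # in-memory
print(choose_writing_mode(116.0))  # lazy
```

This is why raising the RAM threshold speeds up the subscales but not the full-resolution level.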
I'll try to see if it's possible to parallelize the writing process in "lazy-mode"!
It took ~9 hours for the first level and then ~3 hours for 35% of the 2nd.
I will try with a more generous time limit and more RAM. Would it help to convert the images to OME-TIFF before?
Okay, it's great that at least the first level converts. For the second level, I think you'll need about 116GB of RAM given your image size, which is a lot... I don't know how much RAM you can request, but if possible, can you try with 128GB of RAM? At that point, starting from the second level, it should load the image in memory and be faster. If it works, can you tell me how much RAM it used and how long it took?
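The 116GB figure can be sanity-checked from the image size reported in this thread (11 stains at 87012 x 64791 pixels), assuming uint16 pixels:

```python
# Back-of-the-envelope check of the ~116GB RAM estimate, assuming uint16 pixels:
n_channels, height, width = 11, 87012, 64791
bytes_per_pixel = 2  # uint16

total_bytes = n_channels * height * width * bytes_per_pixel
total_gib = total_bytes / 2**30
print(f"{total_gib:.1f} GiB")  # about 115.5 GiB, i.e. ~116GB
```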
No, the conversion to OME-TIFF will not help, unless it's already the format expected by the Xenium Explorer. Which conversion would you use?
Our nodes max out at 180 GB, so that should be fine.
Vizgen has a utility for converting images here https://github.com/Vizgen/vizgen-postprocessing/blob/166fb247d235e021d26205a74ce3814f885aee4b/src/vpt/convert_to_ome/main.py#L15
Do I need to set a higher ram-threshold? I think the default is 4?
Ok great! Yes, sorry, I forgot to mention: you indeed have to change the value of `ram_threshold_gb` (for instance, to 128GB). If you use the pipeline, you can update this parameter directly in the config.
For the Vizgen conversion, I had a quick look, but the compression parameters are different from what the Xenium Explorer expects, so I think that it's unlikely to work...
I found a way to parallelize the writing, but I think the bottleneck comes from reading the chunks. Maybe using a chunk size of 1024 will be much faster (currently, we read chunks of size 4096 for MERSCOPE data). I'll do some tests next week to find the bottleneck and improve it!
Thanks, sounds great!
Ah btw, for the .yml I think `intensity_mean` got changed to `average-intensities` at some point.
Thanks for the catch @pakiessling, it seems that I updated this everywhere except in the `example_config.yaml` file! I'll update it.
Hello @pakiessling,
After a quick investigation, I confirm that the bottleneck was due to loading the chunks (not writing the image). Indeed, the Xenium Explorer chunk size is (1, 1024, 1024), but by default the MERSCOPE data is saved with chunks of size (1, 4096, 4096). So, each time we write a chunk, we load a chunk that is 16 times bigger than what we need.
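The 16x factor follows directly from the two chunk shapes: one (1, 4096, 4096) source chunk covers a 4x4 grid of (1, 1024, 1024) Explorer chunks.

```python
source_chunk = (1, 4096, 4096)    # default MERSCOPE chunking
explorer_chunk = (1, 1024, 1024)  # what the Xenium Explorer expects

# Each source-chunk read covers a 4x4 grid of Explorer chunks,
# so 16x more data is loaded than is needed per written chunk:
overhead = (source_chunk[1] // explorer_chunk[1]) * (source_chunk[2] // explorer_chunk[2])
print(overhead)  # 16
```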
Now, the default `chunksize` is set to (1, 1024, 1024), and image writing should now be about 5 times faster.
If you want to test it already, you can use the `dev` branch; otherwise, you can wait for the release of version 1.0.5 (I'm waiting for more new features before releasing a new version). Note that you'll need to write the `.zarr` directories again, or else they will still use the previous chunk size of (1, 4096, 4096).
Let me know if it's better now!
That sounds amazing. I will give it a shot!
Hello again @pakiessling, I made another update to make the image writing faster; the latest changes are on `dev`.
Sorry for the quick change, I hope you haven't tried my previous changes yet...
I'm still performing some tests, so `dev` might change again, but the version `sopa==1.0.5` will be stable when I release it.
Version 1.0.5 is released, if you want to test it out.
Cool, I'm running it right now with default settings for the explorer step. Will report back how long it took.
Great, let me know!
Have you overwritten your SpatialData `.zarr` directory? You need to create it again, because otherwise you'll still have the old chunk size, which was the main cause of the latency.
If you are not sure, just check that the image chunk size is indeed (C, 1024, 1024).
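One way to check the chunk size without loading the image: a zarr (v2) array stores its chunk layout in a small `.zarray` JSON file inside the array's directory. The exact path inside your SpatialData `.zarr` store depends on your image key, so the sketch below demonstrates the idea on a minimal fake file:

```python
import json
import os
import tempfile

def read_chunk_shape(zarray_path: str) -> list:
    """Read the chunk shape from a zarr v2 array's .zarray metadata file."""
    with open(zarray_path) as f:
        return json.load(f)["chunks"]

# Demo on a minimal fake .zarray (real files contain more fields):
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, ".zarray")
    with open(path, "w") as f:
        json.dump({"chunks": [1, 1024, 1024], "shape": [11, 87012, 64791]}, f)
    print(read_chunk_shape(path))  # [1, 1024, 1024]
```

Point `read_chunk_shape` at the `.zarray` file of your image array and confirm it reports `[1, 1024, 1024]` (per channel).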
Yes I am rerunning from scratch.
Clueless question, but what does the writing of the tiles actually do? Is it just to be able to look at the data with the Xenium Explorer?
For example, if I load a MERFISH dataset with spatialdata-io and save it as .zarr, it is very fast.
Yes, this is only used to open the results in the Xenium Explorer: all it does is create a new image with the metadata/chunks/subscales/compression expected by the Xenium Explorer.
Now the image writing should be as fast as the `.zarr` writing, hopefully :)
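For intuition about the subscales mentioned above: the Explorer image is a pyramid obtained by repeatedly downscaling the full-resolution level by 2. A sketch under assumptions (factor-2 downscaling, stopping once the largest side drops below 1024 px; the actual stopping rule may differ):

```python
# Hypothetical subscale pyramid for an 87012 x 64791 image,
# halving each level until the largest side drops below 1024 px:
h, w = 87012, 64791
levels = []
while max(h, w) >= 1024:
    levels.append((h, w))
    h, w = (h + 1) // 2, (w + 1) // 2

for i, (lh, lw) in enumerate(levels):
    print(f"scale {i}: {lh} x {lw}")
```

This also explains why the first (full-resolution) scale dominates the writing time: it alone holds about three quarters of all the pixels in the pyramid.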
Ok, it ran through in ~6 hours now. If the process is started at the beginning of the pipeline, this should not be a bottleneck anymore. Very nice!
Great, I'm glad to hear this, thanks for your feedback!
Hi, I am working my way through the Sopa Snakemake pipeline with one of our Merfish datasets.
A step that has been problematic is `sopa explorer write`. This takes a long time, and the job was canceled after 12 hours.
I think the problem is that we have 11 stains in total at 87012 x 64791 pixels. The writing of tiles takes forever.
Is there a way to parallelize this, or maybe only use a subset of stains?
Thanks!