seung-lab / igneous

Scalable Neuroglancer compatible Downsampling, Meshing, Skeletonizing, Contrast Normalization, Transfers and more.
GNU General Public License v3.0

Data generated with `igneous image create` loads slowly in NG #169

Closed: jakobtroidl closed this issue 7 months ago

jakobtroidl commented 7 months ago

I have a .npy array with dimensions 663 x 2048 x 2048 that I want to convert to the Neuroglancer precomputed format using Igneous. I used

igneous image create my-data.npy ./precomputed --compress none --resolution 663,2048,2048

However, when I load the precomputed image in NG, the data loads really, really slowly (I am happy to share a link privately). I assume that is because the image pyramid is not being computed. How could I fix that?

Also, when I load a precomputed volume that was created with the --encoding fpzip flag, Neuroglancer says: Error parsing "scales" property: Error parsing "encoding" property: Invalid enum value: "fpzip"

william-silversmith commented 7 months ago

Hi Jakob!

To generate the image pyramid, you can use `igneous image downsample`. You may also wish to select a chunk size to ensure that the files are reasonably sized for your data.
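
For example, something along these lines (the queue directory and bucket path are placeholders):

mkdir queue
igneous image downsample gs://bucket/dataset/ --queue ./queue/ --mip 0
igneous execute ./queue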

To visualize fpzip, you need to use my special version of Neuroglancer that has additional codecs loaded:

https://allcodecs-dot-neuromancer-seung-import.appspot.com/

The codecs requiring the special viewer are listed on the CloudVolume page:

https://github.com/seung-lab/cloud-volume?tab=readme-ov-file#info-files---new-dataset

william-silversmith commented 7 months ago

Just curious, any luck?

jakobtroidl commented 7 months ago

Thanks, Will. I could downsample the data using the command above, but the volume still loads very slowly. Am I missing something else? Maybe some special codecs or compressions? Here are the commands that I used:

python
>>> import numpy as np
>>> data = np.random.rand(663, 2048, 2048).astype(np.float32)
>>> np.save('my-volume.npy', data)

igneous image create my-volume.npy ./my-volume --compress none
gsutil -m cp -r ./my-volume/ gs://bucket/folder/
mkdir queue
igneous image downsample gs://bucket/folder/ --queue ./queue/ --mip 0
igneous execute ./queue

william-silversmith commented 7 months ago

So if it were me, I would do the following. Bear in mind that random data compresses very poorly, so there's not much you can do on that front without more realistic data.

You can use the --chunk-size flag to make the individual chunks that are served smaller.

igneous image create my-volume.npy ./my-volume --chunk-size 128,128,16

How are you serving the volume? CloudVolume?

jakobtroidl commented 7 months ago

Oh, I am actually using real microscopy data. I only used random data in the comment above for illustration purposes.

jakobtroidl commented 7 months ago

I am serving the volume from a public gcloud bucket and linking to it from Neuroglancer.

william-silversmith commented 7 months ago

Ah okay, that's good to know.

Bear in mind that 128 x 128 x 64 voxels x 4 bytes = 4 MB per chunk uncompressed. That would easily explain the slow loading. You want those chunks to be more like hundreds of KB to 1 MB.
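
As a quick back-of-the-envelope check (a small Python sketch; it just assumes float32, i.e. 4 bytes per voxel):

# Rough uncompressed chunk sizes for float32 data (4 bytes per voxel)
for cx, cy, cz in [(128, 128, 64), (128, 128, 16), (64, 64, 64)]:
    mib = cx * cy * cz * 4 / 2**20
    print(f"{cx}x{cy}x{cz}: {mib:.2f} MiB uncompressed")
# prints 4.00 MiB, 1.00 MiB, and 1.00 MiB respectively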

I would try reducing the chunk size and applying --compress br with --encoding raw, or --compress none with --encoding fpzip (fpzip will require the special viewer URL above).
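
For example (just a sketch; the 64,64,64 chunk size is only a starting point):

igneous image create my-volume.npy ./my-volume --chunk-size 64,64,64 --compress br --encoding raw
igneous image create my-volume.npy ./my-volume --chunk-size 64,64,64 --compress none --encoding fpzip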

Take a look at the file sizes of the chunks and let me know what you see.

jakobtroidl commented 7 months ago

Sorry for the late reply. I recreated the precomputed file using this command as per your suggestion:

igneous image create my-volume.npy my-volume --compress br --encoding raw --chunk-size 64,64,64 --resolution 150,150,200

Next, I uploaded it to a gcloud bucket and loaded it (without downsampling) in Neuroglancer. Weirdly, in Neuroglancer, all volume values are shown as zero (black). Is it possible that brotli compression cuts off small float values (my volume has many small floats)? See the distribution of my data here:

hist, bins = np.histogram(data, bins=20)
print(hist)
# [143738028 298281703 429206794 455701112 392036276 303565408 230144941
#  174094705 130469950  94658088  61631109  33830320  16396584   7948335
#    4249427   2443950   1359753    663994    271557    131518]
print(bins)
# [0.   0.05 0.1  0.15 0.2  0.25 0.3  0.35 0.4  0.45 0.5  0.55 0.6  0.65
#  0.7  0.75 0.8  0.85 0.9  0.95 1.  ]

Individual .br files in the precomputed folder are all around 742.5 KB in size. In my previous attempts, those files were much bigger (e.g., 4MB when I did not use compression).

Note: If I change the above command to this, I can see the values correctly, but it still loads relatively slowly.

igneous image create my-volume.npy my-volume --compress none --chunk-size 64,64,64 --resolution 150,150,200

william-silversmith commented 7 months ago

Oh I suspect I know what happened (just a guess, you might have checked already).

On disk, CloudVolume will save brotli compressed files as .br, but in the cloud they need to have that extension stripped and have the content-encoding: br metadata field set correctly. If you used CloudFiles/Igneous to upload them, that's no big deal since they handle it transparently, but if you used gsutil, a manual upload, or something like that, you'd run into problems.
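
For reference, if you did want to stick with gsutil, I believe something along these lines would work (untested sketch; the local and bucket paths are placeholders): strip the .br suffixes locally, then set the header on upload.

# Untested sketch: strip the .br suffix that CloudVolume adds on disk
cd ./my-volume
for f in $(find . -type f -name '*.br'); do mv "$f" "${f%.br}"; done
# then upload with the Content-Encoding header set so GCS serves the chunks as brotli
gsutil -m -h "Content-Encoding:br" cp -r . gs://bucket/folder/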

jakobtroidl commented 7 months ago

Ok, thanks for the hint. I was able to solve it. It turns out uploading the precomputed file using CloudFiles instead of gsutil did the trick. Here's the code that I used.

from cloudfiles import CloudFiles

cff = CloudFiles('file:///path/to/precomputed')
cfg = CloudFiles('gs://my-bucket/my-folder/')
# Transfer all files from the local filesystem to Google Cloud Storage,
# re-encoding them as brotli ('br') on upload
cff.transfer_to(cfg, block_size=64, reencode='br')

Thanks so much for your help @william-silversmith. Igneous is super cool and helpful; I wish I had started using it earlier.

william-silversmith commented 7 months ago

So glad it's helpful! Feel free to reach out whenever you have a question.
