seung-lab / cloud-volume

Read and write Neuroglancer datasets programmatically.
https://twitter.com/thundercloudvol
BSD 3-Clause "New" or "Revised" License

Large files issue #505

Open manoaman opened 2 years ago

manoaman commented 2 years ago

Hello,

I tried running CloudVolume on a TIFF stack which is 245GB in size (155MB each for about 1600+ slices). I noticed that the number of chunk files created in a directory hit 1,000,001, which seems to be an upper bound on what I can create. (This value is probably configurable, but I'm not sure. Any thoughts?) The following is the error I see while running CloudVolume.

I/O error(31): Too many links
[Errno 31] Too many links:

Should the TIFF files be downsized before running CloudVolume? If you could advise me on possible approaches, that would be nice to know.

Thank you! -m

fcollman commented 2 years ago

what chunk size did you use? this will control how many files you create. Also a code snippet would help debug what was going on in your info file definition in particular.

manoaman commented 2 years ago

Interesting, maybe I should try 1024 instead? I used chunk_size=[256, 256, 1] for chunking. The following is the code snippet for the info file definition.

info = CloudVolume.create_new_info(
    num_channels=1,
    layer_type='image',  # 'image' or 'segmentation'
    data_type='uint16',  # can pick any popular uint
    encoding='raw',  # other options: 'jpeg', 'compressed_segmentation' (req. uint32 or uint64)
    # resolution=[4000, 4000, 4000],
    resolution=[1850, 1850, 4000],  # X,Y,Z values in nanometers
    voxel_offset=[0, 0, 0],  # X,Y,Z values in voxels
    chunk_size=[256, 256, 1],  # rechunk of image X,Y,Z in voxels
    volume_size=[7370, 8768, 1621]  # X,Y,Z size in voxels
    )
fcollman commented 2 years ago

(7370/256) * (8768/256) * (1621/1) ≈ 1,598,347 chunks.

do you want single z sections to be the chunks?

Right now you have ~1MB chunk files: (256*256*16 bits)/(1024*1024 bits/MB)

You could probably get away with 5 MB chunks.

128x128x16 would be ~400K chunks.
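
If it helps, here is a quick back-of-the-envelope sketch (plain Python; the helper name is illustrative) for comparing candidate chunk sizes against this volume, assuming uint16 (2 bytes per voxel) as in the info file above:

import math

def chunk_stats(volume_size, chunk_size, bytes_per_voxel=2):
    # number of chunk files needed to tile the volume, and uncompressed size per chunk
    counts = [math.ceil(v / c) for v, c in zip(volume_size, chunk_size)]
    n_files = counts[0] * counts[1] * counts[2]
    chunk_mib = (chunk_size[0] * chunk_size[1] * chunk_size[2] * bytes_per_voxel) / 2**20
    return n_files, chunk_mib

volume = (7370, 8768, 1621)  # from the info file above
for cs in [(256, 256, 1), (512, 512, 1), (128, 128, 16), (256, 256, 16)]:
    files, mib = chunk_stats(volume, cs)
    print(cs, f"{files:,} files, {mib:.2f} MiB/chunk uncompressed")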


william-silversmith commented 2 years ago

Overall I think Forrest has a good suggestion, but I think bits and bytes might be mixed up.

512x512x1 chunks would give you a 4x reduction in the number of files and would be 512x512x1 x 2 bytes = 512 KiB each without compression. Chunking in Z can give better performance while scrolling, though the initial upload is a little more complex to do.

If you think you'd like to reduce the number of files even further (it wasn't clear to me whether the file quota was per folder or for your whole account), after you upload you can use Igneous to transfer to the sharded format using igneous xfer SRC DEST --sharded --queue QUEUE and then delete the original upload once you're satisfied.

manoaman commented 2 years ago

thank you @fcollman !!

@william-silversmith

Initially, with a 256x256 chunk size, CloudVolume exited, so the quota applies to a single folder. (I think the failure has to do with the storage's upper bound on how many files can be created.)

If you think you'd like to reduce the number of files even further (it wasn't clear to me whether the file quota was per folder or for your whole account), after you upload you can use Igneous to transfer to the sharded format using igneous xfer SRC DEST --sharded --queue QUEUE and then delete the original upload once you're satisfied.

I intend to do the chunking in Z with Igneous. Can you tell me a little bit more about how igneous xfer SRC DEST --sharded --queue QUEUE works? What is the sharded format, and can this be run after chunking in Z? What do I specify in QUEUE? (Sorry, Will, for asking so many questions here.)

Thank you all!!

william-silversmith commented 2 years ago

Glad we were able to help!

The sharded format is a method for storing many chunks in a single file while still retaining random access to individual chunks. There's a slight performance penalty, but CloudVolume can read them just like the regular chunked format. Without specialized knowledge you won't be able to write the sharded format easily except through Igneous (so no patching missing tiles).

Here's an example of how to use Igneous to generate the sharded version. The QUEUE variable is either an AWS sqs:// queue or a file folder that will be populated with queue files. You can read more here.

igneous xfer ./source-dir ./dest-dir --sharded --queue ./queue --chunk-size 128,128,16
igneous execute ./queue #  you can run as many of these in parallel as you want

Make sure you have the latest Igneous version, as there was a bug fix in the last update. I tried to make sure that shard generation takes a reasonable amount of RAM by sizing the files appropriately. The default uncompressed target size is 3.5GB each (it could use up to 2x that; the generated shard will be smaller due to compression).

You can see more options for the transfer with: igneous xfer --help

One other thing to keep in mind is that downsampling sharded volumes generates only one additional level of hierarchy at a time. This can introduce a small integer truncation error per level. The regular downsampling method avoids this issue by producing 5 mips at a time. This is because generating multiple sharded levels at once would use unreasonable amounts of memory.

You can read more about sharding here: https://github.com/seung-lab/cloud-volume/wiki/Sharding:-Reducing-Load-on-the-Filesystem
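
On the read side, a sharded layer is opened with CloudVolume exactly like a chunked one; a minimal sketch (the path is illustrative):

from cloudvolume import CloudVolume

# reading a sharded precomputed layer looks identical to reading a chunked one
vol = CloudVolume("file:///path/to/dest-dir", mip=0)
cutout = vol[0:512, 0:512, 0:16]  # numpy-style indexing returns a small array cutout
print(cutout.shape, cutout.dtype)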

manoaman commented 2 years ago

Oh cool, I did not know igneous was available from pip install. I've given igneous xfer ... a try by adding "precomputed://" to the target directories, and I'm getting this error. Am I missing some arguments here?

Traceback (most recent call last):
  File "/mypath/.conda/envs/igneous_test2/bin/igneous", line 8, in <module>
    sys.exit(main())
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/igneous_cli/cli.py", line 254, in xfer
    encoding=encoding, memory_target=memory, clean_info=clean_info
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/igneous/task_creation/image.py", line 415, in create_image_shard_transfer_tasks
    src_vol = CloudVolume(src_layer_path, mip=mip)
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/cloudvolume/cloudvolume.py", line 207, in __new__
    return REGISTERED_PLUGINS[path.format](**kwargs)
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/__init__.py", line 37, in create_precomputed
    cloudpath=get_cache_path(cache, cloudpath),
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/cloudvolume/datasource/__init__.py", line 88, in get_cache_path
    return get_cache_path_helper(base, cloudpath)
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/cloudvolume/datasource/__init__.py", line 97, in get_cache_path_helper
    base, path.protocol, basepath, path.layer
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/genericpath.py", line 149, in _check_arg_types
    (funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'NoneType'
william-silversmith commented 2 years ago

Hi m,

The pip install / CLI version of igneous is newer so not everyone has learned about it yet. I'm glad you find it convenient! Can you provide a more complete command? It's a little hard to debug without seeing the path that triggered the error.

manoaman commented 2 years ago

Oops, sorry about that. I've made several attempts after addressing the protocol and format warnings, and this is what I'm seeing so far. The CLI command is something like the following:

igneous xfer precomputed://file://../../../../../source_dir precomputed://file://../../../../../dest_dir --sharded --queue precomputed://file://../../../../../queue --chunk-size 128,128,16

source_dir contains the chunked files in Z.

  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/cloudvolume/paths.py", line 133, in extract
    fmt, protocol, cloudpath = extract_format_protocol(cloudpath)
  File "/mypath/.conda/envs/igneous_test2/lib/python3.7/site-packages/cloudvolume/paths.py", line 57, in extract_format_protocol
    raise error # e.g. ://test_bucket, test_bucket, wow//test_bucket
cloudvolume.exceptions.UnsupportedProtocolError: 

Trying to follow this rule.

Cloud Path must conform to FORMAT://PROTOCOL://BUCKET/PATH
Examples: 
  precomputed://gs://test_bucket/em
  gs://test_bucket/em
  graphene://https://example.com/image/em

Supported Formats: None (precomputed), graphene, precomputed, boss
Supported Protocols: gs, file, s3, matrix, http, https
william-silversmith commented 2 years ago

You can write simply:

igneous xfer ../../../../../source_dir ../../../../../dest_dir --sharded --queue ../../../../../queue --chunk-size 128,128,16

precomputed:// is probably fine for the source and dest, but the queue is a totally different mechanism and the prefix makes no sense there. The appropriate prefixes for queue are sqs:// (Amazon SQS) and fq:// (File Queue, default).
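
For reference, the same fq:// queue can be driven from Python with the task-queue package; a minimal sketch (the path is illustrative):

from taskqueue import TaskQueue

# the directory the CLI populates with --queue; fq:// selects the FileQueue backend
tq = TaskQueue("fq://./queue")
print(tq.is_empty())  # True once every task has been completed
tq.poll()             # lease and run tasks from the queue (igneous execute drives this under the hood)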

manoaman commented 2 years ago

Okay. I tried different combinations of FORMAT and PROTOCOL prefixes, and also without the prefixes. It turns out I had to explicitly specify them. The following command seemed to run okay.

igneous xfer precomputed://file://../../../../../source_dir precomputed://file://../../../../../dest_dir --sharded --queue fq://../../../../../queue --chunk-size 128,128,16

I'm waiting on igneous execute to finish; so far the info logs indicate success and I'm hopeful it will complete. I'll let you know once I verify on Neuroglancer.


INFO Deleting 20e51e44-a0ff-4c62-9ebb-839d5955e946
INFO FunctionTask 20e51e44-a0ff-4c62-9ebb-839d5955e946 succesfully executed in 87.30 sec.
...
william-silversmith commented 2 years ago

That is fantastic! Just FYI, you can monitor queue progress with the command ptq status ../../../../../queue. ptq also has some other commands to help you manage the queue; check ptq --help.

ptq = Python Task Queue

manoaman commented 2 years ago

The process still seems to be running and here is what I see from ptq status .... I'll give it some time and check back later. Does it look complete from the status?

Inserted: 140
Enqueued: 0 (0.0% left)
Completed: 140 (100.0%)
Leased: 0 (--%) of queue
william-silversmith commented 2 years ago

It's done! It doesn't automatically exit.


manoaman commented 2 years ago

Hi Will,

I tested this morning on Neuroglancer and the sharded format loads great. I do see the reduction in the number of files generated and in the total size the folder takes up on the storage (144GB to 106GB, 413,193 files to 141 files). This is nice.

a) Sharded files: 32MB ~ 1.1GB shard files (.sd)

$ ls -l ../sharded_chunks/ | wc -l
141

Total size:
$ du -sh ../sharded_chunks
106G
-----------
b) Before sharded: 4.7KB ~ 442KB precomputed files (.gz)

$ ls -l ../precomputed/ | wc -l
413193

$ du -sh ../precomputed
144G

One thing I realized is that the chunks (loading tiles) on Neuroglancer seem to have gotten much smaller. Is this because I specified --chunk-size 128,128,16 instead of --chunk-size 512,512,16? When I ran the transfer task in a separate Python script (this is before I ran igneous xfer, and it's how I created the source used to generate the shard format) and loaded the precomputed files, I saw bigger tiles loading. Is this something I can tweak with --chunk-size?

This is the Python script where I generate the precomputed chunks in Z.

from taskqueue import LocalTaskQueue
from cloudvolume.lib import Vec
import igneous.task_creation as tc

with LocalTaskQueue(parallel=8) as tq:
    tasks = tc.create_transfer_tasks(
      src_layer_path, dest_layer_path,
      chunk_size=(64,64,64), shape=Vec(512,512,512),
      fill_missing=False, translate=(0,0,0),
      bounds=None, mip=0, preserve_chunk_size=True,
      encoding=None, skip_downsamples=False,
      delete_black_uploads=False
    )
    tq.insert_all(tasks)

And afterwards, from the Igneous CLI:

igneous xfer precomputed://file://../../../../../source_dir precomputed://file://../../../../../dest_dir --sharded --queue precomputed://file://../../../../../queue --chunk-size 128,128,16
william-silversmith commented 2 years ago

Hi m,

Yep, the chunk size parameter is what is controlling the size of the tiles. Unfortunately, you'll need to generate a new shard layer. The existing one can't be modified in place. You can do this either from the original tiles or from the existing shards as a source.

manoaman commented 2 years ago

Okay. Let me try increasing the chunk size to 512,512,64 to see the change. It seems to be taking longer this time, so I'll have to see how the results come out.

This is a bit off topic. Before getting the image stack into CloudVolume/Igneous, I had a tough time splitting a multi-page TIFF about 245GB in size. I had to allocate about 900GB~1TB of memory on a high-performance computing node and use ImageMagick for the splitting. (Anything smaller resulted in "out of memory" errors.) Is this typical when handling large files before even getting to the chunking stage?

william-silversmith commented 2 years ago

Those are pretty big chunks (33 MB) so your neuroglancer loading may become pretty slow. Might I suggest something closer to 256 x 256 x 16 (2 MB), 256 x 256 x 32 (4 MB), or 512 x 512 x 16 (8 MB)?

I'll admit I haven't worked with very large single TIFF files myself, usually the files are split into single image slices. However, if you find a good TIFF library or use the right features from it, it should be very possible to work with it slice by slice instead of reading the whole thing into memory at once.

You might have some luck perusing the documentation for the tifffile python package. If the images are not compressed, it seems you can read them as a memory mapped file: https://github.com/cgohlke/tifffile/issues/52

If the package doesn't have what you need, its documentation also links to a number of other scientific TIFF packages that might.
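
As a rough illustration of the slice-by-slice idea, a sketch along these lines might work (the paths, the pre-existing info file, and a Z chunk size of 1 are all assumptions here):

import tifffile
from cloudvolume import CloudVolume

# assumes the layer already has an info file and chunk_size z == 1,
# so single-slice writes stay chunk-aligned
vol = CloudVolume("file:///path/to/layer", mip=0)

with tifffile.TiffFile("stack.tif") as tif:
    for z, page in enumerate(tif.pages):
        slice_yx = page.asarray()              # reads only this page into memory
        vol[:, :, z:z+1] = slice_yx.T[:, :, None]  # transpose to X,Y and add a Z axis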

manoaman commented 2 years ago

Agh, thank you for reminding me about that. I should have considered the size of the chunks... In your experience, do you suggest each chunk be somewhere around 2MB ~ 8MB in size? (In fact, I was getting an error with 512,512,64, so I ended up using 512,512,16 instead.)

It does seem like reading slice by slice would be a better approach for tackling large files. More z-stacks will certainly bring more challenges. Thank you again for suggesting the tifffile package!

william-silversmith commented 2 years ago

I think it depends a lot on the expected storage technology and internet connection. Somewhere around 500 KiB to 2 MiB is a good range if you have gigabit. To cover an XY plane, you'll need to download at least dozens of chunks, so fully consuming your bandwidth with a few chunks isn't ideal. You can go higher, it's just that the latency will become more noticeable. It's also important not to make the chunks too thick, as that increases latency without much benefit. If you push that too far, Neuroglancer will limit the number of chunks downloaded because too much memory would be used by non-visible depth.

Everything is chunking, from the bottom of computer architecture to high-minded stuff like petascale volumes. Hope the package is helpful!

manoaman commented 2 years ago

Hi Will,

Sorry to bother you again with more questions. If the shard format is going to be used in the end, does it still matter what chunk sizes I specify in the pre-chunking stages with CloudVolume and Igneous (create_transfer_tasks)? Will igneous xfer --chunk-size override the previously defined chunks, so that it should be considered the final result? I'm still trying to understand how the chunks defined at each step translate.

Thanks! -m

william-silversmith commented 2 years ago

The final transfer command that creates the shards will also use whatever chunk size you specify. The previous chunk sizes are irrelevant, so you should pick them to be convenient for the initial uploading.

manoaman commented 2 years ago

Hi @william-silversmith ,

It's been a while, but I should have asked this question to begin with. What is the largest file size a 3D volumetric image can be before processing in CloudVolume for practical Neuroglancer viewing? What I mean by "practical" here is that chunks are fully loaded in the browser without hitting the RAM limit (no black tiles in the display).

In this example, the 3D volumetric image (TIFF) was 245GB in size. Is that too large to begin with?

william-silversmith commented 2 years ago

I see Jeremy answered your question in the linked discussion and I agree with him. Make sure to downsample your volume after uploading the initial set of tiles (pick a chunk size like 128x128x64). If you are still having problems visualizing the data, run downsampling again using the top mip level that was generated in the last step. This will build an even taller image pyramid. Once the pyramid is sufficiently tall, you will have no problems at all.

manoaman commented 2 years ago

Hi @william-silversmith, what do you mean by "run downsampling again" in CloudVolume/Igneous terms? Are you referring to DownsampleTask? https://github.com/seung-lab/igneous#downsampling-downsampletask.

Will the DownsampleTask work after rechunking (https://github.com/seung-lab/igneous#data-transfer--rechunking-transfertask)? So in actual code, would it be something like this?

with LocalTaskQueue(parallel=8) as tq:
  tasks = tc.create_transfer_tasks(
    src_layer_path, dest_layer_path, 
    chunk_size=(64,64,64), shape=Vec(512,512,512),
    fill_missing=False, translate=(0,0,0), 
    bounds=None, mip=0, preserve_chunk_size=True,
    encoding=None, skip_downsamples=False,
    delete_black_uploads=False
  )  
  tq.insert_all(tasks)

print("Done!")

# downsample from here ???
tasks = tc.create_downsampling_tasks(
    layer_path, # e.g. 'gs://bucket/dataset/layer'
    mip=0, # Start downsampling from this mip level (writes to next level up)
    fill_missing=False, # Ignore missing chunks and fill them with black
    axis='z',
    num_mips=5, # number of downsamples to produce. Downloaded shape is chunk_size * 2^num_mip
    chunk_size=None, # manually set chunk size of next scales, overrides preserve_chunk_size
    preserve_chunk_size=True, # use existing chunk size, don't halve to get more downsamples
    sparse=False, # for sparse segmentation, allow inflation of pixels against background
    bounds=None, # mip 0 bounding box to downsample
    encoding=None, # e.g. 'raw', 'compressed_segmentation', etc
    delete_black_uploads=False, # issue a delete instead of uploading files containing all background
    background_color=0, # Designates the background color
    compress='gzip', # None, 'gzip', and 'br' (brotli) are options
    factor=(2,2,1), # common options are (2,2,1) and (2,2,2)
  )
tq.insert_all(tasks)

If you are still having problems visualizing the data, run downsampling again using the top mip level that was generated in the last step. This will build an even taller image pyramid.

Once the pyramid is sufficiently tall, you will have no problems at all.

How do you tell if the pyramid is sufficiently tall?

Thank you Will, -m

william-silversmith commented 2 years ago

Hi m,

I think you will find it easier to use the Igneous command line interface if you can. The transfer tasks will automatically create a few levels of downsamples, so if you run downsampling from mip 0 again, you probably won't see much improvement. You'll probably also enjoy using FileQueue more as you can stop and restart jobs without starting again from the beginning.

With an XY chunk size of 128 and a task shape of 1024, you should expect three downsamples to be generated (1024 / 128 = 8 = 2^3, so three factor-of-two reductions fit within one task).

Try this:

igneous image xfer SRC DEST --mip 0 --chunk-size 128,128,64 --shape 1024,1024,64 --queue ./queue
igneous -p 8 execute -x ./queue
igneous image downsample SRC --mip 3 --num-mips 4 --queue ./queue
igneous -p 8 execute -x ./queue
manoaman commented 2 years ago

Hi @william-silversmith ,

Okay, I've tried testing with a 3D image volume (7332 x 10131 x 3900; TIFF stacks, 329GB in total size) and I don't know if I succeeded at the downsampling stage. The file sizes don't seem to change in the destination folder before and after the downsample. I see an error running the CLI command, so I could be designing the chunk or shape sizes incorrectly... Would you be able to advise what I am doing wrong?

Here are the steps I took.


  1. Run CloudVolume to chunk the XY dimension. (I chose 1024,1024,1 so that I wouldn't hit the I/O error from creating too many files. It seems that 1,000,000 files is the storage limit. Maybe it's configurable on the storage side to increase this limit and allow smaller chunks?)

Configured parameters in a CloudVolume script:

    chunk_size=[1024, 1024, 1],    
    volume_size=[7332, 10131, 3900],

Output files/folders in the destination folder:

$ ls

1800_1800_2000  info  progress  provenance

  2. Next, rechunked on XYZ with the Igneous CLI.

CLI:

$ igneous image xfer SRC DEST  --mip 0 --chunk-size 128,128,64 --shape 1024,1024,64 --queue ./queue
$ igneous -p 36 execute -x ./queue

Output files/folders in the destination folder:

$ du -sh ./

35M     ./14400_14400_2000
2.9G    ./1800_1800_2000
569M    ./3600_3600_2000
138M    ./7200_7200_2000
24K     ./info
24K     ./provenance

  3. Lastly, the failing step: downsample.

CLI:

$ igneous image downsample SRC --mip 3 --num-mips 4 --queue ./queue
$ igneous -p 36 execute -x ./queue

Output files/folders in the destination folder:

$ du -sh ./

35M     ./14400_14400_2000
2.9G    ./1800_1800_2000
569M    ./3600_3600_2000
138M    ./7200_7200_2000
24K     ./info
24K     ./provenance

Error message:

Quite a few cloudvolume.exceptions.EmptyVolumeException printed on the terminal due to missing chunks.

ERROR FunctionTask(('igneous.tasks.image.image', 'DownsampleTask'),[],{'layer_path': 'file:///folder_name', 'mip': 3, 'shape': [2048, 2048, 64], 'offset': [0, 0, 1088], 'axis': 'z', 'fill_missing': False, 'sparse': False, 'delete_black_uploads': False, 'background_color': 0, 'dest_path': None, 'compress': None, 'factor': [2, 2, 1]},"327db0a1-27d1-4394-a20e-05c4cb7a2cea") raised 14400_14400_2000/0-128_256-384_1088-1152
 Traceback (most recent call last):
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/taskqueue/taskqueue.py", line 375, in poll
    task.execute(*execute_args, **execute_kwargs)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/taskqueue/queueablefns.py", line 78, in execute
    self(*args, **kwargs)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/taskqueue/queueablefns.py", line 87, in __call__
    return self.tofunc()()
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/igneous/tasks/image/image.py", line 467, in DownsampleTask
    factor=factor,
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/igneous/tasks/image/image.py", line 426, in TransferTask
    src_bbox, agglomerate=agglomerate, timestamp=timestamp
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/frontends/precomputed.py", line 709, in download
    bbox.astype(np.int64), mip, parallel=parallel, renumber=bool(renumber)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/__init__.py", line 183, in download
    background_color=int(self.background_color),
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/rx.py", line 281, in download
    green=green, secrets=secrets, background_color=background_color
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/rx.py", line 560, in download_chunks_threaded
    green=green,
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/scheduler.py", line 104, in schedule_jobs
    return schedule_threaded_jobs(fns, concurrency, progress, total)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/scheduler.py", line 30, in schedule_threaded_jobs
    tq.put(updatefn(fn))
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 257, in __exit__
    self.wait(progress=self.with_progress)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 227, in wait
    self._check_errors()
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 191, in _check_errors
    raise err
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 153, in _consume_queue
    self._consume_queue_execution(fn)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 180, in _consume_queue_execution
    fn()
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/scheduler.py", line 23, in realupdatefn
    res = fn()
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/rx.py", line 528, in process
    decode_fn, decompress
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/rx.py", line 509, in download_chunk
    background_color=background_color)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/rx.py", line 582, in decode
    mip, background_color,
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/rx.py", line 629, in _decode_helper
    raise EmptyVolumeException(input_bbox)
cloudvolume.exceptions.EmptyVolumeException: 14400_14400_2000/0-128_256-384_1088-1152
william-silversmith commented 2 years ago

Hi m,

The empty volume error appears if your image does not completely fill the space or if you're pointed at an incorrect location.

I noticed you reset the chunk size on the command line after you set it in the info file which may not be what you want.

You can try using the --fill-missing flag which will write zeroed data instead of throwing an exception.

For the transfer step, you can also try using --sharded to dramatically reduce the number of files written, though no downsamples will be generated from that step (so they will all need to be done via the downsample command).


manoaman commented 2 years ago

Thanks for the feedback @william-silversmith

The --fill-missing option did work. Thank you.

I didn't quite understand what you meant by "reset the chunk size". What chunk size should I be using in the first place? I think I'm a little confused about how chunk sizes carry over from a CloudVolume script to the Igneous CLI, and about how the --shape option is used.

I noticed you reset the chunk size on the command line after you set it in the info file which may not be what you want.

The results I see in the viewer so far are partially loaded chunks, and then loading stops, so I'm not sure if I succeeded in downsampling. I went as deep as --num-mips 8 and here are the generated files.

du -sh ./*

29M     ./14400_14400_2000
2.0G    ./1800_1800_2000
19M     ./28800_28800_2000
487M    ./3600_3600_2000
6.1M    ./57600_57600_2000
117M    ./7200_7200_2000
24K     ./info
24K     ./provenance

info file info.txt

Any thoughts?

manoaman commented 2 years ago

Hi @william-silversmith ,

Does CloudVolume have a way to get around generating millions of chunks in a folder?

I want to try to achieve chunk sizes of either 128x128x64 or 256x256x32. However, the very first step with CloudVolume (before the shard format) hits the limit (1~2 million files per folder) on the network storage that I'm using.

(7370/128) x (10131/128) x (3900/1) = 17,773,152 chunks.
(7370/256) x (10131/256) x (3900/1) = 4,443,288 chunks.

Can I chunk in 512x512x1 with CloudVolume, and then use Igneous transfer tasks to rechunk to 256x256x32? Similar to the previous questions, I wasn't sure whether this would break the info file.

How do I get around this issue?

william-silversmith commented 2 years ago

Can I chunk in 512x512x1 with CloudVolume, and then use Igneous transfer tasks to rechunk to 256x256x32? Similar to the previous questions, I wasn't sure whether this would break the info file.

Yes, this is possible. Just make sure you set up the initial upload such that it aligns to the 512x512x1 boundaries. I would recommend 1024x1024x1 to make even fewer files, if your network speed supports it.

When you rechunk it with Igneous, you can also set it to --sharded which will drastically reduce the number of files.
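
To illustrate the alignment point, here is a sketch of a chunk-aligned upload loop (the reader function and paths are hypothetical; it assumes chunk_size=[1024,1024,1] in the info file and uint16 data):

from cloudvolume import CloudVolume

BLOCK = 1024  # matches the chunk_size XY so every write lands on chunk boundaries
vol = CloudVolume("file:///path/to/layer", mip=0)
x_max, y_max, z_max = vol.shape[:3]

for z in range(z_max):
    for x0 in range(0, x_max, BLOCK):
        for y0 in range(0, y_max, BLOCK):
            x1, y1 = min(x0 + BLOCK, x_max), min(y0 + BLOCK, y_max)
            tile = read_tile(x0, y0, x1, y1, z)  # hypothetical reader returning an (x, y, 1) uint16 array
            vol[x0:x1, y0:y1, z:z+1] = tile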

manoaman commented 2 years ago

Hi @william-silversmith ,

I'm trying to better understand various options of the CLI commands and I have two questions. Maybe I'm asking stupid questions but I'm hoping you can clarify. Thank you.

  1. Why does the total number of chunks across all levels differ when the same number of levels (folders) is generated from different shapes?

Generated folders: 14400_14400_2000 1800_1800_2000 28800_28800_2000 3600_3600_2000 7200_7200_2000 info provenance

--shape 4096,4096,256 (4 levels)

$ igneous image xfer file:///nfs/precomputed/1024x1024x1 file:///nfs/precomputed/1024x1024x16_256x256x16_4lv  --mip 0 --chunk-size 256,256,16 --shape 4096,4096,256 --fill-missing --queue ./queue

126,052 chunks = 93472 + 24112 + 6400 + 1600 + 468

--shape 8192,8192,512 (5 levels)

igneous image xfer file:///nfs/precomputed/1024x1024x1 file:///nfs/precomputed/1024x1024x16_256x256x16_5lv  --mip 0 --chunk-size 256,256,16 --shape 8192,8192,512 --fill-missing --queue ./queue

382,104 chunks = 283040 + 73200 + 19520 + 4880 + 1464

  2. Why are only 5 levels generated when 7 levels are expected? I don't see 57600_57600_2000 and 115200_115200_2000. Is level 5 the highest for the image I'm processing?
$ igneous image downsample --mip 4 --num-mips 7 --fill-missing --queue ./queue file:///nfs/precomputed/1024x1024x16_256x256x16_5lv
william-silversmith commented 2 years ago

Hi m,

For the first question, please check if the chunk size in the info file is the same. There shouldn't be different quantities of files if only the shape changes. Tangentially, since you're using 2x2x1 downsampling, there's no need to use large Z dimensions for the shape. You can use 4096x4096x16 or 8192x8192x8. However, you can also simply not set the shape and instead set a memory target with --memory BYTES, and the task shape will be set to a size that makes sense for that limit.

Why are only 5 levels generated when 7 levels are expected? I don't see 57600_57600_2000 and 115200_115200_2000. Is level 5 the highest for the image I'm processing?

I would need to see the info file, but likely the volume size became smaller than the chunk size and so the upper mips were not generated.
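
As a rough check of that explanation (values taken from this thread: a 7332-voxel-wide volume and a 256-voxel XY chunk; this is only a sketch of the rule, not Igneous's exact logic):

# a mip can be generated while the downsampled XY extent still exceeds one chunk
width, chunk = 7332, 256
mip = 0
while width // (2 ** (mip + 1)) >= chunk:
    mip += 1
print("highest mip:", mip)  # prints 4, i.e. five levels (mips 0-4)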

manoaman commented 2 years ago

Hi @william-silversmith ,

I tried again using 4096x4096x16 and 8192x8192x8 as you mentioned. The info files all appear to be the same (attached in this post), and the info files before this retry had the same content as well. However, the quantities of chunks are still different between these two shapes: 4096x4096x16 (15,042 chunks) and 8192x8192x8 (66,703 chunks).

The downsample stage still did not generate two more levels, so the folders remain at five levels. (I tried running igneous image downsample twice.)

Why are only 5 levels generated when 7 levels are expected? I don't see 57600_57600_2000 and 115200_115200_2000. Is level 5 the highest for the image I'm processing?

I'll try and see if I see any difference using --memory BYTES next. The default size is 3500000000.0. What is the largest size I can choose for this parameter?

Thank you Will, -m

--

  1. 4096x4096x16 shape
$ igneous image xfer file:///nfs/precomputed/1024x1024x1 file:///nfs/precomputed/1024x1024x16_256x256x16_4lv_s4096x4096x16  --mip 0 --chunk-size 256,256,16 --shape 4096,4096,16 --fill-missing --queue ./queue

$ igneous -p 36 execute -x ./queue

$ igneous image downsample --mip 4 --num-mips 7 --fill-missing --queue ./queue file:///nfs/precomputed/1024x1024x16_256x256x16_4lv_s4096x4096x16

$ igneous -p 36 execute -x ./queue
$ ls /nfs/precomputed/1024x1024x16_256x256x16_4lv_s4096x4096x16/28800_28800_2000/ | wc -l
62
$ ls /nfs/precomputed/1024x1024x16_256x256x16_4lv_s4096x4096x16/14400_14400_2000/ | wc -l
193
$ ls /nfs/precomputed/1024x1024x16_256x256x16_4lv_s4096x4096x16/7200_7200_2000/ | wc -l
769
$ ls /nfs/precomputed/1024x1024x16_256x256x16_4lv_s4096x4096x16/3600_3600_2000/ | wc -l
2881
$ ls /nfs/precomputed/1024x1024x16_256x256x16_4lv_s4096x4096x16/1800_1800_2000/ | wc -l
11137

15,042 chunks = 11137 + 2881 + 769 + 193 + 62

--

  2. 8192x8192x8 shape
$ igneous image xfer file:///nfs/precomputed/1024x1024x1 file:///nfs/precomputed/1024x1024x16_256x256x16_5lv_s8192x8192x16 --mip 0 --chunk-size 256,256,16 --shape 8192,8192,16 --fill-missing --queue ./queue

$ igneous -p 36 execute -x ./queue

$ igneous image downsample --mip 4 --num-mips 7 --fill-missing --queue ./queue file:///nfs/precomputed/1024x1024x16_256x256x16_5lv_s8192x8192x16

$ igneous -p 36 execute -x ./queue
$ ls /nfs/precomputed/1024x1024x16_256x256x16_5lv_s8192x8192x16/28800_28800_2000/ | wc -l
243
$ ls /nfs/precomputed/1024x1024x16_256x256x16_5lv_s8192x8192x16/14400_14400_2000/ | wc -l
853
$ ls /nfs/precomputed/1024x1024x16_256x256x16_5lv_s8192x8192x16/7200_7200_2000/ | wc -l
3409
$ ls /nfs/precomputed/1024x1024x16_256x256x16_5lv_s8192x8192x16/3600_3600_2000/ | wc -l
12781
$ ls /nfs/precomputed/1024x1024x16_256x256x16_5lv_s8192x8192x16/1800_1800_2000/ | wc -l
49417

66,703 chunks = 49417 + 12781 + 3409 + 853 + 243

info.txt

manoaman commented 2 years ago

I tried --memory 3500000000 and it seems to generate the same 5 levels after the downsample. The total chunks are slightly different. The info file appears to be the same as the one I attached.

$ igneous image xfer file:///nfs/precomputed/1024x1024x1 file:///nfs/precomputed/1024x1024x16_256x256x16_memory3500000000 --mip 0 --chunk-size 256,256,16 --memory 3500000000 --fill-missing --queue ./queue

$ igneous -p 36 execute -x ./queue

$ igneous image downsample --mip 3 --num-mips 7 --fill-missing --queue ./queue file:///nfs/precomputed/256x256x16_memory3500000000

$ igneous -p 36 execute -x ./queue

18,098 chunks = 13376 + 3488 + 944 + 236 + 54

One random question: do I need to remove the ./queue folder every time igneous executes?

william-silversmith commented 2 years ago

The downsample is terminating after 5 mip levels because the XY dimension would be smaller than 1 chunk in one additional downsample.

If the chunk sizes are the same for 4096 vs 8192, are the contents of the directories different? Are the sizes of the folders the same in bytes? Have you tried visualizing the directories? Do they look okay? igneous view DIRECTORY

One random question: do I need to remove the ./queue folder every time igneous executes?

Nope. If the queue is empty, feel free to reuse it. You can manage the queue with the ptq command.

Tried --memory 3500000000 and seems to be generating same 5 levels after the downsample.

The --memory command won't do anything that different, it just allows you to easily size tasks for the RAM you want each process to use. You're using very large tasks in a large number of processes. I was somewhat surprised your machine could handle that. 3500000000 = 3.5 GB per task.

manoaman commented 2 years ago

No, unfortunately both directories do not look okay in the viewer. Viewing the two directories (4096 vs 8192) on Neuroglancer doesn't show the complete chunks in the 3D cross-sectional view. The x-z and y-z views are both incomplete, but I observe the 8192 directory displaying somewhat more chunks. I did not attach it, but the x-y view displays clearly only for the available stripes (chunks) seen in the x-z and y-z views, as you can see from the screenshot. I'm still bothered by why the chunk statistics show quite a few F (failed) chunks. (2nd screenshot)

8192 directory viewing

Screen Shot 2022-07-15 at 2 27 43 PM

chunks statistics

Screen Shot 2022-07-15 at 2 29 04 PM

If the chunk sizes are the same for 4096 vs 8192, are the contents of the directories different? Are the sizes of the folders the same in bytes? Have you tried visualizing the directories? Do they look okay? igneous view DIRECTORY

The sizes are quite different, and the 8192 directory obviously has more chunks generated.

16GB in total folder size:

$ du -sh /nfs/precomputed/_1024x1024x16_256x256x16_4lv_s4096x4096x16/*
45M /nfs/precomputed/_1024x1024x16_256x256x16_4lv_s4096x4096x16/28800_28800_2000
174M    /nfs/precomputed/_1024x1024x16_256x256x16_4lv_s4096x4096x16/14400_14400_2000
719M    /nfs/precomputed/_1024x1024x16_256x256x16_4lv_s4096x4096x16/7200_7200_2000
3.0G    /nfs/precomputed/_1024x1024x16_256x256x16_4lv_s4096x4096x16/3600_3600_2000
12G /nfs/precomputed/_1024x1024x16_256x256x16_4lv_s4096x4096x16/1800_1800_2000
24K /nfs/precomputed/_1024x1024x16_256x256x16_4lv_s4096x4096x16/info
24K /nfs/precomputed/_1024x1024x16_256x256x16_4lv_s4096x4096x16/provenance

69GB in total folder size:

$ du -sh /nfs/precomputed/_1024x1024x16_256x256x16_5lv_s8192x8192x16/*
192M    /nfs/precomputed/_1024x1024x16_256x256x16_5lv_s8192x8192x16/28800_28800_2000
766M    /nfs/precomputed/_1024x1024x16_256x256x16_5lv_s8192x8192x16/14400_14400_2000
3.1G    /nfs/precomputed/_1024x1024x16_256x256x16_5lv_s8192x8192x16/7200_7200_2000
13G /nfs/precomputed/_1024x1024x16_256x256x16_5lv_s8192x8192x16/3600_3600_2000
52G /nfs/precomputed/_1024x1024x16_256x256x16_5lv_s8192x8192x16/1800_1800_2000
24K /nfs/precomputed/_1024x1024x16_256x256x16_5lv_s8192x8192x16/info
24K /nfs/precomputed/_1024x1024x16_256x256x16_5lv_s8192x8192x16/provenance

I tried diff -rq on both directories and I see quite a lot of discrepancies. Most of the chunks are missing in 4096, and there are missing chunks in the 8192 directory as well. diff_dirs.txt

william-silversmith commented 2 years ago

I think what's happening is your execution is failing. For some reason, it's failing faster in the case of the 4096. Are you sure the queue was fully emptied? ptq status ./queue

Can you let me know some parameters of the machine you are running this on? How much RAM, disk space, and number of cores?

manoaman commented 2 years ago

Okay, this is the part I probably need to understand better. In the previous runs, I only typed the igneous -p 36 execute -x ./queue command once from the terminal. After the following logs appeared on stdout and the command terminated, I moved on to the downsample CLI.

INFO Deleting 1249b0dd-2482-4021-a13f-59ac90c5bc7a
INFO FunctionTask 1249b0dd-2482-4021-a13f-59ac90c5bc7a succesfully executed in 177.46 sec.
INFO Deleting 6a11ef35-6de0-4cfd-9b77-fec5d6203d6f
INFO FunctionTask 6a11ef35-6de0-4cfd-9b77-fec5d6203d6f succesfully executed in 188.66 sec.
bash-4.2$
bash-4.2$ 

However, as you mentioned, when I checked ptq status ./queue, I only see that 2.4% of the enqueued tasks are completed. Do I need to manually rerun igneous -p 36 execute -x ./queue until the enqueued tasks are completed? I don't see any signs of progress after the first run terminates.

$ igneous image xfer file:///nfs/precomputed/NeuN_1024x1024x1 file:///panfs/dong/seita/precomputed/1024x1024x16_256x256x16_4lv_s4096x4096x16  --mip 0 --chunk-size 256,256,16 --shape 4096,4096,16 --fill-missing --queue ./queue

$ ptq status ./queue
Inserted: 1464
Enqueued: 1464 (100.0% left)
Completed: 0 (0.0%)
Leased: 0 (0.0% of queue)
$ igneous -p 36 execute -x ./queue

$ ptq status ./queue
Inserted: 1464
Enqueued: 1429 (97.6% left)
Completed: 35 (2.4%)
Leased: 0 (0.0% of queue)

I'm using a high-performance compute node equipped with 36 cores and 1TB RAM (1000GB allocated), and writing to petabyte network storage.

william-silversmith commented 2 years ago

Since you have 1 TB RAM and 36 cores, I would recommend using --memory 10000000000 (10 GB). Using -p uses Python's multiprocessing, which is a little flaky, as if one process dies the others may go with it. I would recommend something more like the below, which will create independent processes and be more robust to failure.

screen -S igneous
for i in {1..36}; do
  igneous execute -x ./queue &
done
wait
<ctrl-A ctrl-D to detach from the screen session, you can reattach with screen -r igneous> 

You can then monitor the queue with watch ptq status ./queue

If your job fails, the tasks have a timeout, so if you simply restart igneous execute, nothing will happen if all the tasks were touched and then died. Use ptq release ./queue to reset the timeouts on the tasks.

You can also run igneous execute -x ./queue manually while it's processing to follow along and see if something bad is happening.

manoaman commented 2 years ago

Let me give it a try by looping without the -p option and see how the processes finish. I'll get back to you. In the meantime, I see the following errors from each process. Could this also be related to chunks not being generated?


Traceback (most recent call last):
  File "/homedir/.conda/envs/igneous/bin/igneous", line 8, in <module>
    sys.exit(main())
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/igneous_cli/cli.py", line 545, in execute
    parallel_execute_helper(parallel, args)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/igneous_cli/cli.py", line 549, in parallel_execute_helper
    execute_helper(*args)
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/igneous_cli/cli.py", line 583, in execute_helper
    stop_fn=stop_after_elapsed_time,
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/taskqueue/taskqueue.py", line 400, in poll
    if stop_fn_bound():
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/igneous_cli/cli.py", line 571, in stop_after_elapsed_time
    if exit_on_empty and tq.is_empty():
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/taskqueue/taskqueue.py", line 138, in is_empty
    return self.api.is_empty()
  File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/taskqueue/file_queue_api.py", line 411, in is_empty
    first(iter(self))
NameError: name 'first' is not defined
william-silversmith commented 2 years ago

This looks potentially like a problem with not having updated dependencies. Try doing pip install task-queue cloud-files cloud-volume -U

manoaman commented 2 years ago

Okay, after several attempts at the looping solution, I ended up using -p 36 to take advantage of the available CPU cores, and the 1400+ queued tasks processed pretty fast. Previously, it took a day with a single core, so I had to use the -p option anyway. It turns out I was misusing the TaskQueue to begin with, but I'm glad I've now learned the ptq command better.

$ for i in {1..14}; do `igneous -p 36 execute -x ./queue &`; done; wait;

If the chunk sizes are the same for 4096 vs 8192, are the contents of the directories different? Are the sizes of the folders the same in bytes? Have you tried visualizing the directories? Do they look okay? igneous view DIRECTORY

The chunked results came out very well for 4096, 8192, and also for the --memory 10000000000 (10 GB) option instead of specifying the shape size, which gave the same four levels as the 4096 shape; I then downsampled to make five pyramid levels. The 8192 shape came out with five levels, so I suppose I don't need to downsample it anymore. As I understand it, five pyramid levels is the highest this specific image can get.

4096 and --memory 10GB

382,104 (Chunks) = 1464 + 4880 + 19520 + 73200 + 283040

8192

382,109 (Chunks) = 1465 + 4881 + 19521 + 73201 + 283041

The good news is that the file sizes and the number of chunks for all of these different shape/memory options came out equal. To be accurate, the only difference is that with the 8192 shape there is one extra chunk in each level, so five additional chunks in total for that shape. Viewing on Neuroglancer, there was sluggishness in loading thinner chunks in the X-Z and Y-Z views, but it visualized much, much better overall after increasing Neuroglancer's GPU memory and system memory to 4GB, and concurrent chunk requests to 10,000.

Next, I will give the shard format a try to see if I can reduce the number of these chunks, and also whether an isotropic chunk size (128x128x128) works better for loading the X-Z and Y-Z views and for reducing the "F" (failed) counts in Neuroglancer's statistics when changing scales (zoom in/out).

Thank you for your patience in guiding me through debugging @william-silversmith !!

manoaman commented 2 years ago

@william-silversmith how do I downsample and shard one level at a time? I've succeeded in the first shard step and generated one pyramid level.

$ igneous image downsample --mip 0 --num-mips 5 --fill-missing --sharded --queue ./queue file:///nfs/precomputed/1024x1024x16_256x256x16_4lv_s4096x4096x16_sharded

igneous: sharded downsamples only support producing one mip at a time.

For the downsampled shards onward (levels 2~5), should I be running each level as follows?

igneous image xfer file:///nfs/precomputed/1024x1024x1 file:///nfs/precomputed/1024x1024x16_256x256x16_4lv_s4096x4096x16  --mip 1 --chunk-size 256,256,16 --shape 4096,4096,16 --fill-missing --queue ./queue

igneous image xfer file:///nfs/precomputed/1024x1024x1 file:///nfs/precomputed/1024x1024x16_256x256x16_4lv_s4096x4096x16  --mip 2 --chunk-size 256,256,16 --shape 4096,4096,16 --fill-missing --queue ./queue

and onward... ??

Unfortunately, you'll need to generate a new shard layer. The existing one can't be modified in place. You can do this either from the original tiles or from the existing shards as a source.

william-silversmith commented 2 years ago

Simply set --num-mips 1 --mip 1. Because shard generation is memory intensive, only one sharded downsample can be generated at once. Just keep incrementing --mip each time. However, if it is okay to generate unsharded downsamples (after a certain point the number of chunks will be tolerable), you can omit --sharded and use a higher --num-mips value.

manoaman commented 1 year ago

Hi @william-silversmith, it's been a while since I posted, but I still have doubts about processing high-resolution images for downsampling, after having had more of a chance to look at other Neuroglancer examples.

When I ran Igneous to transfer/downsample with the following commands on the precomputed dataset after running CloudVolume, the generated folders are 1800_1800_2000, 3600_3600_2000, 7200_7200_2000, 14400_14400_2000, 28800_28800_2000 and each folder is in the range of 1.1GB ~ 297GB.

However, what I was expecting is a much smaller number of chunks, and smaller files, in the respective folders at the same resolution. For example, I expected folders named 1_1_1, 2_2_2, 4_4_4, 8_8_8, 16_16_16, 32_32_32, and 64_64_64 to be generated, with fewer chunks.

The original image resolution is 1.8 µm x 1.8 µm x 2.0 µm (xyz) and the file size is 330GB in total (TIFF stack). I'm also attaching two info files for your reference: one is my version, and the other is the expected version.

my_info_file.txt expected_info_file.txt

Why don't my info file and the generated folders come out as 1_1_1, 2_2_2, ...? The resolution is set to [1800,2000,2000] when creating the info file. Should this be set differently?

It would be very helpful if you could point out what I could be missing. Thank you! -m

The Igneous commands I used:

% igneous image xfer file:///nfs/test1 file:///nfs/test2 --mip 0 --chunk-size 64,64,64 --shape 2048,2048,2048 --fill-missing --queue ./queue

% igneous image downsample --mip 4 --num-mips 7 --fill-missing --queue ./queue file:///nfs/test2
william-silversmith commented 1 year ago

Hi m,

There are a few things going on here. The folder names for each mip (the "key") are usually set to a representation of the resolution. If you'd like it to be a custom name, you can edit the info file after generating it but before running execute.
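
A minimal sketch of that edit using CloudVolume's Python API; the layer path and the "1_1_1", "2_2_2", ... naming scheme are assumptions based on the folder names you were expecting, not a prescribed convention:

```python
from cloudvolume import CloudVolume

# Rename the per-mip directory keys before executing any tasks, so the
# folders that get written match the new names.
vol = CloudVolume("file:///nfs/test2")  # hypothetical path from the xfer command

for i, scale in enumerate(vol.info["scales"]):
    f = 2 ** i
    scale["key"] = f"{f}_{f}_{f}"  # e.g. "1_1_1", "2_2_2", "4_4_4", ...

vol.commit_info()  # rewrite the info file with the new keys
```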

A funny thing is that the info file you are getting should have fewer chunks than the one you are specifying. 256x256x16 is 4 times bigger than 64^3. However, the --chunk-size flag should be setting your mip 0 to 64x64x64. Have you tried deleting the destination info file before running igneous image xfer? What version of igneous are you using? Make sure it's the latest so we're looking at the same thing.

Will

manoaman commented 1 year ago

Hi Will,

The version (my_info_file.txt) I previously posted is from right after I ran CloudVolume. I'm attaching the version from after running igneous image xfer. It does seem to update the info file with 64x64x64, so I believe that part is okay.

A funny thing is that the info file you are getting should have fewer chunks than the one you are specifying. 256x256x16 is 4 times bigger than 64^3 . However, the --chunk-size flag should be setting your mip 0 to 64x64x64. Have you tried deleting the destination info file before running igneous image xfer?

my_info_file_after_xfr.txt

I'm using Igneous version 4.3.0.

There's a few things going on here. The folder names for each mip (the "key") are usually set to a representation of the resolution. If you'd like it to be a custom name, you can edit the info file after generating it but before running execute.

This is the part where I'm not sure exactly what is happening. If a custom name works, is it possible the images were pre-processed (downsampled?) before being converted into the precomputed format with CloudVolume? The resolution is still set to [1800,2000,2000], and the x, y, z navigation bars in Neuroglancer span the full volume size (7332 x 10131 x 3900). However, the number of chunks loaded is drastically smaller; I'd say fewer than 100 chunks for the initial view. On the other hand, the same raw image processed with CloudVolume and Igneous loads close to 8,500 chunks at one mip level and a few thousand at the other mip levels for the initial view. How can viewing the same images require such different numbers of chunks at each mip level? Could I be missing something?

Thanks, -m

william-silversmith commented 1 year ago

Hi m,

I'm having a hard time visualizing what's going on. What might be helpful is if you run ls | wc -l, report the number of chunks in each resolution's directory, and share a screenshot if something is not displaying correctly.

manoaman commented 1 year ago

Hi Will,

There are several things I'm experiencing while using Igneous. Unfortunately, I can't share screenshots, but I hope I can provide enough information about where I am stuck.

1) Chunking 1024x1024x1 (CloudVolume) ---> 64x64x64 (Igneous)

Two tasks are inserted. After running igneous execute, only those two tasks are in the queue, but they never complete; they go back to enqueued, and I had to terminate the program. The info file is created with the updated chunk size. What am I doing wrong here?

igneous image xfer file:///nfs/precomputed/1024x1024x1 file:///nfs/precomputed/64x64x64 --mip 0 --chunk-size 64,64,64 --shape 8192,8192,8192 --fill-missing --queue ./queue

2) Chunking 1024x1024x1 (CloudVolume) ---> 256x256x16 (Igneous)

Chunking works okay, and I see quite a few chunks generated. When loaded in Neuroglancer, the images load nicely, but the number of visible chunks is quite large at each mip level, ranging into several thousand loaded chunks. I also see my browser quickly hitting the 4GB GPU RAM limit in the activity monitor. I believe Neuroglancer then stops loading, and I see unloaded tiles at random levels while navigating.

ls -l /panfs/dong/seita/precomputed/1024x1024x16_256x256x16_5lv_s8192x8192x16/28800_28800_2000/ | wc -l
1465
ls -l /panfs/dong/seita/precomputed/1024x1024x16_256x256x16_5lv_s8192x8192x16/14400_14400_2000/ | wc -l
4881
ls -l /panfs/dong/seita/precomputed/1024x1024x16_256x256x16_5lv_s8192x8192x16/7200_7200_2000/ | wc -l
19521
ls -l /panfs/dong/seita/precomputed/1024x1024x16_256x256x16_5lv_s8192x8192x16/3600_3600_2000/ | wc -l
73201
ls -l /panfs/dong/seita/precomputed/1024x1024x16_256x256x16_5lv_s8192x8192x16/1800_1800_2000/ | wc -l
283041

The first problem is not being able to convert into 64x64x64 chunks. The second problem is that the number of requested chunks is still quite large when viewing in Neuroglancer, and GPU RAM usage hits the 4GB limit quite quickly; from the status panel, thousands of chunks are loaded. I'm hoping to reduce the number of requested chunks and make the chunks smaller. I'm currently investigating whether the shard format can reduce the number of chunks Neuroglancer requests and circumvent the memory issue. I can provide more updates once the sharding process finishes.
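
Some rough arithmetic that may help frame the scale of both problems (a sketch only, assuming raw uint16 encoding as in the info file; this is not a precise model of Neuroglancer's memory accounting or of Igneous's per-task memory use):

```python
# Per-chunk size for raw uint16 (2 bytes/voxel) and how many such chunks
# would fit in a 4 GiB GPU memory budget.
bytes_per_voxel = 2

for name, (cx, cy, cz) in {"256x256x16": (256, 256, 16),
                           "64x64x64": (64, 64, 64)}.items():
    chunk_bytes = cx * cy * cz * bytes_per_voxel
    per_4gib = (4 * 1024**3) // chunk_bytes
    print(f"{name}: {chunk_bytes / 2**20:.2f} MiB/chunk, "
          f"~{per_4gib:,} chunks per 4 GiB")

# The stalled xfer task used --shape 8192,8192,8192: an uncut 8192^3 uint16
# task covers ~1 TiB of raw data (clipped to the volume bounds it is still
# several hundred GB per task), which may be relevant to problem 1.
print(8192**3 * bytes_per_voxel / 2**40, "TiB per 8192^3 task")
```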


In the meantime, I've seen a publicly available example with a higher image resolution than the image I'm using. I'm hoping to generate precomputed files with chunks small enough that Neuroglancer does not request thousands of chunks and hit the GPU RAM limit.

As you can see in this example, the number of visible chunks is much smaller, fewer than a hundred, and each chunk is also small in size. Perhaps the folder names for each mip level were custom-changed? I was hoping to figure out how to make Neuroglancer lighter when visualizing high-resolution images and to limit the number of visible chunks.

https://hemibrain-dot-neuroglancer-demo.appspot.com/#!%7B%22dimensions%22:%7B%22x%22:%5B8e-9%2C%22m%22%5D%2C%22y%22:%5B8e-9%2C%22m%22%5D%2C%22z%22:%5B8e-9%2C%22m%22%5D%7D%2C%22position%22:%5B13726.1123046875%2C22354.53515625%2C18610.5%5D%2C%22crossSectionScale%22:112.6579354393845%2C%22crossSectionDepth%22:-37.62185354999912%2C%22projectionOrientation%22:%5B0.6376019716262817%2C-0.017969461157917976%2C-0.03177061304450035%2C0.769500732421875%5D%2C%22projectionScale%22:64770.91726975332%2C%22layers%22:%5B%7B%22type%22:%22image%22%2C%22source%22:%22precomputed://gs://neuroglancer-janelia-flyem-hemibrain/emdata/clahe_yz/jpeg%22%2C%22tab%22:%22source%22%2C%22name%22:%22emdata%22%7D%5D%2C%22showSlices%22:false%2C%22layout%22:%22xy-3d%22%2C%22statistics%22:%7B%22size%22:478%2C%22visible%22:true%7D%2C%22selection%22:%7B%7D%7D
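
One way to compare against that public example is to read its info file with CloudVolume and inspect the per-scale keys, chunk sizes, and encoding (a sketch; the use_https flag is an assumption about reading the public bucket without GCS credentials):

```python
from cloudvolume import CloudVolume

# Inspect the hemibrain example layer referenced in the link above.
vol = CloudVolume(
    "precomputed://gs://neuroglancer-janelia-flyem-hemibrain/emdata/clahe_yz/jpeg",
    use_https=True,  # read the public bucket over HTTPS, no credentials needed
)

for scale in vol.info["scales"]:
    print(scale["key"], scale["chunk_sizes"], scale["encoding"])
```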

Screen Shot 2022-10-06 at 10 20 14 AM