seung-lab / igneous

Scalable Neuroglancer compatible Downsampling, Meshing, Skeletonizing, Contrast Normalization, Transfers and more.
GNU General Public License v3.0

SkeletonMergeTask occupies too much memory #126

Open JackieZhai opened 2 years ago

JackieZhai commented 2 years ago

Hi @william-silversmith, I have tried skeletonizing a 100^3 um^3 EM stack with Igneous. Unfortunately, memory usage exceeds 350 GB during the second pass (SkeletonMergeTask). That is infeasible on our server ...

Our code of skeletonization:

from taskqueue import LocalTaskQueue
import igneous.task_creation as tc

tq = LocalTaskQueue(parallel=16)
tasks = tc.create_skeletonizing_tasks(cloudpath,
    mip=2, shape=(512, 512, 512), sharded=True,
    teasar_params={
        'scale': 4, 
        'const': 500,
        'pdrf_exponent': 4,
        'pdrf_scale': 100000,
        'soma_detection_threshold': 1100,
        'soma_acceptance_threshold': 3500,
        'soma_invalidation_scale': 1.0,
        'soma_invalidation_const': 300,
        'max_paths': None
    }, dust_threshold=1000)
tq.insert(tasks)
tq.execute()

tq = LocalTaskQueue(parallel=1)
tasks = tc.create_sharded_skeleton_merge_tasks(cloudpath,
    dust_threshold=1000,
    tick_threshold=3500)
tq.insert(tasks)
tq.execute()

Our stack configuration (we skeletonize at mip=2, resolution [40, 40, 40]):

from cloudvolume import CloudVolume

info = CloudVolume.create_new_info(
    num_channels    = 1,
    layer_type  = 'segmentation',
    data_type   = 'uint64',
    encoding    = 'compressed_segmentation',
    mesh    = 'mesh_mip_2',
    skeletons   = 'skeletons_mip_2',
    resolution  = [10, 10, 40],
    voxel_offset    = [0, 0, 0],
    chunk_size  = [512, 512, 50],
    volume_size = [10000, 10000, 2500]
)
tasks = tc.create_downsampling_tasks(
    cloudpath,
    mip=0, # Start downsampling from this mip level (writes to next level up)
    axis='z',
    num_mips=2,
    compress='gzip', # None, 'gzip', and 'br' (brotli) are options
    factor=(2, 2, 1), # common options are (2,2,1) and (2,2,2)
  )
tasks = tc.create_downsampling_tasks(
    cloudpath,
    mip=2, # Start downsampling from this mip level (writes to next level up)
    axis='z',
    num_mips=3,
    compress='gzip', # None, 'gzip', and 'br' (brotli) are options
    factor=(2, 2, 2), # common options are (2,2,1) and (2,2,2)
  )
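As a quick sanity check (plain Python, not Igneous code), the per-mip resolutions implied by these downsample factors can be computed from the base resolution; mip 2 does land at [40, 40, 40]:

```python
# Compute per-mip voxel resolutions from the base resolution and the
# downsample factors used above (illustrative helper, not Igneous API).
def mip_resolutions(base, steps):
    res = list(base)
    out = [list(res)]
    for (fx, fy, fz) in steps:
        res = [res[0] * fx, res[1] * fy, res[2] * fz]
        out.append(list(res))
    return out

# mips 0 -> 2 use factor (2, 2, 1); mips 2 -> 5 use (2, 2, 2)
steps = [(2, 2, 1)] * 2 + [(2, 2, 2)] * 3
res = mip_resolutions([10, 10, 40], steps)
print(res[2])  # resolution at mip 2 -> [40, 40, 40]
```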

I think it may not need so much memory. Could you please tell me how to run it without occupying so much memory?

Many thanks!

william-silversmith commented 2 years ago

Hi Jackie! The sharded format is a little harder to run because you need to think about how large files are downloaded, filtered, and aggregated. Here are some tips for making this easier:

  1. If you're running the execution in parallel, consider running fewer processes on each machine.
  2. You can reduce memory usage a lot if you run against a regular file system instead of cloud storage. This is because the .frag files are MapBuffer files (https://github.com/seung-lab/mapbuffer) and can be mmapped to extract just the skeletons needed for a shard.
  3. You can reduce the run time somewhat by creating a sqlite or mysql database from the spatial index and then referencing it during the merge process.
    cv = CloudVolume(path)
    cv.skeleton.spatial_index.to_sql("spatial_index.db")
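The benefit in tip (2) can be illustrated with a stdlib sketch: an mmapped file lets you pull one record out by offset without reading the whole file into RAM. The record layout below is invented for illustration; the real MapBuffer format is different:

```python
import mmap
import os
import tempfile

# Write a file of fixed-size "records" (toy layout, NOT the real
# MapBuffer format), then extract one via mmap. Only the touched
# pages are faulted in, not the entire file.
RECORD = 8  # bytes per record in this toy example

path = os.path.join(tempfile.mkdtemp(), "toy.frag")
with open(path, "wb") as f:
    for i in range(1000):
        f.write(i.to_bytes(RECORD, "little"))

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    record_500 = mm[500 * RECORD:(500 + 1) * RECORD]
    value = int.from_bytes(record_500, "little")
    mm.close()

print(value)  # 500
```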

It might give me some more insight if you can share a representative list of the fragment files with their size in bytes and the number of merge tasks generated.

william-silversmith commented 2 years ago

Sorry, I just realized you inserted and executed in the same script. Usually I do it in two steps, so I confused myself. Getting an idea of the number of tasks and the sizes of the fragments would help a lot; I might be able to recommend some tweaks to the shard parameters.

william-silversmith commented 2 years ago

I just added igneous skeleton spatial-index create/db to the CLI so that should make (3) easier.

JackieZhai commented 2 years ago

Thanks for your reply! I am researching this database of the spatial index.

JackieZhai commented 2 years ago

Besides, I have tried a few more things and found that even with:

  1. parallel=1 (create_sharded_skeleton_merge_tasks actually only makes 1 task to execute)
  2. local files and MapBuffer
  3. max_cable_length=50000

memory usage still spiked at some point inside ShardedSkeletonMergeTask.process_skeletons()

william-silversmith commented 2 years ago

Interesting. That suggests the skeletons themselves are very large... but you clipped them to < 50000. You can try making create_sharded_skeleton_merge_tasks produce smaller shards by setting shard_index_bytes and minishard_index_bytes to smaller values. How large are your skeleton .frag files?
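A hedged sketch of what smaller shard parameters might look like, using the two keyword arguments named above (the byte values are purely illustrative, not recommendations):

```python
# Smaller index budgets -> more, smaller shards -> a smaller working
# set per merge task. Values below are illustrative only.
tasks = tc.create_sharded_skeleton_merge_tasks(
    cloudpath,
    dust_threshold=1000,
    tick_threshold=3500,
    shard_index_bytes=2**11,
    minishard_index_bytes=2**13,
)
```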

This is still really weird though. If this were on my machine, what I would do is start tracing the merge task to find out where all the memory usage was going using import pdb; pdb.set_trace() and memory_profiler. If you can share some memory profiles, that might be helpful. (both graphs of memory usage over time and line by line profiles in important functions)
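If memory_profiler is inconvenient, Python's stdlib tracemalloc can also localize a spike; a minimal sketch (the allocation here just stands in for the suspect merge step):

```python
import tracemalloc

tracemalloc.start()

# Stand-in for the suspect code path, e.g. process_skeletons():
big = [bytes(1024) for _ in range(10_000)]  # ~10 MB of allocations

current, peak = tracemalloc.get_traced_memory()
top = tracemalloc.take_snapshot().statistics("lineno")[0]
tracemalloc.stop()

print(f"peak ~{peak / 1e6:.0f} MB; biggest allocator: {top}")
```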

JackieZhai commented 2 years ago

Finally, I set max_cable_length=10000 (a very small number, just for testing) and the merge finished using less than 5 GB of memory.

I imported the results into Neuroglancer. Here is one of them: [screenshot: soma_eg]

It seems that some extremely messy segments (caused by misaligned images or merge errors around somas) lead to the memory explosion in kimimaro.postprocess().

By the way, I have 125 .frag files, 417 MB in total.

william-silversmith commented 2 years ago

This makes a lot more sense to me. My skeletons have been (mostly) well behaved and I was able to screen out extremely large mergers while sparing the rest. If you can send me your fragment files, I might be able to do some debugging to figure out where that memory spike is coming from (I won't share them or use them for any other purpose). It might take me a bit to get to it though.

If you figure out which messy segments are causing the problem, you can try filtering them out specifically by editing the merge code. That might be the best approach.
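One way such a filter could look, sketched with a plain vertices/edges representation (hypothetical helpers, not CloudVolume's Skeleton class or the actual merge code): compute each skeleton's cable length as the sum of its edge lengths and drop outliers before merging.

```python
import math

def cable_length(vertices, edges):
    """Sum of Euclidean edge lengths; vertices are (x, y, z) tuples,
    edges are (i, j) index pairs into vertices."""
    return sum(math.dist(vertices[i], vertices[j]) for i, j in edges)

def filter_skeletons(skeletons, max_cable=50_000.0):
    """Drop skeletons whose total cable length exceeds max_cable.
    `skeletons` maps label -> (vertices, edges)."""
    return {
        label: (v, e)
        for label, (v, e) in skeletons.items()
        if cable_length(v, e) <= max_cable
    }

skels = {
    1: ([(0, 0, 0), (0, 0, 100)], [(0, 1)]),      # 100 units of cable
    2: ([(0, 0, 0), (0, 0, 60_000)], [(0, 1)]),   # pathological merger
}
print(sorted(filter_skeletons(skels)))  # [1]
```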

JackieZhai commented 2 years ago

OK, I will send you my .frag files.

Next, I am going to optimize the images and segments.

Anyway, it's a pleasure to talk to you!

william-silversmith commented 2 years ago

Thank you! Looking forward to the frag files!