saalfeldlab / render

Render transformed image tiles

Issue while Materializing big images using Java Client #146

Closed: nishantshakya closed this issue 1 year ago

nishantshakya commented 1 year ago

I am trying to materialize some images using the Java clients (BoxClient/RenderSectionClient). It works perfectly for images whose dimensions satisfy w * h <= Integer.MAX_VALUE. However, our system has some images with dimensions 130000 x 90000, and the render client throws a NegativeArraySizeException when trying to get the full scale images. Also, I encountered the same error while trying to perform the same action using the API.

The Render Java clients use this code, which only supports the image size as an int in the internal Java library code.

I was wondering if there is any way to get a full-scale materialized image from Render for these big images. Thank you in advance.
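For reference, here is a minimal standalone sketch (my own illustration, not the actual render code path) of why the allocation fails once width * height exceeds Integer.MAX_VALUE:

```java
// The pixel count of a 130,000 x 90,000 image does not fit in a 32-bit int, so
// computing width * height as an int overflows to a negative value and the
// subsequent pixel-array allocation throws NegativeArraySizeException.
public class PixelCountOverflow {
    public static void main(String[] args) {
        int width = 130_000;
        int height = 90_000;

        long pixels = (long) width * height;             // 11,700,000,000
        System.out.println(pixels > Integer.MAX_VALUE);  // true (limit is 2,147,483,647)

        int overflowed = width * height;                 // wraps around to -1,184,901,888
        byte[] pixelArray = new byte[overflowed];        // throws NegativeArraySizeException
    }
}
```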

trautmane commented 1 year ago

Hi @nishantshakya ,

To make sure I understand your question correctly, can you clarify whether you (a) have source tile images with > Integer.MAX_VALUE pixels or (b) simply have a multi-tile stack where each z-layer has > Integer.MAX_VALUE pixels?

Thanks, Eric

nishantshakya commented 1 year ago

@trautmane, here are the stats for the data. There's only one Z-layer with multiple tiles.

```json
{
  "stackBounds": {
    "minX": 100,
    "minY": 100,
    "minZ": 1,
    "maxX": 130100,
    "maxY": 90100,
    "maxZ": 1
  },
  "sectionCount": 1,
  "nonIntegralSectionCount": 0,
  "tileCount": 936,
  "transformCount": 0,
  "minTileWidth": 5000,
  "maxTileWidth": 5000,
  "minTileHeight": 5000,
  "maxTileHeight": 5000,
  "channelNames": [
    "Channel1"
  ]
}
```

trautmane commented 1 year ago

Hi @nishantshakya,

> There's only one Z-layer with multiple tiles.

Good - that means you have a few options:

  1. Use the RenderSectionClient with a scale that is small enough that the result image has fewer than Integer.MAX_VALUE pixels (e.g. something like --scale 0.15); a quick way to estimate a safe scale is sketched after this list. Of course this is only useful if you don't need full scale output or if your z-layers are relatively small. It will produce one result image for each z layer.

  2. Use the BoxClient to render uniform boxes (derived tiles) to disk for one or more z layers. Use something like --width 2048 --height 2048 to specify the output box size. Each output box must have fewer than Integer.MAX_VALUE pixels. This client will produce multiple full scale images (and downsampled images if you specify --maxLevel > 0) for each z-layer in a CATMAID-friendly directory structure defined here. You mentioned the BoxClient in your original post above, so I'm wondering why that didn't work for you. Maybe your box size was too big?

  3. If you are able to run Spark jobs on your compute cluster, there are Spark versions of the RenderSectionClient and the BoxClient. There is also a Spark n5 client which is what we primarily use at Janelia now. We typically export n5 (3D block) volumes and then use neuroglancer and BigDataViewer to view them. We have not used the CATMAID BoxClient process for a few years now.
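For options 1 and 2, here is a back-of-the-envelope check (just an illustration, not part of the render clients) of how small the scale has to be for bounds like yours to fit in a single int-indexed pixel array:

```java
// Rough sanity check (illustration only): the rendered pixel count must stay
// below Integer.MAX_VALUE, so scale^2 * width * height < Integer.MAX_VALUE.
public class ScaleCheck {
    public static void main(String[] args) {
        long width = 130_000;   // approximate stack width from the stats above
        long height = 90_000;   // approximate stack height from the stats above

        double maxScale = Math.sqrt((double) Integer.MAX_VALUE / (width * height));
        System.out.printf("largest safe scale ~ %.3f%n", maxScale);   // ~0.428

        double scale = 0.15;    // the example value suggested above
        long scaledPixels = Math.round(width * scale) * Math.round(height * scale);
        System.out.println(scaledPixels < Integer.MAX_VALUE);         // true (~263 million pixels)
    }
}
```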

Let me know here if you need more information about any of the Spark client "tools" or if anything I've listed here does not make sense (or if I've misunderstood your issue in any way).

Best, Eric

nishantshakya commented 1 year ago

Thank you, @trautmane, for providing the options.

I attempted to use the BoxClient with the maximum dimensions. However, upon examining the code, it seems that both clients (RenderSection and BoxClient) work similarly when using full-scale dimensions. We have considered using smaller boxes; our team is already doing so with CATMAID. My main goal is to determine whether it's possible to obtain full-scale images without needing additional stitching operations on the images.

I'm keen to learn more about the third option that uses the Spark versions of the client. Essentially, I am trying to export full-scale images from render, downsample them, and then view them in Neuroglancer. If you have any advice on transitioning from Render to Neuroglancer for visualization, I'm all ears.

catherinerajendran commented 1 year ago

I'm also looking for the full-scale image from render for OpenSeaDragon, which exclusively accepts the entire image as input (the full scale of the original image). The current approach is to utilize the BoxClient to generate tiles up to the maximum pixel size (due to the 32-bit integer limitation) and then perform the stitching process. It would be greatly appreciated if there's a better way to obtain full-scale images directly from render.

Best, Catherine

trautmane commented 1 year ago

@catherinerajendran - thanks for explaining that OpenSeaDragon needs a single full scale image for import - now I understand the problem you are trying to solve. Unfortunately, as you've discovered, render uses ImageJ core libraries that store pixel arrays with a 32-bit integer index. This has not been a problem for us since we work with huge volumes that are always broken up into relatively small blocks. Smaller blocks are easier to work with and can be read/written in parallel.

That doesn't help you though ... My first thought was just to use some other tool to join smaller boxes exported from render together (I say join instead of stitch because you wouldn't need to do any alignment). But it sounds like you've already tried that and would prefer another solution. Can you tell me what you are currently using to stitch/join boxes exported from render?

It might be possible to export a huge single n5 block from render, but I'm not sure and I don't know if that helps you since you likely would have to convert the n5 block to something else for import into OpenSeaDragon. I'll check with others at Janelia this week to see if anyone has a better idea.

@nishantshakya - with regard to using Neuroglancer to view render volumes: we do that in two ways at Janelia. The first is a slow-ish 2D view of stacks that are dynamically rendered on-the-fly. The second is a fast 3D view of stacks that have been materialized/exported to disk as n5 volumes. Neither of these will help you with OpenSeaDragon (assuming you and Catherine are trying to solve the same problem), but they are a great way to work with the data. Let me know if you would like more information about how to use neuroglancer with render.

nishantshakya commented 1 year ago

Hi @trautmane,

Yes, I'm keen to understand more about the integration approach that you are taking for visualizing Render data in Neuroglancer. I've managed to create pyramids for the smaller images supported by the RenderSectionClient, allowing for visualization in Neuroglancer. The larger images would require an extra join step, as you have mentioned. I'd love to get more information about your approach.

The fallback option of using dynamic rendering with a direct connection between Neuroglancer and render is extremely slow, as you also noted.

trautmane commented 1 year ago

Hi @nishantshakya ,

We use the slower dynamic 2D rendering while resolving initial alignment issues to save on compute/storage costs. For dynamically rendered data, it helps to limit neuroglancer to a single XY panel instead of using the default 4-panel arrangement, since the orthogonal views are very expensive to dynamically render - but it is still slow until the browser caches everything.

Once we are happy with an alignment (or need fast 3D rendering to resolve a problem), we use the Spark N5 client to materialize/export an n5 which can be viewed directly by neuroglancer. If you want to try this, I recommend building from the ibeam-msem branch which has the most current but still stable code.

We run Spark on Janelia's HPC cluster which is currently managed by an LSF scheduler. I don't know/remember what you have available at St. Jude and I'm also not sure how big your data sets typically are. If you give me a few specifics, I can try to help you figure out the best path forward.

Some key questions to answer are:

  1. Do you have/use an HPC cluster?
  2. If so, what scheduler do you use (e.g. LSF, Univa, Slurm, ...) and do you or others already use Spark on the cluster?
  3. Typically, how big are your data sets in terms of pixels (X x Y x Z)? I ask because if the data sets are small enough, running "local" Spark on a single box (and avoiding distributed cluster stuff) might be an option.

nishantshakya commented 1 year ago

Thanks, @trautmane, for sharing the details about the Render to Neuroglancer workflow at Janelia and for pointing me to the latest branch with the N5 Client code.

Please find the answers below:

  1. Yes, we have an HPC cluster.
  2. It uses LSF. Spark has not been used on the cluster. I came to know that Hadoop had been used, but that was back in 2019.
  3. Our image sizes are extensive. Some have multiple Z-slices, with dimensions of approximately 13,000 x 8,100 x 1,200, while others are 100,000 x 100,000 x 1. Given this, I believe it would be a good idea to first test locally using a smaller sample image before executing tests on the cluster.

trautmane commented 1 year ago

Sorry @nishantshakya for my delayed response - I've been busy and don't have as much time as I'd like to help with this at the moment. I'm ultimately planning to use this as an excuse to set up and write up a nice example for deploying and using neuroglancer with render. Given my current schedule, I won't likely have that ready for a little while (a few weeks to a month maybe?).

In the meantime, you are welcome to wait until that is ready or put up with slow-ish responses to specific questions here ... either way, I appreciate your patience.

Here are some specifics to get you started if you don't want to wait for the full write-up:

The key ideas are as follows:

We are currently using an older version (3.0.1) of Spark. Stuff may work with newer versions, but it is probably best for you to use 3.0.1 initially to get things running. The Spark 3.0.1 online docs are here: https://spark.apache.org/docs/3.0.1/. You can download it from: https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1.tgz.

Initially, you can run locally using --master local[N]. Later, we can try to get you running on your LSF cluster. Here is an example of running a local cluster with a different client.
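For context, here is a minimal sketch of what local mode means in Spark terms, using only the standard Spark API (no render-specific classes are shown, and the app name is just a placeholder):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Minimal local-mode sketch: "local[4]" is the programmatic equivalent of passing
// --master local[4] to spark-submit and runs everything in one JVM with 4 worker threads.
public class LocalSparkSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("render-export-local-test") // placeholder name
                .setMaster("local[4]");                 // N = worker threads on this machine
        try (JavaSparkContext sparkContext = new JavaSparkContext(conf)) {
            System.out.println("spark master: " + conf.get("spark.master"));
            // a real run would hand this context to one of the render Spark clients
        }
    }
}
```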

Let me know how things go. Once you have an exported n5, we can discuss how to get neuroglancer to display it.

nishantshakya commented 1 year ago

No worries, @trautmane, I am good for now. I am looking forward to your future example on integrating Neuroglancer with Render. In the meantime, I can take a look at the code on my own and reach out if needed.

The main problem we're having is exporting a full-scale 100K x 100K image directly from Render. None of the clients (N5, Box, or RenderSection) supports it at the moment, and the only option seems to be to export small boxes and join them manually to get the full-scale image.

nishantshakya commented 1 year ago

Hi @trautmane, the n5 Spark client has been working perfectly and the data can be viewed smoothly in neuroglancer. Data exported with smaller chunk sizes creates a lot of files, and we are using an NFS file share to store the data. Since there are a lot of files for data sets of this size, the inode limit is a concern. I wanted to check with you whether there's any sharding feature in the client or in the n5 format that can reduce the number of files.

trautmane commented 1 year ago

Hi @nishantshakya, n5 does not support sharding, so the only direct n5 option you have is to increase the block size (which will reduce the file count but slow down visualization). It might be relatively easy to write something to convert the n5 data to sharded zarr data - but that's just a guess.
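To make the block-size trade-off concrete, here is a rough estimate (my own illustration with arbitrary example block sizes, not output from the client) of how the file count changes, since n5 writes one file per block:

```java
// Rough estimate of n5 block-file counts for the 100,000 x 100,000 x 1 layer
// discussed above. n5 stores each block as a separate file, so the file count
// per scale level is the product of block counts along each axis.
public class N5BlockCountEstimate {

    static long blockCount(long dimX, long dimY, long dimZ, int bx, int by, int bz) {
        long nx = (dimX + bx - 1) / bx;   // blocks along X
        long ny = (dimY + by - 1) / by;   // blocks along Y
        long nz = (dimZ + bz - 1) / bz;   // blocks along Z
        return nx * ny * nz;              // one file per block
    }

    public static void main(String[] args) {
        System.out.println(blockCount(100_000, 100_000, 1, 128, 128, 64));    // 611,524 files
        System.out.println(blockCount(100_000, 100_000, 1, 1024, 1024, 64));  // 9,604 files
    }
}
```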