CODEX BALBc1 not available

jesusdpa1 commented 4 years ago

Good Afternoon,

I am trying to run the CODEX BALBc1 example, but the dataset is not available. Where can I find this dataset, and is there a more detailed explanation of how to run Cytokit on CODEX datasets?

Kind Regards

eric-czech commented 4 years ago

Hey @jesusdpa1, the data is still available at http://welikesharingdata.blob.core.windows.net/forshare/index.html.

All the images used in the Cytokit example are under the heading BALBc-1. This issue might be helpful too, particularly because it has the urls of all the files you'd have to download: https://github.com/hammerlab/cytokit/issues/11.

After downloading all of those tif files, the script for that example should work on it. Happy to help more if it doesn't.

jesusdpa1 commented 4 years ago

Hey Eric,

Thank you for your help. I managed to download the image files, but I am new to Jupyter and am having some difficulty setting up the folders to access the data. How do I configure the path so it can access my local directories?

eric-czech commented 4 years ago

Hi @jesusdpa1 , when you run a container with a command like:

export LOCAL_IMAGE_DATA_DIR=/tmp   
nvidia-docker run --rm -ti -p 8888:8888 -p 8787:8787 -p 8050:8050 \
-v $LOCAL_IMAGE_DATA_DIR:/lab/data \
eczech/cytokit:0.1.1

anything in /tmp above would be accessible in the jupyter lab browser under /lab/data.

As another example, say you had images in a folder like /home/me/images/ on your computer (not the container) with files like image1.tif, image2.tif, etc., and a GitHub repository with some custom code at /home/me/repos/myrepo -- you could then do this:

nvidia-docker run --rm -ti -p 8888:8888 -p 8787:8787 -p 8050:8050 \
-v /home/me/images:/lab/data/images \
-v /home/me/repos/myrepo:/lab/repos/myrepo \
eczech/cytokit:0.1.1

Notice that you can link as many folders between the container and your computer as you want, but you should connect them to something under either /lab/repos or /lab/data because those definitely exist in the container (because I made them in cytokit/docker/Dockerfile.pub). In this example, all of your images would then show up in jupyter lab under /lab/data/images.
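The mounting rule above can be sketched as a small Python helper that assembles the same `docker run` arguments from a host-to-container mapping. This is only an illustration: the function name `build_docker_command` and the check against /lab/data and /lab/repos are assumptions made for this sketch and are not part of Cytokit. The image tag and ports are copied from the commands above.

```python
from pathlib import PurePosixPath

# Container directories that exist in the image, per the explanation above.
ALLOWED_ROOTS = (PurePosixPath("/lab/data"), PurePosixPath("/lab/repos"))


def build_docker_command(mounts, image="eczech/cytokit:0.1.1"):
    """Return the argument list for an nvidia-docker run with the given mounts.

    `mounts` maps host directories to container directories. This helper is
    hypothetical; it only assembles the flags used in the commands above.
    """
    cmd = ["nvidia-docker", "run", "--rm", "-ti"]
    for port in (8888, 8787, 8050):
        cmd += ["-p", f"{port}:{port}"]
    for host_dir, container_dir in mounts.items():
        target = PurePosixPath(container_dir)
        # Accept the root itself or anything nested below it.
        if not any(target == root or root in target.parents for root in ALLOWED_ROOTS):
            raise ValueError(f"{container_dir} is not under /lab/data or /lab/repos")
        cmd += ["-v", f"{host_dir}:{container_dir}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    command = build_docker_command({
        "/home/me/images": "/lab/data/images",
        "/home/me/repos/myrepo": "/lab/repos/myrepo",
    })
    print(" ".join(command))
```

Printing the command rather than executing it makes it easy to review the mounts before starting the container.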

That help?

jesusdpa1 commented 4 years ago

Hi Eric, thank you very much for your help, and sorry to bother you again. I started getting the following error (I set the GPU count to [0]).

[image: error_cytokit]

My laptop has 32 GB of RAM, a 9th-gen i7, and an RTX 2070 (8 GB of memory).

eric-czech commented 4 years ago

Could you send the rest of that stack trace? I have a suspicion of what it is but if you send me the whole thing I'd know for sure.

jesusdpa1 commented 4 years ago

I think I figured it out. It was a problem with the UNet download. Apparently the first run crashed and didn't download the UNet correctly, and the subsequent runs were fetching the previous configuration. I restarted the process and the analysis is running smoothly.

Hopefully these are my last two questions.

The first is about the GPU warnings I got during my run.

[image: cytokit_gpu_warning]

If I understand correctly, this means that the algorithm tried to allocate multiple images on the GPU, failed, and proceeded to reduce the number of images per process? After the first cycle, it stopped showing the warnings.

My second question is for future runs. When running other examples, do I have to hyperstack all of the channels by region location, like the files in BALBc1?

Thank you again for your help,

eric-czech commented 4 years ago

Hm, that out of memory error is strange -- I've seen it plenty of times but I have never seen it not ultimately result in an exception. I'm not aware of any mechanism TensorFlow has for recovering from those, but perhaps allocation for the "Tile Cytometry" step worked following somewhat delayed reclamation of some space from the "Focal plane selection" step (both of which are TF graphs). Either way, I can't imagine there would be any side effects from that warning in the results.

I'm not quite sure I follow you on the hyper stacking question, but the output folder for the run will contain montage images, one for each of the extracts + montages configured here, and those images will contain all the channels in each list in those configs for all tiles, and depending on the "z" parameter in the configs, for either all z planes or the "best" z plane. Is this what you're after?

jesusdpa1 commented 4 years ago

Sure, let me explain it better. During the acquisition step in Akoya, the images are stored per cycle (one folder each), and within each cycle folder there is one image per channel, per focal plane, per region. In the example data, we only get one folder with all the tif stack files separated by region. Within each region's tif stack file, the data is organized per channel with all of its focal planes.

Do I need to organize my data like the example data?

[image: Akoya output] vs [image: Example]

eric-czech commented 4 years ago

The script in the cytokit/pub folder is meant to run with the tif files directly after downloading them from the data sharing site (http://welikesharingdata.blob.core.windows.net/forshare/index.html). They didn't actually share the raw data in the format you're talking about (or I would have used it instead), so the script starts from an intermediate representation where all the individual images are already concatenated into hyperstacks. The script has a step where these are symlinked into the output directory for the experiment, which is a somewhat hacky workaround for dealing with this case, where you're not actually starting from the raw data.

Long story short, to reproduce the example in the paper you should use the BALBc-1 images directly (no need to disassemble or re-arrange them). To process your own Akoya data, since it looks like you have some, you don't need to do anything special either. The software was intended for that layout (the BALBc-1 script is the exception, not the norm), though it does look like you will have to set the path format as it is done in https://github.com/hammerlab/cytokit/blob/master/pub/config/mc38-spheroid/experiment.yaml#L6.

Unfortunately, the software is going to build static paths to all the images and I had never seen the Akoya driver software put timestamps or dates in the folder names like that. This means you'd have to rename the folders to strip that off (i.e. cyc014_reg001_191017_234952 -> cyc014_reg001) or modify the source code to deal with paths having variable components.
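The renaming option could be sketched in Python as follows. This is a minimal sketch, not part of Cytokit: the function names and the regular expression are assumptions, and the pattern is inferred only from the single folder name quoted above (cyc014_reg001_191017_234952). It defaults to a dry run so the planned renames can be reviewed before anything on disk changes.

```python
import re
from pathlib import Path

# Matches names like "cyc014_reg001_191017_234952" and captures "cyc014_reg001".
# The suffix shape (two underscore-separated digit groups) is an assumption
# based on the one example in this thread.
TIMESTAMPED = re.compile(r"^(cyc\d+_reg\d+)_\d+_\d+$")


def strip_timestamp(name):
    """Return the folder name without its trailing timestamp, or unchanged."""
    match = TIMESTAMPED.match(name)
    return match.group(1) if match else name


def rename_cycle_folders(root, dry_run=True):
    """Rename timestamped cycle folders directly under `root`.

    Returns a list of (old_name, new_name) pairs. With dry_run=True nothing
    on disk is changed.
    """
    planned = []
    for path in sorted(Path(root).iterdir()):
        new_name = strip_timestamp(path.name)
        if not path.is_dir() or new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        planned.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling `rename_cycle_folders("/lab/data/myexperiment")` (a hypothetical path) lists the planned renames; passing `dry_run=False` applies them.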

Then, as in that link above, you could set the path format as:

environment:
  path_formats: 'get_default_path_formats("cyc{cycle:03d}_reg{region:03d}/{region:d}_{tile:05d}_Z{z:03d}_CH{channel:d}.tif")'
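To see what that template resolves to, the inner string can be expanded with Python's `str.format`. This assumes Cytokit treats it as a standard Python format string; the placeholder syntax suggests so, but that is not confirmed in this thread. The helper name `image_path` and the index values are made up for illustration.

```python
# Template copied from the path_formats setting above.
TEMPLATE = "cyc{cycle:03d}_reg{region:03d}/{region:d}_{tile:05d}_Z{z:03d}_CH{channel:d}.tif"


def image_path(cycle, region, tile, z, channel):
    """Expand the template for one image; all indices are plain integers."""
    return TEMPLATE.format(cycle=cycle, region=region, tile=tile, z=z, channel=channel)


if __name__ == "__main__":
    # Hypothetical indices, chosen only to show the zero padding.
    print(image_path(cycle=14, region=1, tile=3, z=5, channel=2))
    # prints cyc014_reg001/1_00003_Z005_CH2.tif
```

Because the folder part expands to a fixed name such as cyc014_reg001, any extra timestamp suffix on the real folder would make the generated path miss the files.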

jesusdpa1 commented 4 years ago

Got it! Thank you, Eric.

hammerlab / cytokit

CODEX BALBc1 not available #12