kirchhausenlab / incasem

Automated Segmentation of cellular substructures in Electron Microscopy
BSD 3-Clause "New" or "Revised" License

Voxel size error #19

Open gparlakgul opened 4 months ago

gparlakgul commented 4 months ago

Hi, in our data the voxel size is 8 nm, not 5 nm. I updated this information accordingly in the .json file under incasem/scripts/03_predict/data_configs/. However, when I run the predict.py script, I receive the error below. Do you know how I can troubleshoot this? Do I need to update the voxel size elsewhere, e.g. during the .tiff to .zarr conversion or in another .json or .yaml file?

```
ERROR:example_prediction_test_ER:Failed after 0:00:00!
Traceback (most recent calls WITHOUT Sacred internals):
  File "/content/incasem/scripts/03_predict/predict.py", line 326, in predict
    predictions = multiple_prediction_setup(
  File "/content/incasem/scripts/03_predict/predict.py", line 126, in multiple_prediction_setup
    prediction_setup(
  File "/content/incasem/scripts/03_predict/predict.py", line 144, in prediction_setup
    prediction = pipeline_type(
  File "/content/incasem/incasem/pipeline/prediction_baseline.py", line 44, in __init__
    self._assemble_pipeline()
  File "/content/incasem/incasem/pipeline/prediction_baseline.py", line 71, in _assemble_pipeline
    raise ValueError((
ValueError: Specfied voxel size (5, 5, 5) does not match the voxel size of the datasets (8, 8, 8).
```

patrickstock commented 4 months ago

Hi Gunes,

If you updated the info in the config file at incasem/scripts/03_predict/data_configs/, then you need to make sure it agrees with the resolution specified in the .zattrs file (a hidden file within a .zarr volume).

This is on a per-volume basis within the .zarr file, so make sure to update it for all the volumes you are interested in using. For example, if I were using labels and raw data within my sample cell, I would check the .zattrs of each of those two volumes.

You can do this when you convert the tiffs by including the following command line option with `00_image_sequences_to_zarr.py`: `--resolution 8 8 8`

If you did not do that, it's fine; just manually edit the values under the `resolution` field in the .zattrs.
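(Editor's note: since .zattrs is plain json, this edit can also be scripted. A minimal sketch using only the standard library; the `set_resolution` helper and the example paths are illustrative, not part of incasem:)

```python
import json
from pathlib import Path

def set_resolution(zattrs_path, resolution):
    """Overwrite the 'resolution' field of a volume's .zattrs file."""
    path = Path(zattrs_path)
    attrs = json.loads(path.read_text())
    attrs["resolution"] = list(resolution)
    path.write_text(json.dumps(attrs, indent=4))
    return attrs

# Patch every volume you plan to use (paths below are hypothetical):
# for vol in ["volumes/raw", "volumes/raw_equalized_0.02"]:
#     set_resolution(f"/content/incasem/data/test.zarr/{vol}/.zattrs", [8, 8, 8])
```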

gparlakgul commented 4 months ago

Hi Patrick,

Thanks for your quick response. I just did that, but I still receive the same error. Is there any other document (.yaml or .json) I need to update? Below are my current .zattrs and .json files:

The .zattrs file under both the raw_equalized_0.02 and raw folders:

```json
{
    "offset": [0, 0, 0],
    "resolution": [8, 8, 8]
}
```

The .json file under incasem/scripts/03_predict/data_configs/:

```json
{
    "test_example_roi_nickname": {
        "file": "/content/incasem/data/test.zarr",
        "offset": [8, 8, 8],
        "shape": [204, 204, 204],
        "voxel_size": [8, 8, 8],
        "raw": "volumes/raw_equalized_0.02"
    }
}
```
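(Editor's note: a small consistency check can confirm that the data config and the .zattrs of each referenced volume agree before running prediction. A sketch using only the standard library; the `check_voxel_sizes` helper is hypothetical, not part of incasem:)

```python
import json
from pathlib import Path

def check_voxel_sizes(config_path):
    """Compare the voxel_size in a prediction data config against the
    .zattrs resolution of each referenced volume; return what was found."""
    config = json.loads(Path(config_path).read_text())
    found = {}
    for name, entry in config.items():
        zattrs = Path(entry["file"]) / entry["raw"] / ".zattrs"
        resolution = json.loads(zattrs.read_text())["resolution"]
        if resolution != entry["voxel_size"]:
            raise ValueError(
                f"{name}: config voxel_size {entry['voxel_size']} "
                f"!= .zattrs resolution {resolution}")
        found[name] = resolution
    return found
```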

patrickstock commented 4 months ago

Could you send the command you are executing as well? It should be the case that you only need to change the zarr metadata and the configuration .json in data_configs. The .yaml should stay the same; it only changes if you use a different block size, not for a different resolution.

What are you using for the prediction model? More likely than not, the problem is that the model's metadata has 5 nm as its voxel_size.

gparlakgul commented 4 months ago

I'm using the provided models (1841 in this case). Below are the command lines for tiff to zarr conversion and ER prediction. I'm attaching the .json and .zattrs files.

Attachments: zattrs content.txt, [test.json](https://github.com/kirchhausenlab/incasem/files/14306310/test.json)

```shell
python 00_image_sequences_to_zarr.py --resolution 8 8 8 -i /content/incasem/data/test -f /content/incasem/data/test.zarr

python 40_equalize_histogram.py -f /content/incasem/data/test.zarr -d volumes/raw -o volumes/raw_equalized_0.02

python predict.py --run_id 1841 --name example_prediction_test_ER with config_prediction.yaml 'prediction.data=data_configs/test.json' 'prediction.checkpoint=/content/incasem/models/pretrained_checkpoints/model_checkpoint_1841_er_CF.pt'
```

patrickstock commented 4 months ago

Hi Gunes,

That model was trained at a resolution of 5 nm and will most likely not work for your case. We have tested some 5 nm models on 4 nm data by simply overwriting the metadata to make them agree. Even that case (a 5 nm model on 4 nm data) already shows some deterioration in performance, so my guess is that a 5 nm model on 8 nm data will be pretty bad.

There are a few options based on what data you have available:

  1. You can train a new 8 nm model from scratch if you have a decent quantity of good-quality annotated data.
  2. You can overwrite the metadata of one of our 5 nm models to make it 8 nm, and then fine-tune on your own 8 nm data if you have some annotations available. For this you will not need as much annotated data as training from scratch would require. The instructions are in the fine-tuning section of the README.
  3. If you are interested in working with ER, you can use the model checkpoint model_checkpoint_72000 that @athulnair02 shared with you via email on Jul 3, 2023, which is 8 nm.
patrickstock commented 4 months ago

I believe you are running everything else properly! It is just a mismatch with the model. Please let me know if there is anything else I can help with; otherwise I will close the issue.

gparlakgul commented 4 months ago

Hi Patrick, thank you so much for all the insight and suggestions. I actually do use the model provided by Athul, adapted for 8 nm. I just renamed it model_checkpoint_1841_er_CF.pt and replaced the original model with it, so that I wouldn't need to deal with MongoDB and the other dependencies on its name. I also did this because the model provided by Athul didn't have an extension (.pt). So the model is adapted for 8 nm, and I'm still not sure why it's giving the voxel size error. Do you think changing the name of the model may have any impact? Thanks for your time!

patrickstock commented 4 months ago

The name should not matter, but all of the models have an associated json with a bunch of metadata; the checkpoint is just weights. If you have MongoDB set up, the argument --run_id 1841 tells the script to grab the config from MongoDB under the key 1841, which gets populated into the database when you run the download_models.py file. You don't necessarily need to use MongoDB, since it's just a utility to load a json file with that information, but the script still needs to get that info somehow.

I made a branch of this repository for use in Colab, where connecting to MongoDB was not an option. If you look at that branch, there is a folder called mock_db that accomplishes basically the same thing MongoDB would, but in a very simplified, local way. It stores these model jsons and keeps a ledger so that naming works out. Here is the model config for 1841 from that branch, which is the same as what is stored under 1841's key in the MongoDB case:

https://github.com/kirchhausenlab/incasem/blob/colab_notebook/mock_db/er_1841.json

Line 113 has a voxel size of 5, 5, 5. I think most of the rest of the file will not matter for your case, so you could just edit that to be 8, 8, 8. You still need to load this into your predict.py in the absence of MongoDB, though. You can take a look at how I did this, also on the Colab branch (line 352):

https://github.com/kirchhausenlab/incasem/blob/colab_notebook/scripts/03_predict/predict.py

That version of predict.py has everything below line 340 configured so that it behaves identically to the version of predict.py in the main branch, and can be called with the same command line syntax and arguments.

So you need to modify the voxel size in that json, and then you need to load it. You can either clone the colab_notebook branch and reinstall, which lets you work with the model configuration files manually via the mock_db, or you can modify lines 428 to 433 in the version of predict.py that you have so that it loads the proper config json from your local directory.
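(Editor's note: the json edit described above can also be scripted. Since the exact key path of `voxel_size` inside the 1841 config is not spelled out in this thread, the sketch below walks the whole document and rewrites every `voxel_size` entry it finds; the `patch_voxel_size` helper is hypothetical, not part of the repo:)

```python
import json
from pathlib import Path

def patch_voxel_size(src, dst, voxel_size=(8, 8, 8)):
    """Copy a model config json, rewriting every nested 'voxel_size' entry."""
    def rewrite(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "voxel_size":
                    node[key] = list(voxel_size)
                else:
                    rewrite(value)
        elif isinstance(node, list):
            for item in node:
                rewrite(item)

    config = json.loads(Path(src).read_text())
    rewrite(config)
    Path(dst).write_text(json.dumps(config, indent=4))
    return config

# Example (paths are hypothetical):
# patch_voxel_size("mock_db/er_1841.json", "mock_db/er_1841_8nm.json")
```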

gparlakgul commented 4 months ago

This is very helpful, thank you so much! I'll try these steps. Thanks for the clarification on the model's association with its config files.