kirchhausenlab / incasem

Automated Segmentation of cellular substructures in Electron Microscopy
BSD 3-Clause "New" or "Revised" License

Gunpowder errors #18

Open Secondus2 opened 9 months ago

Secondus2 commented 9 months ago

I have been trying to train a model from scratch on data produced by one of our research groups at the University of Warwick. The volume is 48 x 4096 x 4096 voxels, with a voxel size of 70 x 9 x 9 nm.
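
To relate these numbers to the nanometre coordinates in the log output, the physical extent of the volume can be worked out from the voxel size and the shape. A minimal sketch, assuming the axis order is (z, y, x) with the 70 nm spacing along z; the variable names are illustrative and not part of incasem:

```python
# Voxel size in nm and volume shape in voxels, as given in the description.
# Assumption: axis order is (z, y, x).
voxel_size_nm = (70, 9, 9)
shape_voxels = (48, 4096, 4096)

# Physical extent per axis = voxel size * number of voxels.
extent_nm = tuple(v * s for v, s in zip(voxel_size_nm, shape_voxels))
print(extent_nm)  # (3360, 36864, 36864)
```

Under that assumption the whole volume is 3360 nm thick along the first axis, which is the same number that shows up as the first dimension of the requested ROIs.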

Attachments: mnd_train_mito.json, mnd_validation_mito.json, config_training_test_post_meeting_patrick.yaml.txt

I have tried to run train.py using the attached files, and the following command:

python3 train.py --name example_training with config_training_test_post_meeting_patrick.yaml training.data=data_configs/mnd_train_mito.json validation.data=data_configs/mnd_validation_mito.json torch.device=0

I get the following output:

INFO:__main__:Attach Mongo observer
INFO:example_training:Running command 'train'
INFO:example_training:Started run with ID "11"
Added application/json as content-type of artifact /mnt/e/Camdu/incasem-main/scripts/02_train/data_configs/mnd_train_mito.json.
Added application/json as content-type of artifact /mnt/e/Camdu/incasem-main/scripts/02_train/data_configs/mnd_validation_mito.json.
INFO:__main__:Starting new training run 11
INFO:__main__:total_params=5837730
INFO:__main__:trainable_params=5837730
INFO:incasem.pipeline.sources.data_sources_base:Setting up my_new_data_train_mito
INFO:incasem.pipeline.sources.data_sources_semantic:No mask given, add dummy mask of all 1s.
INFO:incasem.pipeline.training_baseline_with_context:Sampling probabilities for the provided datasets:
{'my_new_data_train_mito': 1.0}
/home/camdu/.local/lib/python3.8/site-packages/gunpowder/batch_request.py:118: UserWarning: merge is deprecated! please use update_with as it accounts for spec metadata
  warn(
DEBUG:incasem.pipeline.training_baseline_with_context:ZarrSource[/mnt/e/Camdu/incasem-main/data/my_new_data.zarr] -> Crop -> Crop -> Crop -> BinarizeLabels -> MergeLabels -> AddMask -> BinarizeLabels -> MergeMasks -> Normalize -> DeepCopyArrays -> BinarizeLabels -> SaveBlockPosition -> RandomLocationBounded -> PadDownstreamOfRandomLocation -> PadDownstreamOfRandomLocation -> PadDownstreamOfRandomLocation -> PadDownstreamOfRandomLocation -> CentralizeRequests -> RandomProvider -> Reject -> Downsample -> SimpleAugment -> ElasticAugment -> SimpleAugment -> IntensityAugment -> ToDtype -> BalanceLabels -> IntensityScaleShift -> DeepCopy -> Unsqueeze -> Unsqueeze -> PreCache -> Train -> Squeeze -> Squeeze -> IntensityScaleShift -> FloatToUint8 -> ToDtype -> Softmax -> FloatToUint8 -> DeepCopyArrays -> Snapshot -> Uint8ToFloat -> PrintProfilingStats
INFO:incasem.pipeline.sources.data_sources_base:Setting up my_new_data_validation
INFO:incasem.pipeline.sources.data_sources_semantic:No mask given, add dummy mask of all 1s.
WARNING:incasem.gunpowder.torch.predict:Model is in training mode during prediction. Consider using model.eval()
INFO:__main__:debug_logdir='/mnt/e/Camdu/incasem-main/training_runs/tensorboard/0011/debug'
INFO:incasem.gunpowder.random_location_bounded:requesting complete mask...
INFO:incasem.gunpowder.random_location_bounded:allocating mask integral array...
INFO:incasem.gunpowder.torch.train:Training on gpu 0.
INFO:incasem.gunpowder.torch.train:Starting training from scratch
INFO:incasem.gunpowder.torch.train:Using device cuda:0
INFO:__main__:Training iteration is 0, copying into validation pipeline
INFO:gunpowder.nodes.precache:starting new set of workers (8, cache size 20)...
ERROR:gunpowder.producer_pool:Exception in Unsqueeze while processing request
        RAW: ROI: [0:3360, 0:1836, 0:1836] (3360, 1836, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        LABELS: ROI: [0:3360, 0:1836, 0:1836] (3360, 1836, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        MASK: ROI: [0:3360, 0:1836, 0:1836] (3360, 1836, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        BACKGROUND_MASK: ROI: [0:3360, 0:1836, 0:1836] (3360, 1836, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        METRIC_MASK: ROI: [0:3360, 0:1836, 0:1836] (3360, 1836, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        LOSS_SCALINGS: ROI: [0:3360, 0:1836, 0:1836] (3360, 1836, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        RAW_POS: ROI: None, voxel size: None, interpolatable: None, non-spatial: True, dtype: None, placeholder: False

Batch returned so far:
None
Traceback (most recent call last):
  File "/home/camdu/.local/lib/python3.8/site-packages/gunpowder/nodes/batch_provider.py", line 182, in request_batch
    self.check_request_consistency(request)
  File "/home/camdu/.local/lib/python3.8/site-packages/gunpowder/nodes/batch_provider.py", line 244, in check_request_consistency
    assert request_roi.get_shape()[d]%provided_spec.voxel_size[d] == 0, \
AssertionError: in request
        RAW: ROI: [762:2598, -762:2598, 0:1836] (1836, 3360, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        LABELS: ROI: [762:2598, -762:2598, 0:1836] (1836, 3360, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        MASK: ROI: [762:2598, -762:2598, 0:1836] (1836, 3360, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        BACKGROUND_MASK: ROI: [762:2598, -762:2598, 0:1836] (1836, 3360, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        METRIC_MASK: ROI: [762:2598, -762:2598, 0:1836] (1836, 3360, 1836), voxel size: None, interpolatable: None, non-spatial: False, dtype: None, placeholder: False
        RAW_POS: ROI: None, voxel size: None, interpolatable: None, non-spatial: True, dtype: None, placeholder: False
, dimension 0 of request RAW is not a multiple of voxel_size 70

This is followed by many further errors.
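
The assertion in the traceback requires every dimension of a requested ROI's shape to be a whole multiple of the voxel size along that dimension. The check can be reproduced on the two shapes printed in the log. This is a standalone sketch in plain Python, not gunpowder code: `is_voxel_aligned` is a hypothetical helper, and the voxel size (70, 9, 9) is taken from the description of the data.

```python
def is_voxel_aligned(shape_nm, voxel_size_nm):
    """Return, per dimension, whether the shape is a multiple of the voxel size."""
    return tuple(s % v == 0 for s, v in zip(shape_nm, voxel_size_nm))


voxel_size_nm = (70, 9, 9)

# Shape of the request reported for Unsqueeze (first block in the log).
first_request = (3360, 1836, 1836)
# Shape of the request that triggers the AssertionError.
failing_request = (1836, 3360, 1836)

print(is_voxel_aligned(first_request, voxel_size_nm))    # (True, True, True)
print(is_voxel_aligned(failing_request, voxel_size_nm))  # (False, False, True)
```

The failing shape contains the same three numbers as the first one, with the first two axes swapped, so the 3360 nm extent lands on an axis with 9 nm voxels and the 1836 nm extent on the axis with 70 nm voxels. Which node in the pipeline performs the swap cannot be determined from this excerpt.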

Does anyone have any idea what might be going wrong here?

Thanks a lot, Tim

patrickstock commented 9 months ago

Hi Tim, sorry to see you are still having difficulty. Two additional pieces of info would be helpful to sort this out:

  1. Can you provide the .zarray and .zattrs files from my_new_data.zarr?
  2. This one is unlikely given the timing of your request, but I should ask anyway: when did you clone the repository? We fixed an issue in August (PR 14) to enable regions smaller than 204,204,204.
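
For the first item, the metadata files can be gathered with a few lines of Python. This is a sketch that assumes a Zarr v2 directory store, where each array or group folder holds small JSON files named `.zarray` and `.zattrs`; `collect_zarr_metadata` is a hypothetical helper, not part of incasem or zarr.

```python
import json
from pathlib import Path


def collect_zarr_metadata(store_path):
    """Map each .zarray/.zattrs file under store_path to its parsed JSON."""
    store = Path(store_path)
    metadata = {}
    for name in (".zarray", ".zattrs"):
        # rglob searches the store recursively for files with this exact name.
        for path in sorted(store.rglob(name)):
            metadata[str(path.relative_to(store))] = json.loads(path.read_text())
    return metadata


# Example call (path taken from the log in the first comment):
# meta = collect_zarr_metadata("/mnt/e/Camdu/incasem-main/data/my_new_data.zarr")
# print(json.dumps(meta, indent=2))
```

The printed JSON can be pasted into the thread directly, which avoids having to attach hidden dotfiles.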