tpark94 / spnv2

PyTorch implementation of SPNv2
MIT License
32 stars 2 forks

How am I supposed to run the odr script? #5

Closed mohsij closed 2 months ago

mohsij commented 8 months ago

Hey Jeff,

I used preprocess.py with --no_labels to generate image lists for sunlamp and lightbox, then ran the odr script and hit the following error:

2024/01/08 16:19:00 Random seed: 2021
2024/01/08 16:19:00 Creating SPNv2 ...
/home/mohsi/anaconda3/envs/spnv2/lib/python3.8/site-packages/torchvision/ops/misc.py:120: UserWarning: Don't use ConvNormActivation directly, please use Conv2dNormActivation and Conv3dNormActivation instead.
  warnings.warn(
2024/01/08 16:19:00    - Backbone: efficientdet_d3 (# param: 12,006,840)
2024/01/08 16:19:00    - Head #1: heatmap (# param: 253,099)
2024/01/08 16:19:01    - Head #2: efficientpose (# param: 698,216)
2024/01/08 16:19:01    - GroupNorm built for prediction heads
2024/01/08 16:19:01    - Pretrained model loaded from outputs/efficientdet_d3/full_config/model_best.pth.tar
2024/01/08 16:19:02 Total number of parameters with requires_grad=True
2024/01/08 16:19:02    - 105,984
2024/01/08 16:19:02 Training   on sunlamp/labels/test.csv
2024/01/08 16:19:02    - Input size: 768x512
2024/01/08 16:19:02 Validating on sunlamp/labels/test.csv
2024/01/08 16:19:02    - Input size: 768x512
2024/01/08 16:19:02 Creating optimizer: AdamW
2024/01/08 16:19:02    - Initial LR: 0.001
2024/01/08 16:19:02 Mixed-precision training: ENABLED
Traceback (most recent call last):
  File "tools/odr.py", line 181, in <module>
    main(cfg)
  File "tools/odr.py", line 62, in main
    main_worker(
  File "tools/odr.py", line 140, in main_worker
    do_adapt(0,
  File "/mnt/d/repos/pose-estimation/spnv2/tools/../core/engine/adapter.py", line 52, in do_adapt
    images = next(loader_iter)
  File "/home/mohsi/anaconda3/envs/spnv2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/home/mohsi/anaconda3/envs/spnv2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/home/mohsi/anaconda3/envs/spnv2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/home/mohsi/anaconda3/envs/spnv2/lib/python3.8/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/mohsi/anaconda3/envs/spnv2/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/mohsi/anaconda3/envs/spnv2/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/mohsi/anaconda3/envs/spnv2/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/d/repos/pose-estimation/spnv2/tools/../core/dataset/SPEEDPLUSDataset.py", line 105, in __getitem__
    transformed = self.transforms(**transform_kwargs)
  File "/home/mohsi/anaconda3/envs/spnv2/lib/python3.8/site-packages/albumentations/core/composition.py", line 210, in __call__
    data = t(force_apply=force_apply, **data)
  File "/home/mohsi/anaconda3/envs/spnv2/lib/python3.8/site-packages/albumentations/core/transforms_interface.py", line 84, in __call__
    assert all(key in kwargs for key in self.targets_as_params), "{} requires {}".format(
AssertionError: CoarseDropout requires ['image', 'bboxes']

train.py and test.py run fine with the synthetic data, but I'm unsure how to proceed with running ODR on the real datasets.

Additional note: I trained a model from scratch using a modified offline_train_full_config_phi3_BN config with no segmentation head. Perhaps that's what's causing the issue?

Help pls

tpark94 commented 8 months ago

Hi Mohsi,

It seems the issue is with the val_loader here, which sets load_labels=True; the way the code is written, that requires the bounding-box information for the augmentations, as the error message suggests.
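The failing check can be sketched in plain Python. This is a minimal stand-in for the albumentations `targets_as_params` assertion seen in the traceback, not the real transform class:

```python
# Minimal stand-in for an albumentations transform whose parameters
# depend on extra targets (here: bounding boxes), mimicking the
# assertion in transforms_interface.py.
class CoarseDropoutLike:
    targets_as_params = ["image", "bboxes"]

    def __call__(self, **kwargs):
        # This is the check that trips when load_labels=False,
        # because no "bboxes" key is ever passed to the transform.
        assert all(key in kwargs for key in self.targets_as_params), \
            "{} requires {}".format(type(self).__name__, self.targets_as_params)
        return kwargs

t = CoarseDropoutLike()
t(image="img", bboxes=[[0, 0, 10, 10]])  # OK: both targets present
try:
    t(image="img")                       # no bboxes -> AssertionError
except AssertionError as e:
    print(e)  # CoarseDropoutLike requires ['image', 'bboxes']
```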

In any case, if you want to evaluate the performance after ODR you'll need to load the validation dataset with labels. The latest SPEED+ repo includes labels for the HIL domains so you can use them for preprocessing. If you simply want to run it without metric evaluation, you should probably comment out the validation dataset above and the validation function here.

Also, ODR performs entropy minimization on the segmentation output so you do need a model trained with the segmentation head.
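For intuition, entropy minimization pushes the per-pixel class distribution of the segmentation output toward confident predictions. A toy sketch of the objective (not the actual SPNv2 loss code):

```python
import math

def pixel_entropy(probs):
    # Shannon entropy of one pixel's softmaxed class distribution;
    # ODR-style adaptation minimizes this over the segmentation output.
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = [0.99, 0.01]   # nearly one-hot prediction
uncertain = [0.5, 0.5]     # maximally uncertain prediction
assert pixel_entropy(confident) < pixel_entropy(uncertain)
```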

Let me know if you still encounter issues!

mohsij commented 8 months ago

Thanks for the help. I'll give that a go and close the issue if it works.

Will I have to create my own masks if I want to train again with the segmentation head for ODR, or are they available somewhere for SPEED+? Creating my own would require the Tango model, which I don't believe is publicly available, or is it?

mohsij commented 8 months ago

I commented out the relevant code for the validation loader but am still unable to run ODR. It seems the train_loader is also building the CoarseDropout transform, which needs the labels. If I change load_labels to True in the train_loader here, I run into the following error:

Traceback (most recent call last):
  File "tools/odr.py", line 183, in <module>
    main(cfg)
  File "tools/odr.py", line 62, in main
    main_worker(
  File "tools/odr.py", line 140, in main_worker
    do_adapt(0,
  File "/mnt/d/repos/pose-estimation/spnv2/tools/../core/engine/adapter.py", line 68, in do_adapt
    loss, loss_items = model(images,
  File "/home/mohsi/anaconda3/envs/spnv2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/d/repos/pose-estimation/spnv2/tools/../core/nets/build.py", line 159, in forward
    x = self.backbone(x.to(gpu, non_blocking=True))
AttributeError: 'list' object has no attribute 'to'

Potentially this is because of the segmentation head requirement you stated above. Can you confirm?

tpark94 commented 7 months ago

Hey Mohsi,

So, it seems that the augmentations are indeed being built due to split=='train' here, but they are simply not being used because cfg.AUGMENT.P is set to 0 for ODR (e.g., see here).

Try setting all augmentations from L33 to L37 in the same configuration file to False. That should prevent building these augmentations altogether. I've never had the issue because I always had access to labels ... lol.
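The workaround amounts to gating each augmentation on its config flag, so nothing that needs bboxes is ever constructed. The flag names below are illustrative, not the exact cfg keys in the repo:

```python
def build_augmentations(cfg):
    # Only instantiate transforms whose flags are enabled; with every
    # flag False, CoarseDropout (which needs bboxes) is never built,
    # so the missing-bboxes assertion can never trigger.
    transforms = []
    if cfg.get("RANDOM_BRIGHTNESS_CONTRAST"):
        transforms.append("RandomBrightnessContrast")
    if cfg.get("COARSE_DROPOUT"):
        transforms.append("CoarseDropout")
    return transforms

assert build_augmentations({"RANDOM_BRIGHTNESS_CONTRAST": False,
                            "COARSE_DROPOUT": False}) == []
```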

The error above happens because with load_labels = True, the dataloader returns a tuple of images and labels, whereas load_labels = False returns only images.
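In other words (a simplified sketch, not the actual dataset code):

```python
def getitem(load_labels):
    image = "image_tensor"  # stand-in for a torch.Tensor
    if load_labels:
        return image, {"bboxes": [[0, 0, 10, 10]]}  # (image, labels) tuple
    return image                                    # bare image only

batch = getitem(load_labels=True)
# do_adapt assumes a bare tensor and calls images.to(gpu); a tuple
# (collated into a list by the dataloader) has no .to attribute, hence
# "AttributeError: 'list' object has no attribute 'to'".
# A defensive caller would unpack first:
images = batch[0] if isinstance(batch, (tuple, list)) else batch
print(images)  # image_tensor
```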

I'll upload the binary masks by the end of this month so that you'll be able to run ODR. I'll update the code as well, so I'll keep this issue open till then.

tpark94 commented 7 months ago

Hi Mohsi,

I just updated the README with a link to the binary masks. Please give it a try for ODR and let me know if any issues come up.