hmorimitsu / ptlflow

PyTorch Lightning Optical Flow models, scripts, and pretrained weights.
Apache License 2.0

TypeError: unsupported operand type(s) for *: 'NoneType' and 'int' #34

Closed coufin closed 1 year ago

coufin commented 2 years ago

/python3.8/site-packages/ptlflow/utils/callbacks/logger.py", line 411, in _compute_max_range
    max_range = int(limit_batches * dataloader_length) - 1
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

hmorimitsu commented 2 years ago

Hi, could you tell me which version of pytorch and pytorch-lightning you are using?

coufin commented 2 years ago

torch version: '1.12.0+cu102'
lightning version: 1.6.4

coufin commented 2 years ago

Another problem: when I try to train with this command:

python3 train.py pwcnet --train_dataset kitti-training --val_dataset none --train_batch_size 1 --train_crop_size 512 128 --max_epochs 100 --lr 1e-3

I get this error:

/.local/lib/python3.10/site-packages/ptlflow/data/datasets.py", line 911, in __init__
    assert len(img1_paths) == len(flow_paths), f'{len(img1_paths)} vs {len(flow_paths)}'
AssertionError: 0 vs 200

hmorimitsu commented 2 years ago


This means that the images were not found. Please check whether the KITTI images are in the correct directories. For KITTI 2012, they should be at <kitti_2012_root_dir>/training/colored_0/, and for KITTI 2015 at <kitti_2015_root_dir>/training/image_2/. The image names should also follow the KITTI convention: *_10.png and *_11.png.
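A quick way to check the layout described above is to count how many complete *_10.png / *_11.png pairs exist in the image directory and compare that with the number of flow files reported in the assertion (200 in the error above). The helper below, `count_kitti_pairs`, is a hypothetical sketch written for this thread; it is not part of ptlflow, and it only assumes the naming convention stated above.

```python
from pathlib import Path


def count_kitti_pairs(image_dir):
    """Count complete KITTI-style image pairs (<id>_10.png and <id>_11.png).

    Hypothetical helper: only checks file names, not image contents.
    """
    image_dir = Path(image_dir)
    if not image_dir.is_dir():
        raise FileNotFoundError(f"Directory not found: {image_dir}")
    suffix = "_10.png"
    complete = 0
    for first in sorted(image_dir.glob("*" + suffix)):
        # The second frame shares the same id prefix but ends in _11.png.
        second = first.with_name(first.name[: -len(suffix)] + "_11.png")
        if second.exists():
            complete += 1
    return complete
```

For example, `count_kitti_pairs("<kitti_2015_root_dir>/training/image_2")` returning 0 would explain the "0 vs 200" assertion.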

hmorimitsu commented 2 years ago

> /python3.8/site-packages/ptlflow/utils/callbacks/logger.py", line 411, in _compute_max_range
>     max_range = int(limit_batches * dataloader_length) - 1
> TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

Thank you for reporting this problem.

It was caused by a change in the default value of one of pytorch-lightning's arguments. It is now fixed in the main branch.

To fix it, please update to the latest version of this repo or downgrade pytorch-lightning to 1.5.
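For anyone curious about what kind of guard avoids this error, here is a minimal sketch of a None-safe version of the failing expression from the traceback. The helper name `compute_max_range` is hypothetical, and treating `None` as "use every batch" is an assumption based on the traceback; this is not the actual patch in the repo.

```python
from typing import Optional, Union


def compute_max_range(limit_batches: Optional[Union[int, float]], dataloader_length: int) -> int:
    """Index of the last batch to use (hypothetical helper).

    Mirrors int(limit_batches * dataloader_length) - 1 from the traceback,
    but guards against limit_batches being None.
    """
    if limit_batches is None:
        # Assumption: None means "use the whole dataloader".
        limit_batches = 1.0
    # Like the original expression, this treats limit_batches as a fraction.
    # In PyTorch Lightning an int limit means a number of batches, which
    # this sketch does not handle.
    return int(limit_batches * dataloader_length) - 1
```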

coufin commented 2 years ago

How can I solve this error?

/.local/lib/python3.8/site-packages/ptlflow/models/base_model/base_model.py", line 337, in configure_optimizers
    assert self.loss_fn is not None, f'Model {self.__class__.__name__} cannot be trained. It does not have loss function.'
AssertionError: Model LiteFlowNet cannot be trained. It does not have loss function.

hmorimitsu commented 2 years ago

This is currently a limitation of the code. Not all models can be trained.

In this case, the model does not have a loss function yet, because the original code was not written in PyTorch and I did not port its training part to this library. Making sure that the training is correct takes a long time, so I only added training when the original code also provided it in PyTorch.

Also, please note that I only tested training with RAFT, and by default the training hyperparameters also follow RAFT. So I do not guarantee that training other models will give results similar to the original ones.

coufin commented 2 years ago

And how can I solve this one?

/.local/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 107, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0

coufin commented 2 years ago

It is working. Thank you.

coufin commented 2 years ago

How can I calculate F1-score for the model?

hmorimitsu commented 2 years ago

F1 is returned as a metric when you use a model that predicts confidence, occlusion, or motion boundaries.

You can see the available metrics at: https://github.com/hmorimitsu/ptlflow/blob/main/ptlflow/utils/flow_metrics.py
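As a rough illustration of why F1 only applies to those outputs: F1 is defined on binary decisions (for example, "is this pixel occluded?"), not on flow vectors. The sketch below computes the standard F1 score from a predicted probability map and a binary groundtruth mask, both flattened to lists. The function name and the 0.5 threshold are assumptions; the implementation in ptlflow's flow_metrics.py may threshold or average differently.

```python
def binary_f1(pred, target, threshold=0.5):
    """Standard F1 score between predicted probabilities and a binary mask.

    pred and target are flat sequences of per-pixel values.
    Hypothetical helper; the threshold is an assumption.
    """
    tp = fp = fn = 0
    for p, t in zip(pred, target):
        predicted = p >= threshold
        actual = t >= 0.5
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    if tp == 0:
        # Convention used here: no true positives gives F1 = 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `binary_f1([0.9, 0.8, 0.2, 0.1], [1, 0, 1, 0])` has one true positive, one false positive and one false negative, so precision, recall and F1 are all 0.5.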

coufin commented 2 years ago

How can I see the metrics after training a model?

hmorimitsu commented 2 years ago

The metrics are logged by pytorch-lightning, which uses TensorBoard for visualization by default.

If you want to view them in another way, you will need to modify base_model and possibly the training scripts.

coufin commented 2 years ago

Hello, I want to know what epe, px1, and px3 stand for in the results.

hmorimitsu commented 2 years ago

EPE (end-point error): the average Euclidean distance between the prediction and the groundtruth.
px1, px3: percentage of predictions within 1 (or 3) pixels of the groundtruth.
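To make these definitions concrete, here is a small sketch that computes EPE, px1 and px3 for flows given as lists of (u, v) vectors. The function name is hypothetical, px values are returned as fractions (multiply by 100 for a percentage), and using a strict `<` for "within" is an assumption; ptlflow's implementation works on tensors and may differ in such details.

```python
import math


def flow_metrics(pred, gt):
    """EPE, px1 and px3 for flows given as lists of (u, v) vectors.

    Hypothetical helper; px values are fractions in [0, 1].
    """
    # Per-pixel end-point error: Euclidean distance between the two vectors.
    errors = [math.hypot(pu - gu, pv - gv) for (pu, pv), (gu, gv) in zip(pred, gt)]
    n = len(errors)
    return {
        "epe": sum(errors) / n,
        "px1": sum(e < 1.0 for e in errors) / n,
        "px3": sum(e < 3.0 for e in errors) / n,
    }
```

For example, with groundtruth `[(0, 0), (0, 0), (0, 0)]` and prediction `[(0, 0), (3, 4), (1, 0)]`, the per-pixel errors are 0, 5 and 1, so EPE is 2.0, px1 is 1/3 and px3 is 2/3.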

coufin commented 2 years ago

In TensorBoard I only see epe and outlier, but no F1. Where can I see it?

hmorimitsu commented 2 years ago

Are you using a model and a dataset that have confidence, occlusion, or motion boundaries?

coufin commented 2 years ago

I am using the PWCNet model and the KITTI 2012 dataset.

hmorimitsu commented 2 years ago

Do you really mean F1 or FL (the metric from KITTI 2015)?

If FL, then it is the same as the outlier metric.

If you really want F1, then it is not available for PWCNet, as this model does not predict confidence, occlusion, or motion boundaries.

coufin commented 2 years ago

OK, but I also can't see F1 for the RAFT and FlowNetS models. Is it not available for them either? And what does FL mean for these models?

hmorimitsu commented 2 years ago

Very few models have predictions compatible with F1. At the moment, there is no list of models that support it; the only way to check is to look at their code or read their papers.

For FL, I recommend you check the outlier results for the KITTI 2015 dataset.
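For reference, the KITTI 2015 benchmark counts a pixel as an outlier when its end-point error is larger than 3 px and also larger than 5% of the groundtruth flow magnitude; Fl is the fraction of such pixels. The sketch below implements that definition on lists of (u, v) vectors. The function name is hypothetical, and it ignores KITTI's validity masks (the benchmark only evaluates pixels with valid groundtruth); ptlflow's outlier metric may differ in such details.

```python
import math


def kitti_outlier_rate(pred, gt):
    """Fraction of pixels with EPE > 3 px AND EPE > 5% of the groundtruth magnitude.

    Hypothetical helper; assumes every pixel has valid groundtruth.
    """
    outliers = 0
    for (pu, pv), (gu, gv) in zip(pred, gt):
        epe = math.hypot(pu - gu, pv - gv)
        gt_magnitude = math.hypot(gu, gv)
        if epe > 3.0 and epe > 0.05 * gt_magnitude:
            outliers += 1
    return outliers / len(gt)
```

For example, an error of 4 px on a groundtruth vector of length 100 is not an outlier (4 < 5), but the same 4 px error on a vector of length 10 is.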

coufin commented 2 years ago

OK, thank you.

coufin commented 2 years ago

When I run CRAFT, I get this error:

RuntimeError: CUDA out of memory. Tried to allocate 286.00 MiB (GPU 0; 7.79 GiB total capacity; 6.15 GiB already allocated; 69.25 MiB free; 6.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

hmorimitsu commented 2 years ago

This means CRAFT is too large for your GPU. You have to either use smaller images, decrease the batch size, or run on a GPU with more memory.

coufin commented 2 years ago

Where can I see the test metrics of these models? In TensorBoard I can only see training metrics such as epe and px1.

coufin commented 2 years ago

When I train the DICL model, I get this error:

RuntimeError: The expanded size of the tensor (1) must match the existing size (0) at non-singleton dimension 3. Target sizes: [1, 32, 5, 1]. Tensor sizes: [32, 5, 0]

hmorimitsu commented 1 year ago

> Where can I see the test metrics of these models? In TensorBoard I can only see training metrics such as epe and px1.

By default, validation is run at the end of every epoch of training. So you should see the validation metrics as well.

Did you configure the datasets.yml file to point to the validation datasets?

hmorimitsu commented 1 year ago

> When I train the DICL model, I get this error: RuntimeError: The expanded size of the tensor (1) must match the existing size (0) at non-singleton dimension 3. Target sizes: [1, 32, 5, 1]. Tensor sizes: [32, 5, 0]

Thanks for reporting. I'll take a look later to see what's wrong.

coufin commented 1 year ago

Can you tell me the difference between epe and loss?

hmorimitsu commented 1 year ago

EPE is just a measurement (the Euclidean distance between the prediction and the groundtruth), while the loss is the signal that drives the training. EPE could be used as the loss, but other quantities could be used too.
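To illustrate how a loss and EPE can report different numbers for the same prediction, here is a sketch of a RAFT-style sequence loss: an exponentially weighted sum (gamma = 0.8 in the RAFT paper) of the mean absolute (L1) error over all intermediate predictions, whereas EPE is the L2 distance measured only on the final prediction. This assumes a RAFT-style loss because the default training setup follows RAFT; other models in the library may use different losses, and the function names here are hypothetical.

```python
import math


def epe(pred, gt):
    """Average end-point error (L2 distance) of one prediction."""
    return sum(math.hypot(pu - gu, pv - gv) for (pu, pv), (gu, gv) in zip(pred, gt)) / len(gt)


def l1_component_mean(pred, gt):
    """Mean absolute error over both flow components (u and v)."""
    total = sum(abs(pu - gu) + abs(pv - gv) for (pu, pv), (gu, gv) in zip(pred, gt))
    return total / (2 * len(gt))


def sequence_loss(preds, gt, gamma=0.8):
    """RAFT-style loss: later iterations get larger weights (gamma ** 0 for the last)."""
    n = len(preds)
    return sum(gamma ** (n - i - 1) * l1_component_mean(p, gt) for i, p in enumerate(preds))
```

For a single pixel with groundtruth (0, 0) and two iterations predicting (3, 4) and then (0.6, 0.8), the final EPE is 1.0, while the loss is 0.8 * 3.5 + 0.7 = 3.5. Both decrease as the predictions improve, but they are not the same number.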

coufin commented 1 year ago

What loss signal do the models use? And if EPE is used as the loss signal, why are the two values different?

coufin commented 1 year ago

How can I change the x-axis in TensorBoard from batches to epochs?

coufin commented 1 year ago

epe_epoch

hmorimitsu commented 1 year ago

Sorry, but I'm not sure. The logging is handled by PyTorch Lightning, so you could check its documentation to find out how to change it.