hmorimitsu / ptlflow

PyTorch Lightning Optical Flow models, scripts, and pretrained weights.
Apache License 2.0

train on new data #32

Closed esgomezm closed 2 years ago

esgomezm commented 2 years ago

Hi!

I'm trying to train the model on my own training data but I get the following error:

!python train.py raft_small \
  --gpus 1 \
  --train_dataset overfit-sintel \
  --pretrained_ckpt things \
  --val_dataset none \
  --train_batch_size 1 \
  --train_crop_size 512 128 \
  --max_epochs 100 \
  --lr 1e-3
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
ERROR: torch_scatter not found. CSV requires torch_scatter library to run. Check instructions at: https://github.com/rusty1s/pytorch_scatter
Global seed set to 1234
Downloading: "https://github.com/hmorimitsu/ptlflow/releases/download/weights1/raft_small-things-b7d9f997.ckpt" to /root/.cache/torch/hub/ptlflow/checkpoints/raft_small-things-b7d9f997.ckpt
100% 3.81M/3.81M [00:00<00:00, 26.3MB/s]
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Traceback (most recent call last):
  File "train.py", line 151, in <module>
    train(args)
  File "train.py", line 111, in train
    trainer.fit(model)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1145, in _run
    self.accelerator.setup(self)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/gpu.py", line 46, in setup
    return super().setup(trainer)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py", line 93, in setup
    self.setup_optimizers(trainer)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py", line 352, in setup_optimizers
    trainer=trainer, model=self.lightning_module
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 245, in init_optimizers
    return trainer.init_optimizers(model)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/optimizers.py", line 35, in init_optimizers
    optim_conf = self.call_hook("configure_optimizers", pl_module=pl_module)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1501, in call_hook
    output = model_fx(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/ptlflow/models/base_model/base_model.py", line 339, in configure_optimizers
    self.train_dataloader()  # Just to initialize dataloader variables
  File "/usr/local/lib/python3.7/dist-packages/ptlflow/models/base_model/base_model.py", line 405, in train_dataloader
    dataset = getattr(self, f'_get_{dataset_name}_dataset')(True, *parsed_vals[2:])
  File "/usr/local/lib/python3.7/dist-packages/ptlflow/models/base_model/base_model.py", line 904, in _get_overfit_dataset
    get_occlusion_mask=False)
  File "/usr/local/lib/python3.7/dist-packages/ptlflow/data/datasets.py", line 1025, in __init__
    f'{passd}, {seq_name}: {len(image_paths)-1} vs {len(flow_paths)}')
AssertionError: clean, .ipynb_checkpoints: -1 vs 0

I prepared the data as in the example:

[Screenshot 2022-01-19 at 18 06 54]

Inference works well, but training does not.

Thank you!

hmorimitsu commented 2 years ago

Hi, from the message, it appears that there is a folder called .ipynb_checkpoints inside the clean folder. The script treats it as a sequence and tries to read images from it, but it cannot find any, which is why the assertion reports "-1 vs 0" (no images and no flow files). You should also make sure there are no other hidden or temporary folders inside the dataset.
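A minimal sketch for spotting such folders before training, assuming the dataset sits in an ordinary local directory. The function name find_hidden_dirs and the path "path/to/dataset" are placeholders for illustration only and are not part of ptlflow:

```python
import os


def find_hidden_dirs(dataset_root):
    """Return all directories under dataset_root whose name starts with a dot.

    Hypothetical helper, not a ptlflow function. It only lists the folders;
    deleting them is left to the user.
    """
    hidden = []
    for dirpath, dirnames, _filenames in os.walk(dataset_root):
        for name in dirnames:
            if name.startswith("."):
                hidden.append(os.path.join(dirpath, name))
    return sorted(hidden)


if __name__ == "__main__":
    # Placeholder path: replace it with the root folder of your own dataset.
    for path in find_hidden_dirs("path/to/dataset"):
        print(path)
```

Note that a plain ls does not list entries whose names start with a dot, so folders such as .ipynb_checkpoints only show up with ls -a or with a check like the one above.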

I hope that helps.

esgomezm commented 2 years ago

Sorry, I couldn't see it with the ls command. Now it's working =) Thank you!