yoyo-nb / Thin-Plate-Spline-Motion-Model

[CVPR 2022] Thin-Plate Spline Motion Model for Image Animation.

imageio.mimread() has read over 256000000B of image data #23

Open · aishoot opened this issue 2 years ago

aishoot commented 2 years ago

Thanks for your nice work! I ran into a problem while training on the TED dataset (two 32 GB GPUs).

  File "Thin-Plate-Spline-Motion-Model/train.py", line 55, in train
    for x in dataloader:
  File "anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "Thin-Plate-Spline-Motion-Model/frames_dataset.py", line 172, in __getitem__
    return self.dataset[idx % self.dataset.__len__()]
  File "Thin-Plate-Spline-Motion-Model/frames_dataset.py", line 133, in __getitem__
    video_array = read_video(path, frame_shape=self.frame_shape)
  File "Thin-Plate-Spline-Motion-Model/frames_dataset.py", line 43, in read_video
    video = mimread(name)
  File "anaconda3/lib/python3.7/site-packages/imageio/core/functions.py", line 369, in mimread
    int(nbyte_limit)

RuntimeError: imageio.mimread() has read over 256000000B of image data.
Stopped to avoid memory problems. Use imageio.get_reader(), increase threshold, or memtest=False

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 83, in <module>
    train(config, inpainting, kp_detector, bg_predictor, dense_motion_network, opt.checkpoint, log_dir, dataset)
  File "Thin-Plate-Spline-Motion-Model/train.py", line 93, in train
    logger.log_epoch(epoch, model_save, inp=x, out=generated)
TypeError: __exit__() takes 1 positional argument but 4 were given

Thanks for your reply!
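The error message at the bottom of the first traceback already names the available escape hatches. A minimal sketch of both, assuming imageio v2's API; `read_video_frames` and `stream_video_frames` are hypothetical helpers, not the repo's code:

```python
from imageio import mimread, get_reader

def read_video_frames(name):
    # Option 1: disable imageio's 256 MB memory guard (memtest=False),
    # as the error message suggests. Still decodes the whole clip at once.
    return mimread(name, memtest=False)

def stream_video_frames(name):
    # Option 2: iterate frames with get_reader() instead of mimread(),
    # decoding one frame at a time; yielding lets callers subsample
    # without holding the entire clip in memory.
    for frame in get_reader(name):
        yield frame
```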

yoyo-nb commented 2 years ago

You can try processing the dataset as a sequence of .png frames (pass --format .png when running load_videos.py). Then each training sample only needs to read two .png images instead of decoding the entire video, which reduces memory usage. A sketch of the sampling logic is below.
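Roughly what the frame-directory path does (a simplified sketch, assumed to mirror the behavior of frames_dataset.py; `sample_frame_pair` is a hypothetical name):

```python
import os
import numpy as np
from imageio import imread
from skimage.util import img_as_float32

def sample_frame_pair(video_dir):
    # With the .png layout, each sample is a directory of frames;
    # only two frames (source and driving) are decoded per item.
    frames = sorted(os.listdir(video_dir))
    idx = np.sort(np.random.choice(len(frames), replace=True, size=2))
    return [img_as_float32(imread(os.path.join(video_dir, frames[i])))
            for i in idx]
```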

aishoot commented 2 years ago

> You can try processing the dataset as a sequence of .png frames (pass --format .png when running load_videos.py). Then each training sample only needs to read two .png images instead of decoding the entire video, which reduces memory usage.

Thanks, that really works. Another question: if I want to transfer the expression, head motion, and body motion at the same time, any good ideas?

yoyo-nb commented 2 years ago

According to experiments on TED-talks, facial expression transfer is not very good.

The face occupies too small a region of the image, and facial motions are small compared to body motions, so they are difficult for the model to learn.

It may be possible to do this with multiple models working together.

For example: use the body motion transfer model on the whole image, run face detection on the image, and apply the facial motion transfer model to the face region. A rough sketch of that pipeline is below.
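A minimal sketch of that two-model idea (not code from this repo): `animate_body` and `animate_face` are hypothetical callables wrapping TPSMM inference with the ted and vox checkpoints, and face detection here uses OpenCV's bundled Haar cascade:

```python
import cv2

def combine_body_and_face(source, driving_frames, animate_body, animate_face):
    """source: HxWx3 uint8 RGB image; driving_frames: list of HxWx3 frames.
    animate_body / animate_face: hypothetical (source, driving) -> frame
    callables wrapping the TED-talks and VoxCeleb models."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    out = []
    for drv in driving_frames:
        frame = animate_body(source, drv)  # body transfer on the whole image
        gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        faces = detector.detectMultiScale(gray)
        for (x, y, w, h) in faces[:1]:     # use the first detected face
            face = animate_face(source[y:y+h, x:x+w], drv[y:y+h, x:x+w])
            frame[y:y+h, x:x+w] = cv2.resize(face, (w, h))  # paste back
        out.append(frame)
    return out
```

In practice the face box would need to track the head across frames, and the paste-back would need blending (e.g. feathered edges) to avoid visible seams.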

aishoot commented 2 years ago

OK, thanks a lot. Got it.

FurkanGozukara commented 1 year ago

> According to experiments on TED-talks, facial expression transfer is not very good.
>
> The face occupies too small a region of the image, and facial motions are small compared to body motions, so they are difficult for the model to learn.
>
> It may be possible to do this with multiple models working together.
>
> For example: use the body motion transfer model on the whole image, run face detection on the image, and apply the facial motion transfer model to the face region.

Could you add this? A script that combines the vox and ted models would be great.