hkchengrex / Cutie

[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
https://hkchengrex.com/Cutie/
MIT License
579 stars, 60 forks

Train with one machine one gpu #62

Closed: cynthia-you closed this issue 2 months ago

cynthia-you commented 2 months ago

Hi, I really love your repos, from XMem to Cutie.

I have a question about training on a custom dataset. Could you show the training command without distributed training? My custom dataset is really small: it consists of just over 300 pictures, each containing two types of objects. [Screenshot from 2024-04-22 13-43-41] Huge thanks for your reply~

hkchengrex commented 2 months ago

You can use the same command with nproc_per_node set to 1.
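In other words, a single-GPU run is the usual `torchrun` launch with one process. The Python sketch below only assembles that command line and does not run it. `OMP_NUM_THREADS=1` mirrors the command used later in this thread; the helper name `single_gpu_train_command` and the `exp_id=my_small_run` override are hypothetical illustrations, not part of Cutie.

```python
import os
import shlex


def single_gpu_train_command(overrides=(), omp_threads=1):
    """Return (env, argv) for a one-process torchrun launch of cutie/train.py.

    Hypothetical helper: Hydra-style ``key=value`` overrides are simply
    appended after the script path.
    """
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(omp_threads)  # same as OMP_NUM_THREADS=1 in the log
    argv = ["torchrun", "--nproc_per_node=1", "cutie/train.py", *overrides]
    return env, argv


if __name__ == "__main__":
    env, argv = single_gpu_train_command(["exp_id=my_small_run"])  # hypothetical exp_id
    print(shlex.join(argv))
    # To actually launch from the Cutie root:
    # import subprocess; subprocess.run(argv, env=env, check=True)
```

Only `--nproc_per_node` differs from a multi-GPU launch; everything else in the command stays the same.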

cynthia-you commented 2 months ago

Thanks for your instruction. I set nproc_per_node=1 and made no changes to train_config.yaml. Then I started training, but an error shows up:

(bundlesdf) fusion@fusion-1013:~/PycharmProjects/Cutie$ OMP_NUM_THREADS=1 torchrun  --nproc_per_node=1 cutie/train.py 
[2024-04-23 15:12:45][INFO][r0] - Added key: store_based_barrier_key:1 to store for rank: 0
[2024-04-23 15:12:45][INFO][r0] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
[2024-04-23 15:12:45][INFO][r0] - Initialized: local_rank=0, world_size=1
[2024-04-23 15:12:45][INFO][r0] - All configuration: {'exp_id': 'default', 'debug': False, 'cudnn_benchmark': True, 'weights': None, ...
[2024-04-23 15:12:47][INFO][r0] - object_transformer.summary_to_query_init.weight counted as an embedding parameter.
[2024-04-23 15:12:47][INFO][r0] - object_transformer.summary_to_query_emb.weight counted as an embedding parameter.
Error executing job with overrides: []
Traceback (most recent call last):
  File "cutie/train.py", line 117, in train
    dataset, sampler, loader = setup_pre_training_datasets(cfg)
  File "/home/fusion/PycharmProjects/Cutie/cutie/dataset/setup_training_data.py", line 32, in setup_pre_training_datasets
    dataset = SyntheticVideoDataset(dataset_tuples,
  File "/home/fusion/PycharmProjects/Cutie/cutie/dataset/static_dataset.py", line 41, in __init__
    classes = os.listdir(root)
FileNotFoundError: [Errno 2] No such file or directory: '../static/fss'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

My dataset structure is consistent with TRAINING.md. Do I have to revise train_config.yaml, or move my dataset to a specific directory?

hkchengrex commented 2 months ago

The error says that it is trying to find a file that is not there. You need to change the dataset configuration in cutie/config/data.
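The failing call in the traceback is `os.listdir(root)` on the relative path `../static/fss`, so one way to catch this before training is to check every configured dataset root up front. This is a minimal sketch under assumptions: the `dataset_roots` mapping and its names are hypothetical stand-ins for whatever paths you set in `cutie/config/data`, and the code does not read Cutie's actual config files. Relative paths are resolved against a base directory (by default the current working directory), the same way `os.listdir` would resolve them.

```python
import os
import tempfile


def missing_dataset_roots(dataset_roots, base_dir=None):
    """Return {name: absolute_path} for every configured root that is not a directory.

    dataset_roots is a hypothetical {name: path} mapping; relative paths are
    resolved against base_dir (default: the current working directory).
    """
    base_dir = base_dir or os.getcwd()
    missing = {}
    for name, path in dataset_roots.items():
        # os.path.join keeps absolute paths unchanged; abspath normalizes "..".
        resolved = os.path.abspath(os.path.join(base_dir, path))
        if not os.path.isdir(resolved):
            missing[name] = resolved
    return missing


if __name__ == "__main__":
    # Hypothetical layout: a Cutie checkout next to a custom dataset, but no ../static/fss.
    with tempfile.TemporaryDirectory() as tmp:
        cutie_root = os.path.join(tmp, "Cutie")
        os.makedirs(cutie_root)
        os.makedirs(os.path.join(tmp, "my_data"))
        roots = {"fss": "../static/fss", "custom": "../my_data"}
        for name, path in missing_dataset_roots(roots, base_dir=cutie_root).items():
            print(f"missing {name}: {path}")
```

Any entry this reports either needs its data downloaded to that location or needs its path changed (or the dataset dropped) in the data config.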

cynthia-you commented 2 months ago

It works now~~ Thanks!

cynthia-you commented 2 months ago

BTW, what's the meaning of "exp_id"?

hkchengrex commented 2 months ago

Explained here https://github.com/hkchengrex/Cutie/blob/main/docs/TRAINING.md#training-command