cynthia-you closed this issue 2 months ago.
You can use the same command with `nproc_per_node` set to 1.
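If you prefer launching from Python rather than the shell, the single-process invocation used later in this thread (`OMP_NUM_THREADS=1 torchrun --nproc_per_node=1 cutie/train.py`) can be assembled as below. This is only a minimal sketch: `single_process_command` is a hypothetical helper, not part of Cutie, and it simply mirrors that shell command.

```python
import os
import shlex
from typing import Dict, List, Tuple


def single_process_command(script: str = "cutie/train.py") -> Tuple[List[str], Dict[str, str]]:
    """Build the torchrun argv and environment for a single training process.

    Hypothetical helper: it reproduces the shell command from this thread,
    with nproc_per_node set to 1 and OMP_NUM_THREADS set to 1.
    """
    argv = ["torchrun", "--nproc_per_node=1", script]
    env = dict(os.environ, OMP_NUM_THREADS="1")
    return argv, env


if __name__ == "__main__":
    argv, env = single_process_command()
    # Show the equivalent shell command instead of running it.
    print("OMP_NUM_THREADS=" + env["OMP_NUM_THREADS"], shlex.join(argv))
    # To actually launch training: subprocess.run(argv, env=env, check=True)
```

Keeping the environment change inside `env` (instead of mutating `os.environ`) means only the launched process sees `OMP_NUM_THREADS=1`.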
Thanks for your instructions. I set `nproc_per_node=1` and made no changes to train_config.yaml. When I started training, this error appeared:
(bundlesdf) fusion@fusion-1013:~/PycharmProjects/Cutie$ OMP_NUM_THREADS=1 torchrun --nproc_per_node=1 cutie/train.py
[2024-04-23 15:12:45][INFO][r0] - Added key: store_based_barrier_key:1 to store for rank: 0
[2024-04-23 15:12:45][INFO][r0] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
[2024-04-23 15:12:45][INFO][r0] - Initialized: local_rank=0, world_size=1
[2024-04-23 15:12:45][INFO][r0] - All configuration: {'exp_id': 'default', 'debug': False, 'cudnn_benchmark': True, 'weights': None, ...
[2024-04-23 15:12:47][INFO][r0] - object_transformer.summary_to_query_init.weight counted as an embedding parameter.
[2024-04-23 15:12:47][INFO][r0] - object_transformer.summary_to_query_emb.weight counted as an embedding parameter.
Error executing job with overrides: []
Traceback (most recent call last):
File "cutie/train.py", line 117, in train
dataset, sampler, loader = setup_pre_training_datasets(cfg)
File "/home/fusion/PycharmProjects/Cutie/cutie/dataset/setup_training_data.py", line 32, in setup_pre_training_datasets
dataset = SyntheticVideoDataset(dataset_tuples,
File "/home/fusion/PycharmProjects/Cutie/cutie/dataset/static_dataset.py", line 41, in __init__
classes = os.listdir(root)
FileNotFoundError: [Errno 2] No such file or directory: '../static/fss'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
My dataset structure is consistent with TRAINING.md. Do I have to revise train_config.yaml, or move my dataset to a specific directory?
The error says that it is trying to read a directory that does not exist (`../static/fss`). You need to change the dataset configuration in `cutie/config/data`.
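The traceback ends in `os.listdir(root)` inside `static_dataset.py`, so one way to confirm the configuration fix is to check that every dataset root exists before launching training. Below is a minimal sketch under stated assumptions: `find_missing_dirs` is a hypothetical helper (not part of Cutie), and the `configured_roots` list is a placeholder you would fill with the paths set in `cutie/config/data`. Like `os.listdir`, it resolves relative paths such as `../static/fss` against the current working directory.

```python
import os
from typing import Iterable, List, Optional


def find_missing_dirs(roots: Iterable[str], base: Optional[str] = None) -> List[str]:
    """Return the entries of `roots` that do not point to an existing directory.

    Relative entries are joined onto `base` (default: the current working
    directory), which is how os.listdir(root) resolves a relative root.
    """
    base = os.getcwd() if base is None else base
    missing = []
    for root in roots:
        path = root if os.path.isabs(root) else os.path.join(base, root)
        if not os.path.isdir(path):
            missing.append(root)
    return missing


if __name__ == "__main__":
    # Hypothetical list: replace with the dataset roots from cutie/config/data.
    configured_roots = ["../static/fss"]
    for root in find_missing_dirs(configured_roots):
        print(f"missing dataset directory: {root} (cwd: {os.getcwd()})")
```

Run it from the directory you launch training from; if the training script changes its working directory before loading data, pass that directory as `base` instead.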
It works now, thanks!
BTW, what's the meaning of `exp_id`?
Hi, I really love your repos, from XMem to Cutie.
I have a question about training on a custom dataset. Can you show the training command without distributed training? My custom dataset is really small: it consists of just over 300 images, each with two types of objects.
Huge thanks for your reply!