[ROC] Mistakes in training

CatherineJun commented 1 year ago

When I trained the model, the error "FileNotFoundError: Dataset not found: valid (data/lmd_processed/valid)" appeared. How can I solve the problem?

trestad commented 1 year ago

Please check (1) whether 'data/lmd_processed/valid.notes' exists, and (2) your fairseq version. The training script is written for fairseq 0.10.1; if you are using a higher version, check whether the command-line arguments have changed, e.g., whether the source or target now needs to be specified.
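For check (1), a small script can list which files exist for each split. This is only a sketch: the helper name `find_split_files` is hypothetical, and it assumes split files are named `<split>.<suffix>` (as in 'valid.notes'), which may differ in your preprocessing output.

```python
from pathlib import Path


def find_split_files(data_dir, split):
    """Return sorted names of files in data_dir named '<split>' or '<split>.<suffix>'.

    Hypothetical helper: the naming pattern is an assumption based on the
    'valid.notes' path mentioned in this thread.
    """
    root = Path(data_dir)
    if not root.is_dir():
        # A missing directory means no split files at all.
        return []
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and (p.name == split or p.name.startswith(split + "."))
    )


if __name__ == "__main__":
    # Directory taken from the error message; adjust to your setup.
    for split in ("train", "valid", "test"):
        found = find_split_files("data/lmd_processed", split)
        print(split, "->", found if found else "MISSING")
```

If 'valid' is reported as MISSING, the preprocessing step probably did not produce a validation split in that directory.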

gandolfxu commented 1 year ago

@trestad Another error in training.

fairseq-train data/lmd_processed/ \
  --arch transformer_lm --task language_modeling \
  --decoder-attention-heads 4 --decoder-embed-dim 256 \
  --decoder-input-dim 256 --decoder-output-dim 256 \
  --decoder-layers 4 --update-freq 1 --optimizer adam \
  --adam-betas '(0.9, 0.98)' --adam-eps 1e-6 --clip-norm 0.0 \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 \
  --warmup-updates 4000 --lr 0.0001 --attention-dropout 0.1 \
  --dropout 0.1 --weight-decay 0.01 --max-update 50000 \
  --save-dir music-ckps2 --batch-size 1 --max-target-positions 512 \
  --log-interval 100 --patience 20 --no-epoch-checkpoints \
  --best-checkpoint-metric 'ppl' | tee music-ckps/log.txt

2023-01-20 02:46:22 | WARNING | fairseq.tasks.fairseq_task | 18505 samples have invalid sizes and will be skipped, max_positions=512, first few sample ids=[18504, 4243, 11102, 10387, 11829, 27, 2933, 1156, 6782, 14445]
Traceback (most recent call last):
  File "/opt/conda/envs/muzic/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/opt/conda/envs/muzic/lib/python3.8/site-packages/fairseq_cli/train.py", line 352, in cli_main
    distributed_utils.call_main(args, main)
  File "/opt/conda/envs/muzic/lib/python3.8/site-packages/fairseq/distributed_utils.py", line 301, in call_main
    main(args, **kwargs)
  File "/opt/conda/envs/muzic/lib/python3.8/site-packages/fairseq_cli/train.py", line 110, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(
  File "/opt/conda/envs/muzic/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 212, in load_checkpoint
    epoch_itr = trainer.get_train_iterator(
  File "/opt/conda/envs/muzic/lib/python3.8/site-packages/fairseq/trainer.py", line 382, in get_train_iterator
    self.reset_dummy_batch(batch_iterator.first_batch)
  File "/opt/conda/envs/muzic/lib/python3.8/site-packages/fairseq/data/iterators.py", line 280, in first_batch
    raise Exception(
Exception: The dataset is empty. This could indicate that all elements in the dataset have been skipped. Try increasing the max number of allowed tokens or using a larger dataset.

trestad commented 1 year ago

According to your warning "18505 samples have invalid sizes and will be skipped, max_positions=512", I guess your samples are all longer than the limit, so every one of them is skipped. Perhaps you should use a larger '--max-target-positions', as the exception suggests: 'Exception: The dataset is empty. This could indicate that all elements in the dataset have been skipped. Try increasing the max number of allowed tokens or using a larger dataset.'
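To get a feel for how many samples a given limit would drop, you can count sample lengths yourself. The sketch below is not fairseq's own code: `token_lengths` and `count_oversized` are hypothetical helpers, and it assumes one sample per line with whitespace-separated tokens and that a sample is skipped when its length exceeds the limit.

```python
def token_lengths(lines):
    """Token count per non-empty line (assumes whitespace-separated tokens)."""
    return [len(line.split()) for line in lines if line.strip()]


def count_oversized(lengths, max_positions):
    """Return (kept, skipped) for the given limit.

    Assumption: a sample is skipped when it is longer than max_positions.
    """
    skipped = sum(1 for n in lengths if n > max_positions)
    return len(lengths) - skipped, skipped


if __name__ == "__main__":
    # Made-up lengths for illustration only; replace with token_lengths(...)
    # over your own raw text file.
    example_lengths = [100, 512, 513, 2000]
    kept, skipped = count_oversized(example_lengths, 512)
    print(f"kept={kept} skipped={skipped}")
```

If the kept count is zero for your data, the limit is too small for every sample, which matches the "dataset is empty" exception above.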

microsoft / muzic

[ROC] Mistakes in training #94