symphonynet / SymphonyNet

Symphony Generation with Permutation Invariant Language Model
https://symphonynet.github.io
MIT License

Errors about continue training using own dataset #8

Open shiyanpei opened 2 years ago

shiyanpei commented 2 years ago

Hi,

This is an amazing project, I am quite interested in your project.

I would like to play with the pretrained model and continue training it on my own dataset, but I ran into some errors while fine-tuning.

I am trying to fine-tune the model on my own dataset, starting from the checkpoint I downloaded via the link in your readme.md.

I followed the guide in the readme file to process the data, and I modified the training script as follows:

```bash
#!/bin/bash

while read line; do eval "$line"; done < config.sh
while read line; do eval "$line"; done < vocab.sh

# for model training
if [ $BPE -eq 0 ]; then
    DATA_BIN=linear${MAX_POS_LEN}_chord_hardloss${IGNORE_META_LOSS}
else
    DATA_BIN=linear${MAX_POS_LEN}_chord_bpe_hardloss${IGNORE_META_LOSS}
fi
DATA_BIN_DIR=data/model_spec/${DATA_BIN}/bin

N_GPU_LOCAL=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
UPDATE_FREQ=$((${BATCH_SIZE} / ${MAX_SENTENCES} / ${N_GPU_LOCAL}))
NN_ARCH=linear_transformer_multi
CHECKPOINT_SUFFIX=${DATA_BIN}_PI${PI_LEVEL}

CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" PYTHONWARNINGS="ignore" fairseq-train ${DATA_BIN_DIR} \
    --seed ${SEED} \
    --user-dir src/fairseq/linear_transformer \
    --task symphony_modeling --criterion multiple_loss \
    --save-dir ckpt/syn-taobao10w --restore-file ckpt/checkpoint_last_linear_4096_chord_bpe_hardloss1_PI2.pt \
    --arch ${NN_ARCH} --sample-break-mode complete_doc --tokens-per-sample ${MAX_POS_LEN} --sample-overlap-rate ${SOR} \
    --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-6 --clip-norm 0.0 \
    --lr ${PEAK_LR} --lr-scheduler polynomial_decay --warmup-updates ${WARMUP_UPDATES} --total-num-update ${TOTAL_UPDATES} \
    --dropout 0.1 --weight-decay 0.01 \
    --batch-size ${MAX_SENTENCES} --update-freq ${UPDATE_FREQ} \
    --max-update ${TOTAL_UPDATES} --log-format simple --log-interval 100 \
    --checkpoint-suffix ${CHECKPOINT_SUFFIX} \
    --tensorboard-logdir logs/${CHECKPOINT_SUFFIX} \
    --ratio ${RATIO} --evt-voc-size ${SIZE_0} --dur-voc-size ${SIZE_1} --trk-voc-size ${SIZE_2} --ins-voc-size ${SIZE_3} \
    --max-rel-pos ${MAX_REL_POS} --max-mea-pos ${MAX_MEA_POS} --perm-inv ${PI_LEVEL} \
    2>&1 | tee ${CHECKPOINT_SUFFIX}_part${RECOVER}.log
```

However, I got this error:

```
RuntimeError: Error(s) in loading state_dict for LinearTransformerMultiHeadLM:
    size mismatch for decoder.wEvte.weight: copying a param with shape torch.Size([1125, 512]) from checkpoint, the shape in current model is torch.Size([176, 512]).
    size mismatch for decoder.wTrke.weight: copying a param with shape torch.Size([44, 512]) from checkpoint, the shape in current model is torch.Size([17, 512]).
    size mismatch for decoder.wRpe.weight: copying a param with shape torch.Size([199, 512]) from checkpoint, the shape in current model is torch.Size([71, 512]).
    size mismatch for decoder.wMpe.weight: copying a param with shape torch.Size([5361, 512]) from checkpoint, the shape in current model is torch.Size([393, 512]).
    size mismatch for decoder.proj_evt.weight: copying a param with shape torch.Size([1125, 512]) from checkpoint, the shape in current model is torch.Size([176, 512]).
    size mismatch for decoder.proj_trk.weight: copying a param with shape torch.Size([44, 512]) from checkpoint, the shape in current model is torch.Size([17, 512]).
    size mismatch for decoder.proj_ins.weight: copying a param with shape torch.Size([133, 512]) from checkpoint, the shape in current model is torch.Size([33, 512]).
```

I know this is probably caused by a mismatch between my dictionaries and yours. But how do I use your pretrained dictionary?
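As a sanity check (my own sketch, not something from the repo), I compared the SIZE_* values produced by my data against the sizes the error message says the checkpoint expects, assuming vocab.sh is a plain list of SIZE_*=... assignments as in the training script:

```bash
# My own sanity check: compare the vocabulary sizes my data pipeline produced
# with the sizes the size-mismatch message says the checkpoint expects
# (event = 1125, track = 44, instrument = 133).
while read line; do eval "$line"; done < vocab.sh   # same loading trick as the train script
echo "SIZE_0 (event vocab)      = ${SIZE_0}   # checkpoint expects 1125"
echo "SIZE_1 (duration vocab)   = ${SIZE_1}"
echo "SIZE_2 (track vocab)      = ${SIZE_2}   # checkpoint expects 44"
echo "SIZE_3 (instrument vocab) = ${SIZE_3}   # checkpoint expects 133"
```

The 176 / 17 / 33 sizes in the error message suggest my data was binarized with a much smaller dictionary than the one the checkpoint was trained on, which is why I am asking how to reuse your pretrained dictionary.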

Maybe you could update readme.md with a short guide on how to fine-tune from the pretrained model.

Also, when I process the data, I get this error at the BPE step:

```
Traceback (most recent call last):
  File "src/preprocess/get_bpe_data.py", line 165, in <module>
    subprocess.run(['./music_bpe_exec', 'learnbpe', f'{MERGE_CNT}', output_dir+'ori_voc_cnt.txt'], stdout=stdout, stderr=stderr)
  File "/apdcephfs/share_1213607/yanpeishi/software/Anaconda/lib/python3.8/subprocess.py", line 493, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/apdcephfs/share_1213607/yanpeishi/software/Anaconda/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/apdcephfs/share_1213607/yanpeishi/software/Anaconda/lib/python3.8/subprocess.py", line 1706, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: './music_bpe_exec'
```
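My guess (not confirmed) is that the music_bpe_exec helper either was not built or is missing its execute bit, so subprocess.run cannot launch it. From the project root I would try something like:

```bash
# Check that the BPE helper exists and has the execute ('x') bit set; if not,
# add it so get_bpe_data.py can launch it via subprocess.run.
ls -l ./music_bpe_exec
chmod +x ./music_bpe_exec
```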

Do you have any ideas about the two questions above?

Thank you very much.

sameedhayat commented 2 years ago

Hi, were you able to find a solution? I am getting the same error. Thanks in advance.