zhoubenjia / GFSLT-VLP

Bad results on CSL-Daily dataset #8

Open Zachary-Lau-s opened 10 months ago

Zachary-Lau-s commented 10 months ago

Hi Zhou,

I have read your paper and am very interested in the idea. Therefore, I would like to conduct some experiments on this model. However, when I switched the dataset to CSL-Daily, I did not achieve satisfactory results. I would appreciate some advice from you on this matter.

I generated the CSL-Daily data files based on the Phoenix data files. Then I used trim_model.py to create the pre-trained models for the CSL-Daily dataset, adjusting the src_lang and tgt_lang parameters to "zh_CN".

Could you please provide me with some additional key parameter adjustments and areas I should pay special attention to?

ZechengLi19 commented 10 months ago

Hi, did you change this code while testing, as discussed in https://github.com/zhoubenjia/GFSLT-VLP/issues/7#issuecomment-1803110284?

Although I have not finished my own reproduction, I am looking forward to comparing notes with you. Alternatively, could you share the weights with me?

Zachary-Lau-s commented 10 months ago

Thank you for your response. During testing, I did not modify the code, but I logged the "tgt_pres" and "tgt_refs" outputs, and there is a space between each character (see the attached screenshot).
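
For reference, a minimal character-level BLEU-4 check could look like the sketch below (sacrebleu and its tokenize="zh" option are assumptions here, not necessarily what #7 prescribes):

import sacrebleu

# Assumption: tgt_pres / tgt_refs are logged with a space between each character,
# as described above; strip the spaces and let sacrebleu's Chinese tokenizer
# re-segment the characters before computing corpus-level BLEU-4.
tgt_pres = ["上 海 的 冬 天 很 冷 注 意 保 暖"]
tgt_refs = ["上 海 的 冬 天 很 冷 请 注 意 保 暖"]
hyps = [s.replace(" ", "") for s in tgt_pres]
refs = [s.replace(" ", "") for s in tgt_refs]
bleu = sacrebleu.corpus_bleu(hyps, [refs], tokenize="zh")
print(bleu.score)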

I generated files such as "labels.train" myself; their contents are shown in the attached screenshot.

Subsequently, I used trim_model.py to obtain the pre-trained parameters. In this process, I only changed the tokenizer's "src_lang" and "tgt_lang" parameters to "zh_CN". The remaining training parameters were the same as those used for Phoenix-2014T, including "--batch-size 2", "--epochs 200", "--opt sgd", and "--lr 0.01". Note, however, that I used only one GPU.
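
For reference, the tokenizer-side language switch would look roughly like the sketch below (a minimal sketch assuming the stock facebook/mbart-large-cc25 checkpoint from Hugging Face; the repo's trim_model.py may wrap this differently):

from transformers import MBartTokenizer

# Assumption: the untrimmed mbart-large-cc25 checkpoint; "zh_CN" is the mBART
# language code used for both source and target on CSL-Daily.
tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-cc25",
    src_lang="zh_CN",
    tgt_lang="zh_CN",
)
enc = tokenizer("上海的冬天很冷注意保暖")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))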

I have trained for 200 epochs, and the BLEU-4 score is only 1.98%. I'm looking forward to receiving your suggestions on how to enhance my score.

Kakarot1103 commented 10 months ago

I'm sorry for the late reply. When you construct the data, the text should be a continuous sentence, such as '上海的冬天很冷注意保暖' (see the short sketch after the steps below). When calculating BLEU, please refer to #7. I am too busy to update the code for CSL-Daily at the moment, but I can provide some instructions:

  1. Create the data file and trim the tokenizer and model.
  2. Use the trimmed tokenizer and model for VLP Pretraining.
  3. Perform GFSLT fine-tuning. However, we found that using the trimmed mBART tokenizer at this stage leads to poor performance, so we replaced it with a char-based tokenizer.
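
For step 1, a minimal sketch of turning space-separated character annotations into continuous sentences could look like this (it assumes Phoenix-2014T-style gzip-pickled annotation files with a "text" field, as mentioned above; adjust the keys and paths to your actual files):

import gzip
import pickle

# Assumption: labels.train follows the Phoenix-2014T format, i.e. a gzip-pickled
# list of dicts whose "text" field holds space-separated characters.
with gzip.open("labels.train", "rb") as f:
    samples = pickle.load(f)

for sample in samples:
    # "上 海 的 冬 天 很 冷" -> "上海的冬天很冷"
    sample["text"] = sample["text"].replace(" ", "")

with gzip.open("labels.train.nospace", "wb") as f:
    pickle.dump(samples, f)
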
ZechengLi19 commented 10 months ago

Awesome! Could you tell me how to implement a char-based tokenizer?

Kakarot1103 commented 10 months ago

I originally built it with the torchtext library; you can refer to the build_vocab function in utils.py, but it is not very convenient. You can also try Hugging Face's tokenizers library. Here is a simple example:

from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordLevelTrainer

# Word-level model over whitespace-split tokens; because every sentence is
# pre-split into single characters, this effectively gives a char-level tokenizer.
tokenizer = Tokenizer(WordLevel(unk_token='unk'))  # map out-of-vocabulary characters to 'unk'
tokenizer.pre_tokenizer = Whitespace()
trainer = WordLevelTrainer(special_tokens=['unk'])

# sentences should be the list of all sentences in CSL-Daily
sentences = ['你 好 啊', '吃 饭 了 吗']
tokenizer.train_from_iterator(sentences, trainer)

output = tokenizer.encode("吃 饭 了 吗").ids
print(output)  # e.g. [3, 7, 4, 1] (exact ids depend on the training corpus)
text = tokenizer.decode(output)
print(text)    # 吃 饭 了 吗
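
As a side note, because every sentence is already split into single characters, the WordLevel model over whitespace behaves as a character-level vocabulary here. If you want to reuse the trained tokenizer at the fine-tuning stage, it can be saved and reloaded with the tokenizers API (the file name below is arbitrary):

tokenizer.save("csl_char_tokenizer.json")
tokenizer = Tokenizer.from_file("csl_char_tokenizer.json")
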
JinhuiYE commented 7 months ago

Hi Zhou,

Thanks for your amazing code. I have run some experiments on the Phoenix dataset and obtained results similar to those reported in your paper. However, when I switched to CSL-Daily, I did not achieve satisfactory results: the BLEU-4 score is only 5.96.

Referring to the related issue #7, I used trim_model.py to create the pre-trained models for the CSL-Daily dataset, and the resulting "vocab_size" is 6036.

Here is our config. I would appreciate some advice from you on this matter.

config.json

JinhuiYE commented 7 months ago

@zhoubenjia Hi, Zhou

We followed the discussion above to build mBART for the CSL-Daily dataset, and the GFSLT baseline reached 8.36 B@4 on the test set. However, GFSLT+VLP got a worse B@4 than the baseline, which is quite strange. We have double-checked and tried different dropouts, learning rates, and so on.

We attach our log files for pre-training, the baseline, and GFSLT+VLP.

csl_GFSLT_VLP_lightingPretrained_0301.txt CSL_GFSLT_VLP.txt CSL_GFSLT.txt CSL_VLP.txt

zhoubenjia commented 7 months ago

Hi, have you tried using the mbart native decoder as the text decoder?

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=1236 --use_env train_vlp.py --batch-size 4 --epochs 80 --opt sgd --lr 0.01 --output_dir out/vlp --decoder-type LLMD

JinhuiYE commented 6 months ago

Hi, I have already built the char-level tokenizer and used LLMD for CSL-Daily. Our best B@4 for the GFSLT baseline is 9.58, but when we apply VLP, the results are only around 3 B@4.

  1. We pretrained the word-level trans [w/ or w/o LLMD] with [vlp or vlp_v2]
  2. We pretrained the char-level trans [w/ or w/o LLMD] with [vlp or vlp_v2]

At the fine-tuning stage, we load the parameters with --decoder-type LLMD.

Both settings sharply decreased the B@4 to around 3.*

Here are our log .txt files; can you help me out? Which settings did you use for pre-training and fine-tuning?

LLMD_config_csl_char.json LLMD_config_csl_word.json csl_GFSLT_VLP_lightingPretrained_0301.txt GFSLT_vlpv2WordLlmd_charLLMD.txt

JinhuiYE commented 6 months ago

@Zachary-Lau-s @ZechengLi19

Hi, did you reproduce the results for CSL-Daily? Any suggestions?

ZechengLi19 commented 6 months ago

We have not tried to reproduce it, but when we used the GFSLT code repository, we did find that GFSLT suffers from training instability.