```yaml
save_data: run

data:
    corpus_1:
        path_src: UN.en-zh.zh-filtered.zh.subword.train
        path_tgt: UN.en-zh.en-filtered.en.subword.train
        transforms: [filtertoolong]
    valid:
        path_src: UN.en-zh.zh-filtered.zh.subword.dev
        path_tgt: UN.en-zh.en-filtered.en.subword.dev
        transforms: [filtertoolong]

src_vocab: run/source.vocab
tgt_vocab: run/target.vocab
src_vocab_size: 50000
tgt_vocab_size: 50000
src_seq_length: 150
tgt_seq_length: 150
src_subword_model: source.model
tgt_subword_model: target.model

log_file: train.log
save_model: models/model.zhen
early_stopping: 4
save_checkpoint_steps: 1000
seed: 3435
train_steps: 200000
valid_steps: 8000
warmup_steps: 8000
report_every: 100

# Model configuration
self_attn_type: "scaled-dot"
world_size: 1
gpu_ranks: [0]
bucket_size: 262144
num_workers: 0               # default: 2; set to 0 if you run out of RAM
batch_type: "tokens"
batch_size: 8192             # tokens per batch; reduce if CUDA runs out of memory
valid_batch_size: 4096
max_generator_batches: 2
accum_count: [4]
accum_steps: [0]
model_dtype: "fp16"
optim: "adam"
learning_rate: 0.5
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0.1           # adjust as needed
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"
encoder_type: transformer
decoder_type: transformer
position_encoding: true
enc_layers: 6
dec_layers: 6
heads: 8
hidden_size: 512             # adjust as needed
word_vec_size: 512
transformer_ff: 4096         # adjust as needed
dropout_steps: [0]
dropout: [0.1]
attention_dropout: [0.1]
```
This is my configuration file. Could you roughly check whether anything in it could cause errors? I can train the model with this file, but I cannot identify any issue in it myself.
And this is the command I run:

```bash
ct2-opennmt-py-converter --model_path models/model.zhen_step_1000.pt --output_dir enzh_ctranslate2 --quantization int8
```
Although my dataset is small, I already obtain a BLEU score of 77, so I assume the earlier steps are fine; I am just stuck at this step.
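For completeness, once the conversion succeeds, the output directory can be loaded directly with the CTranslate2 Python API. Below is a minimal sketch of that step; it assumes `source.model` and `target.model` are the SentencePiece models named in the config, and that the source side is Chinese, as the corpus paths suggest:

```python
import ctranslate2
import sentencepiece as spm

# Load the converted, int8-quantized model directory.
translator = ctranslate2.Translator("enzh_ctranslate2", device="cpu")

# Tokenize/detokenize with the same SentencePiece models used in training.
sp_src = spm.SentencePieceProcessor(model_file="source.model")
sp_tgt = spm.SentencePieceProcessor(model_file="target.model")

tokens = sp_src.encode("联合国大会今天开幕。", out_type=str)
results = translator.translate_batch([tokens])

# Print the detokenized best hypothesis.
print(sp_tgt.decode(results[0].hypotheses[0]))
```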
This seems to be related to a bug in CTranslate2 (see here). Kindly update CTranslate2 to the latest version, or apply the workaround suggested in the link. If this does not solve the error, feel free to open an issue in the CTranslate2 repository.
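If updating is not immediately possible, a minimal sketch of the kind of workaround involved is rewriting the attention type recorded in the checkpoint before running the converter. This assumes the OpenNMT-py checkpoint stores its training options under the `opt` key, as such checkpoints normally do; the patched filename is arbitrary:

```python
import torch

# Load the full OpenNMT-py checkpoint (weights, vocab, and saved options).
ckpt = torch.load("models/model.zhen_step_1000.pt", map_location="cpu")

# Force the stored attention type back to the value older converters accept.
ckpt["opt"].self_attn_type = "scaled-dot"

torch.save(ckpt, "models/model.zhen_step_1000_patched.pt")
```

Then run `ct2-opennmt-py-converter` on the patched checkpoint instead of the original.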
I have trained a model for 1,000 steps, but when I tried to convert it to CTranslate2, I entered the command and this error appeared. I checked my configuration file and found that `self_attn_type` was set to "scaled-dot", yet the model I got was of the "scaled-dot-flash" type. So what caused this error? If you need my configuration file, I will provide it immediately; overall, it is only slightly modified from yours. Could an error in the configuration have caused the command to fail?
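A likely explanation for the mismatch: the converter does not read the YAML at all; it reads the options that were frozen into the checkpoint when training started, and recent OpenNMT-py versions may record "scaled-dot-flash" there when flash attention is in use. A quick way to check what the converter actually sees (again assuming the standard `opt` key):

```python
import torch

# Inspect the options saved inside the checkpoint, not the YAML.
opt = torch.load("models/model.zhen_step_1000.pt", map_location="cpu")["opt"]
print(opt.self_attn_type)  # may print "scaled-dot-flash" even though the YAML says "scaled-dot"
```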