ymoslem / OpenNMT-Tutorial

Neural Machine Translation (NMT) tutorial. Data preprocessing, model training, evaluation, and deployment.
MIT License

ValueError: The model you are trying to convert is not supported by CTranslate2. We identified the following reasons: - Option --self_attn_type scaled-dot-flash is not supported (supported values are: scaled-dot) #14

Closed Rocable closed 6 months ago

Rocable commented 6 months ago

I trained a model for 1,000 steps, but when I tried to convert it to CTranslate2, I ran the command and this error appeared. I checked my configuration file and `self_attn_type` is set to `scaled-dot`, yet the model I got is of the `scaled-dot-flash` type. What caused this error? If you need my configuration file, I will provide it immediately; overall, it is only slightly modified from yours. Is it possible that a code error caused the command to fail?
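One way to see what the converter is actually reading (a sketch of mine, not something from this thread): OpenNMT-py saves the resolved training options inside the checkpoint under the `"opt"` key, so the stored value can differ from what `config.yaml` says. The helper and checkpoint path below are hypothetical.

```python
# Sketch: inspect the options stored in an OpenNMT-py checkpoint.
# An OpenNMT-py checkpoint is a torch-saved dict whose "opt" entry is an
# argparse.Namespace holding the resolved training options.
import argparse


def saved_self_attn_type(checkpoint: dict) -> str:
    """Return the self_attn_type value the CTranslate2 converter will see."""
    return vars(checkpoint["opt"]).get("self_attn_type", "scaled-dot")


# Simulated checkpoint; with a real model you would instead do:
#   import torch
#   checkpoint = torch.load("models/model.zhen_step_1000.pt", map_location="cpu")
checkpoint = {"opt": argparse.Namespace(self_attn_type="scaled-dot-flash")}
print(saved_self_attn_type(checkpoint))  # → scaled-dot-flash
```

If the printed value is `scaled-dot-flash`, the checkpoint itself carries that option regardless of what the YAML file says now.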

Rocable commented 6 months ago

config.yaml:

```yaml
# Where the samples will be written
save_data: run

# Training files
data:
  corpus_1:
    path_src: UN.en-zh.zh-filtered.zh.subword.train
    path_tgt: UN.en-zh.en-filtered.en.subword.train
    transforms: [filtertoolong]
  valid:
    path_src: UN.en-zh.zh-filtered.zh.subword.dev
    path_tgt: UN.en-zh.en-filtered.en.subword.dev
    transforms: [filtertoolong]

# Vocabulary files, generated by onmt_build_vocab
src_vocab: run/source.vocab
tgt_vocab: run/target.vocab

# Vocabulary size - should be the same as in SentencePiece
src_vocab_size: 50000
tgt_vocab_size: 50000

# Filter out source/target longer than n if [filtertoolong] enabled
src_seq_length: 150
tgt_seq_length: 150

# Tokenization options
src_subword_model: source.model
tgt_subword_model: target.model

# Where to save the log file and the output models/checkpoints
log_file: train.log
save_model: models/model.zhen

# Stop training if it does not improve after n validations
early_stopping: 4

# Default: 5000 - Save a model checkpoint for each n
save_checkpoint_steps: 1000

# To save space, limit checkpoints to last n
keep_checkpoint: 3

seed: 3435

# Default: 100000 - Train the model to max n steps
# Increase to 200000 or more for large datasets
# For fine-tuning, add up the required steps to the original steps
train_steps: 200000

# Default: 10000 - Run validation after n steps
valid_steps: 8000

# Default: 4000 - for large datasets, try up to 8000
warmup_steps: 8000
report_every: 100

# Model configuration
model_config:
  self_attn_type: "scaled-dot"

# Number of GPUs, and IDs of GPUs
world_size: 1
gpu_ranks: [0]

# Batching
bucket_size: 262144
num_workers: 0  # Default: 2, set to 0 when RAM out of memory
batch_type: "tokens"
batch_size: 8192  # Tokens per batch, change when CUDA out of memory
valid_batch_size: 4096
max_generator_batches: 2
accum_count: [4]
accum_steps: [0]

# Optimization
model_dtype: "fp16"
optim: "adam"
learning_rate: 0.5
warmup_steps: 8000
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0.1  # Adjust as needed
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

# Model
encoder_type: transformer
decoder_type: transformer
position_encoding: true
enc_layers: 6
dec_layers: 6
heads: 8
hidden_size: 512  # Adjust as needed
word_vec_size: 512
transformer_ff: 4096  # Adjust as needed
dropout_steps: [0]
dropout: [0.1]
attention_dropout: [0.1]
```
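A side note on the config itself (an assumption on my part, not something confirmed in this thread): OpenNMT-py reads its YAML options as flat top-level keys, so a nested `model_config:` section may be silently ignored, which would leave `self_attn_type` at its default. If so, the flat form would be:

```yaml
# Assumption: self_attn_type placed as a flat top-level key,
# like the other options in this file
self_attn_type: "scaled-dot"
```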

This is my configuration file. Could you roughly check whether anything in it could cause the error? I can train the model with this file, but I cannot identify any issue in it.

And this is my run command:

```
ct2-opennmt-py-converter --model_path models/model.zhen_step_1000.pt --output_dir enzh_ctranslate2 --quantization int8
```

Although my dataset is small, I can already obtain a BLEU score of 77, so I estimate that the previous steps are fine; I am just stuck at this step.

ymoslem commented 6 months ago

This seems to be related to a bug in CTranslate2; see here. Kindly update CTranslate2 to the latest version, or apply the workaround suggested in the link. If this does not solve the error, feel free to open an issue in the CTranslate2 repository.
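One way such a workaround is typically applied (a sketch, not the exact fix from the link; file names are examples): relabel the stored option in the checkpoint before converting. This only changes the label, since flash attention is a fused implementation of the same scaled dot-product attention, so the weights are unchanged.

```python
# Sketch: relabel self_attn_type in an OpenNMT-py checkpoint so the
# CTranslate2 converter accepts it. The checkpoint is a torch-saved dict
# whose "opt" entry is an argparse.Namespace of training options.
import argparse


def patch_self_attn(checkpoint: dict) -> dict:
    """Rewrite 'scaled-dot-flash' to 'scaled-dot' in the stored options."""
    opt = checkpoint["opt"]
    if getattr(opt, "self_attn_type", None) == "scaled-dot-flash":
        opt.self_attn_type = "scaled-dot"
    return checkpoint


# Simulated checkpoint; with a real model you would instead do:
#   import torch
#   ckpt = torch.load("models/model.zhen_step_1000.pt", map_location="cpu")
#   torch.save(patch_self_attn(ckpt), "models/model.zhen_step_1000.pt")
ckpt = {"opt": argparse.Namespace(self_attn_type="scaled-dot-flash")}
print(patch_self_attn(ckpt)["opt"].self_attn_type)  # → scaled-dot
```

After saving the patched checkpoint, the same `ct2-opennmt-py-converter` command should no longer hit the `--self_attn_type` check.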