solaoi commented 5 months ago

Thank you for your great work! I would like to use this with the MarianMT Model as described here: https://opennmt.net/CTranslate2/guides/transformers.html#marianmt

I would appreciate your support on this.

jkawamoto commented 5 months ago

MarianMT models work with the current implementation when paired with the sentencepiece tokenizer. Below is some sample code:

use sentencepiece::SentencePieceProcessor;

use ct2rs::config::Config;
use ct2rs::translator::Translator;

let t = Translator::new("./data/opus-mt-en-jap", Config::default())?;
let encoder = SentencePieceProcessor::open("./data/opus-mt-en-jap/source.spm")?;
let decoder = SentencePieceProcessor::open("./data/opus-mt-en-jap/target.spm")?;

let source: Vec<String> = encoder.encode(
    "Hello world! This library provides Rust bindings for CTranslate2.",
)?.iter().map(|v| v.piece.to_string()).collect();

let res = t.translate_batch(&*vec![source], &*vec![vec![""]], &Default::default())?;
for r in res {
    if let Some(h) = r.hypotheses.get(0) {
        println!("{:?}", decoder.decode_pieces(h)?);
    }
}

Note that the model can be converted using the following command:

ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-jap --output_dir data/opus-mt-en-jap

Additionally, ensure that source.spm and target.spm are copied from the repository to the directory data/opus-mt-en-jap.
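Before loading the model, it can help to confirm that the directory really contains everything the sample expects. The following is a minimal sketch using only the standard library; the helper name `missing_files` and the `MARIAN_FILES` list are illustrative assumptions, not part of ct2rs.

```rust
use std::path::Path;

/// Files the sample above relies on (an assumed list for illustration):
/// the converted model plus the two SentencePiece models copied by hand.
const MARIAN_FILES: [&str; 3] = ["model.bin", "source.spm", "target.spm"];

/// Returns the names in `required` that do not exist as files in `model_dir`.
fn missing_files(model_dir: &Path, required: &[&str]) -> Vec<String> {
    required
        .iter()
        .copied()
        // Keep only the names whose path is not an existing regular file.
        .filter(|name| !model_dir.join(name).is_file())
        .map(str::to_string)
        .collect()
}
```

Calling `missing_files(Path::new("./data/opus-mt-en-jap"), &MARIAN_FILES)` and checking that the result is empty gives an early, readable error instead of a failure inside the tokenizer or translator.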

I am considering writing up detailed instructions; in the meantime, please try the code above.

solaoi commented 5 months ago

Thank you, it seems to work with the script you provided.

However, compared to when I run it in Python, it feels like the translation accuracy has degraded for all models based on MarianMT.

Do I need to set parameters like beam_size in TranslationOptions?

jkawamoto commented 5 months ago

I tried using the MarianMT model in Python, but the results were not good either. As you mentioned, the default translation options may not be optimal for this model.

Below is the code I used (taken from CTranslate2's docs):

import ctranslate2
import transformers

translator = ctranslate2.Translator("data/opus-mt-en-jap")
tokenizer = transformers.AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-jap")

source = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello world! This library provides Rust bindings for CTranslate2."))

results = translator.translate_batch([source])
target = results[0].hypotheses[0]

print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))

The output I obtained was:

世界 は 上 を は き た. この 築 い た 荷 は, クミン を 描 く.

Could you share the specific parameters or settings you are using for the MarianMT models?

solaoi commented 5 months ago

I compared the following three MarianMT-based models by varying the beam_size and repetition_penalty, using the provided Python code and ctranslate2-rs:

Helsinki-NLP/opus-mt-en-jap
Hoax0930/marian-finetuned-kde4-en-to-ja_kftt
staka/fugumt-en-ja
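The settings compared for each model and input form a small grid: the defaults, then beam sizes 5, 10 and 15, each with and without a repetition penalty of 1.2. A minimal sketch of that grid follows; `RunConfig` and `comparison_grid` are hypothetical names used only for illustration, and `None` stands for "leave the library default untouched".

```rust
/// One decoding configuration; `None` means the library default is kept.
#[derive(Debug, Clone, PartialEq)]
struct RunConfig {
    beam_size: Option<usize>,
    repetition_penalty: Option<f32>,
}

impl RunConfig {
    /// Label in the same style as the result listings in this comment.
    fn label(&self) -> String {
        match (self.beam_size, self.repetition_penalty) {
            (None, None) => "default".to_string(),
            (Some(b), None) => format!("beam size: {}", b),
            (Some(b), Some(p)) => format!("beam size: {}, repetition_penalty={}", b, p),
            (None, Some(p)) => format!("repetition_penalty={}", p),
        }
    }
}

/// Builds the seven configurations: defaults, then each beam size
/// without and with the repetition penalty.
fn comparison_grid() -> Vec<RunConfig> {
    let mut grid = vec![RunConfig { beam_size: None, repetition_penalty: None }];
    for beam in [5, 10, 15] {
        grid.push(RunConfig { beam_size: Some(beam), repetition_penalty: None });
        grid.push(RunConfig { beam_size: Some(beam), repetition_penalty: Some(1.2) });
    }
    grid
}
```

Each entry would then be mapped onto the real option type of whichever binding is under test before calling `translate_batch`.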

Helsinki-NLP/opus-mt-en-jap

Python

"Hello world! This library provides Rust bindings for CTranslate2."

default 世界は上をはきた. この築いた荷は, クミンを描く.
beam size: 5 世には陰府をもつ. この築いた荷は, クミンに対して用意をされている.
beam size: 5, repetition_penalty=1.2 世界には希望を下す. この築き建てる荷は, クミンに属するものである.
beam size: 10 世には長いものである. この築いた所は, クプロを描き, 帆を造るものである.
beam size: 10, repetition_penalty=1.2 世界には希望を下す. この築き建てる荷は, クミンに属するものである.
beam size: 15 世界をおおうものである. この築いた荷は, クミンにとって, 帆をつくるものである.
beam size: 15, repetition_penalty=1.2 世界をおおうものである. この築いた荷は, クミンに対してくいものである.

"Hitori Gotoh, also known as Bocchi-chan, is one of the main characters in the manga and anime series, Bocchi the Rock!. She is in the first year of Shuka High School and is in charge of the guitar and lyrics of the band, Kessoku Band."

default その勢夫マケナテは去ってザアといい, クプラびとボアカデのひとりである. 彼は ▁shal を避けている者であり, 威厳がある. あなたはいと高き者の最初の年に, 警戒の年にあって, 受くことができる.
beam size: 5 その勢夫マケナのように行き, ボヅパという人の姿が知られている. 彼はスカルの人々であり, 岩を打つ. これはいと高き者の最初の年であり, 警戒という者の証言とである.
beam size: 5, repetition_penalty=1.2 その頭を打つことは細かく, クプラの女のひとりであって, 人に属する男やぎ, 岩だぬきを知る者がある. これはいと高き者の最初の年であり, 警戒の初めであって, その堅く立つ.
beam size: 10 その足の響きを知るように, 寄って行きなさい. しゃこには姿があり, 威厳があり, 岩の頂は岩である. これはいと高き者の最初の年であり, 警戒の初めであって, その堅く立つことができる.
beam size: 10, repetition_penalty=1.2 その足の響きを知るように, 寄って行きなさい. しゃこには姿があり, 威厳があり, 岩の人々である. これはいと高き者の最初の年であり, 警戒の初めであって, その堅く立つことができる.
beam size: 15 その頂を過ぎることは, ボヅパテにも知られる. 人に属する者, 寄るべなき者のひとりである. しえたげる者, 岩だからである. これはいと高き者の最初の年であり, 警戒という者の最初の年である.
beam size: 15, repetition_penalty=1.2 その足の響きを知るように, 寄って行きなさい. しゃこには姿があり, 威厳があり, 岩の人々である. これはいと高き者の最初の年であり, 警戒の初めであって, その堅く立つことができる.

ctranslate2-rs

"Hello world! This library provides Rust bindings for CTranslate2."

default ⁇ . この築きは,ものマンナをくもの,クミンにくいくものであって,それにアスファルトをつくり,
beam size: 5 ⁇ . この世界はめいくもの,クミンにくいもの,クミンにくいもの備えて置くものがそれである.
beam size: 5, repetition_penalty=1.2 ⁇ . この築きはめるべくもの,クミンにくいものがあり, 道具で備えられ,
beam size: 10 ⁇ . この築きはめるべくもの,クミンにくいれるもの,クミンにくいものがそれであるが備えられる.
beam size: 10, repetition_penalty=1.2 ⁇ . この築きはめるべくもの,クミンがそこに置くものであって,それにアスファルトんだものである.
beam size: 15 ⁇ . この築きはめるべくもの,クミンにくいれるもの,クミンにくいものがそれであるが備えられる.
beam size: 15, repetition_penalty=1.2 ⁇ . この築きはめるべくもの,クミンにくい入りがあり, 道具で備えられ,

"Hitori Gotoh, also known as Bocchi-chan, is one of the main characters in the manga and anime series, Bocchi the Rock!. She is in the first year of Shuka High School and is in charge of the guitar and lyrics of the band, Kessoku Band."

default ⁇ ナのように行き , ボヅパテには姿があり , 人には姿がある . 岩 , ボケル・アビムの年には , 威勢深い者がある . これはいと高き者の最初の年であり , 警戒の初めであって , その堅く立つ .
beam size: 5 ⁇ ナのように行き , ボヅパテには姿があり , 人には姿がある . 岩には岩があり , 威厳がある . これはいと高き者の最初の年であり , つりとげの初めである .
beam size: 5, repetition_penalty=1.2 ⁇ ナマイのように , そこに行き , ボヅパという人のうちのベネケルは知られている . 彼は ▁shal を打つ者 , 岩を失う者である . これはいと高き者の最初の年であり , 警戒の初めであって , その堅く立つ .
beam size: 10 ⁇ エルのように行って威厳があり , ボヅパの男 , ボヅパの男で , 人の姿がある . あなたはこの岩を静める . これはいと高き者の最初の年であり , つりとげのある .
beam size: 10, repetition_penalty=1.2 ⁇ ナマイのように , そこに行き , ボヅパという人のうちのベネケルは知られている . 彼は ▁shal を打つ者 , 岩を失う者である . これはいと高き者の最初の年であり , 警戒の初めであって , その堅く立つ .
beam size: 15 ⁇ エルのように行って驚き , さとげのように人 , ボヅパの人である . あなたは言う , " メケル・アビム , スケル " という . この事はいと高き者の最初の年であり , つやさとげの年である .
beam size: 15, repetition_penalty=1.2 ⁇ ナマイのように , そこに行き , ボヅパという人のうちのベネケルは知られている . 彼は ▁shal を打つ者 , 岩を失う者である . これはいと高き者の最初の年であり , 警戒の初めであって , その堅く立つ .

Hoax0930/marian-finetuned-kde4-en-to-ja_kftt

Python

"Hello world! This library provides Rust bindings for CTranslate2."

default Hello World! このライブラリは CTranstelt2 のラプターバインドを提供します。
beam size: 5 Hello World! このライブラリは CTranslate2にラプターバインドを提供します。
beam size: 5, repetition_penalty=1.2 Hello World! このライブラリは CTranslate2にラプターバインドを提供します。
beam size: 10 Hello World! このライブラリは CTranslate2にラプターバインドを提供します。
beam size: 10, repetition_penalty=1.2 Hello World! このライブラリは CTranslate2にラプターバインドを提供します。
beam size: 15 Hello World! このライブラリは CTranslate2にラプターバインドを提供します。
beam size: 15, repetition_penalty=1.2 Hello World! このライブラリは CTranslate2にラプターバインドを提供します。

"Hitori Gotoh, also known as Bocchi-chan, is one of the main characters in the manga and anime series, Bocchi the Rock!. She is in the first year of Shuka High School and is in charge of the guitar and lyrics of the band, Kessoku Band."

default azerbaijan@kde.gr.jp,support@pluto.dti.ne.jp,support@pluto.dti.ne.jp,sgtom@pluto.dti.ne.jp,support@pluto.dti.ne.jp,sgtom@pluto.dti.ne.jp,sgtom@pluto.dion.ne.jp
beam size: 5 azerbaijan@kde.gr.jp,support@pluto.dti.ne.jp,support@pluto.dti.ne.jp,ybando@k6.dion.ne.jp,ybando@k6.dion.ne.jp
beam size: 5, repetition_penalty=1.2 azerbaijan@kde.gr.jp,shinobo@leo.bekkoame.ne.jp,tsuno@ngy.1st.ne.jp
beam size: 10 azerbaijan@kde.gr.jp,ybando@k6.dion.ne.jp,ybando@k6.dion.ne.jp,ybando@k6.dion.ne.jpEMAIL OF TRANSLATORS
beam size: 10, repetition_penalty=1.2 azerbaijan@kde.gr.jp,shinobo@leo.bekkoame.ne.jp,tsuno@ngy.1st.ne.jp
beam size: 15 City name (optional, probably does not need a translation). http://europa.eu.int/europa.int/eur-lex/lex/LexUriServ/LexUriServ.do?uri=CELEX:32001L0059:EN:HTML
beam size: 15, repetition_penalty=1.2 azerbaijan@kde.gr.jp,shinobo@leo.bekkoame.ne.jp,tsuno@ngy.1st.ne.jp

ctranslate2-rs

"Hello world! This library provides Rust bindings for CTranslate2."

default ⁇ Hello Woro! このライブラリは CTCrrassrate2.CTCTranslate2.CTClasslat のスルバフメントを提供します。
beam size: 5 ⁇ Hello Woro! このライブラリは CTCrrassrate2.CTCTransslate2.CTClasslatra2.sslバインドを提供します。
beam size: 5, repetition_penalty=1.2 ⁇ Hello Work! このライブラリは CTCrransslate2.CTclassrat2.CTRAranslad のスラップバインドを提供します。
beam size: 10 ⁇ Hello Woro! このライブラリは CTCrrassrate2.CTCTranslate2.CTClasslat2.sslバインドを提供します。
beam size: 10, repetition_penalty=1.2 ⁇ Hello Work! このライブラリは CTCrasslatrad 2.CTcranslate2.
beam size: 15 ⁇ Hello Woro! このライブラリは CTCrrassrate2.CTCTranslate2.CTClasslat のスラップバインドを提供します。
beam size: 15, repetition_penalty=1.2 ⁇ Hello Work! このライブラリは CTCrasslatrad 2.CTcranslate2.

"Hitori Gotoh, also known as Bocchi-chan, is one of the main characters in the manga and anime series, Bocchi the Rock!. She is in the first year of Shuka High School and is in charge of the guitar and lyrics of the band, Kessoku Band."

defalut ⁇ 尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚一一の始に在りししに属して大のギターのギターとの音楽を担当との音楽を担当とい.
beam size: 5 ⁇ 尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚尚一一年に在りて,Kesokuroboard.Kosoroboard.Kosoroboard
beam size: 5, repetition_penalty=1.2 ⁇ 南ゴホハは、またボクチチャンとして知られていますが、マンガやアニメシリーズの中の主役の一人です。Kuiki Kochhiro Okul!
beam size: 10 ⁇ 南ゴホハは、またボクチチャンとして知られていますが、マンガやアニメシリーズ、ボクチ・ロックの主要キャラの1人です。Schka高等大学1年目になり、バンドのギターと歌詞を担当しています。KesokuRand.KesokuRand.KosoRand.
beam size: 10, repetition_penalty=1.2 ⁇ 南ゴホハは、またボクチチャンとして知られていますが、マンガやアニメシリーズの中の主役の一人です。Kuoki Kusoroboard.
beam size: 15 ⁇ 南ゴホハは、またボクチチャンとして知られていますが、マンガやアニメシリーズ、ボクチ・ロックの主要キャラの1つで、Kochichi Kochi Kochi Kochi Hochi-Goh はシュカ高等学1年目になり、バンドのギターと歌詞を担当しています。KesokuRand.KesokuRand.
beam size: 15, repetition_penalty=1.2 ⁇ 南ゴホハは、またボクチチャンとして知られていますが、マンガやアニメシリーズの中の主役の一人です。Kuoki Kusoroboard. http://echika.kdeia.jp/artichi.ne.jpEMAIL OF TRANSLATORS

staka/fugumt-en-ja

Python

"Hello world! This library provides Rust bindings for CTranslate2."

default このライブラリはCTranslate2にRustバインディングを提供します。
beam size: 5 このライブラリはCTranslate2にRustバインディングを提供します。
beam size: 5, repetition_penalty=1.2 このライブラリはCTranslate2にRustバインディングを提供します。
beam size: 10 このライブラリはCTranslate2にRustバインディングを提供します。
beam size: 10, repetition_penalty=1.2 このライブラリはCTranslate2にRustバインディングを提供します。
beam size: 15 このライブラリはCTranslate2にRustバインディングを提供します。
beam size: 15, repetition_penalty=1.2 このライブラリはCTranslate2にRustバインディングを提供します。

"Hitori Gotoh, also known as Bocchi-chan, is one of the main characters in the manga and anime series, Bocchi the Rock!. She is in the first year of Shuka High School and is in charge of the guitar and lyrics of the band, Kessoku Band."

default 漫画・アニメシリーズ「ボッキ・ザ・ロック!」の主人公のひとりで、修歌高校1年生で、バンド「けっそくバンド」のギターと歌詞を担当。
beam size: 5 漫画・アニメシリーズ「Bocchi the Rock!」の主人公のひとりで、修歌高校1年生の時、バンド「けっそくバンド」のギターと歌詞を担当。
beam size: 5, repetition_penalty=1.2 漫画・アニメシリーズ「ボッキザロック!」の主人公の一人である「ボッチちゃん」は、修歌高校1年生でギターと歌詞を担当している。
beam size: 10 漫画・アニメシリーズ「Bocchi the Rock!」の主人公のひとりで、修歌高校1年生の時、バンド「けっそくバンド」のギターと歌詞を担当。
beam size: 10, repetition_penalty=1.2 漫画・アニメシリーズ『ボッキザロック!』の主人公である「ボッチちゃん」は、修歌高校1年生でギターと歌詞を担当している。
beam size: 15 漫画・アニメシリーズ「Bocchi the Rock!」の主人公のひとりで、修歌高校1年生の時、バンド「けっそくバンド」のギターと歌詞を担当。
beam size: 15, repetition_penalty=1.2 漫画・アニメシリーズ「ボッキザロック!」の主人公の一人である「ボッチちゃん」は、修歌高校1年生でギターと歌詞を担当している。

ctranslate2-rs

"Hello world! This library provides Rust bindings for CTranslate2."

default ⁇ 、こんにちは、世界!こんにちは、こんにちは、世界!このライブラリは、このライブラリがCTranslate2.....にCTranslate2.....にCTranslate2.....にCTranslate2....にRustバインディングを提供するCTranslate2..にRustバインディングを提供するRust bindings for CTranslate2 for CTranslate2.にRust Rustを提供するライブラリです。このライブラリは、CTranslate2.
beam size: 5 ⁇ 、こんにちは、世界!こんにちは、こんにちは、世界!このライブラリは、このライブラリは、CTranslate2...にCTranslate2...にCTranslate2.....にCTranslate2...にCTranslate2..にRustバインディングのRust バインディングを提供している。このライブラリは、このライブラリが提供する。このライブラリは、CTranslate2にRustバインディングを提供する。
beam size: 5, repetition_penalty=1.2 ⁇ こんにちは、世界!このライブラリが提供するRustバインディング for CTranslate 2.2.
beam size: 10 ⁇ このライブラリライブラリはCTranslate2...にCTranslate2....にCTranslate2...にRustバインディングを提供する。このライブラリはCTranslate2..にRustバインディングを提供する。このライブラリはCTranslate2..にRustバインディングを提供する。
beam size: 10, repetition_penalty=1.2 ⁇ このライブラリは、CTranslate 2.2.
beam size: 15 ⁇ このライブラリライブラリはCTranslate2...にCTranslate2....にCTranslate2...にRustバインディングを提供する。このライブラリはCTranslate2..にRustバインディングを提供する。このライブラリはCTranslate2..にRustバインディングを提供する。
beam size: 15, repetition_penalty=1.2 ⁇ このライブラリは、CTranslate 2.2.

"Hitori Gotoh, also known as Bocchi-chan, is one of the main characters in the manga and anime series, Bocchi the Rock!. She is in the first year of Shuka High School and is in charge of the guitar and lyrics of the band, Kessoku Band."

default ⁇ ちちゃんとしても知られるボッチちゃんとしても知られるボッチチちゃんとしても知られるボッチちゃんのキャラクターの1人で、漫画・アニメシリーズの主人公の1人である漫画・アニメシリーズ「ボッチ・ザ・ロック!ボッチ・ザ・ロック!ボッチ・ザ・ロック!」の主人公の1人である。彼女は修化高校1年生の修化高校1年生で、シューカ高校の1年生で、バンド「けそくバンド」のギターと歌詞とバンドのギターを担当している。
beam size: 5 ⁇ ちちゃんとしても知られるボッチチちゃんとしても知られるボッチチちゃんとしても知られるボッチチ・ヒトリ・ヒトリ・ヒトリ・ヒトリ・ヒトリ・ゴトー(ボッチチ・ヒトリ・ヒトリ・ヒトリ・ゴトー)は、漫画・アニメシリーズ「ボッチ・ザ・ロック!ボッチ・ボッチ・ザ・ロック!」の主人公の1人であり、漫画・アニメシリーズ「ボッチ・ザ・ロック!ボッキ・ザ・ボッチ・ザ・ロック!」の主人公の主人公の1人である。
beam size: 5, repetition_penalty=1.2 ⁇ 知ちゃんとしても知られるボッチチチャン(ボッキちちゃ)として知られる「ボッキー・ヒトリひとり日鳥ごとう」は、漫画やアニメシリーズ『Bocchi the Rock!』の主人公の一人でマンガンおよびアニメ連作アニメーション系の主要キャラクターの一つである。彼女は修加高校初年度に就き、シュカ高学園1年生でありバンドケソクバントのギター&歌詞担当、キーソック帯けそく Band.のギタリング/歌詞を担当している他、シューッカハイスクールのギター兼首高校一年に入っていて、バンドK雪石石石楽帯域をバンドする血 ⁇ 組気 ⁇ グループであるバンド、「けそくうか束」「けっ独バンド」(けそくば...
beam size: 10 ⁇ 知ちゃんとしても知られるボッチちゃんとしても知られるボッチチちゃんとしても知られるボッチチ・ヒトリ・ヒトリ・ヒトリ・ヒトリ・ヒトリ・ゴトー(ボッチチ・ヒトリ・ヒトリ・ヒトリ・ゴトー)は、漫画・アニメシリーズ『ボッチ・ザ・ロック!ボッチ・ボッチ・ザ・ロック!ボッチ・ザ・ロック!』の主人公の1人である漫画・アニメシリーズ「ボッチ・ザ・ロック!ボッチ・ボッチ・ザ・ロック!ボッチ・ザ・ロック!」の主人公で、漫画・アニメシリーズ「ボッチ・ザロック!ボッチ・ボッチ・ザ・ロック!
beam size: 10, repetition_penalty=1.2 ⁇ 知ちゃんとしても知られるボッチチチャン(ボッキちちゃ)として知られる「ボッキー・ヒトリひとり日鳥ごとう」は、漫画やアニメシリーズの主人公である『Bocchi the Rock!』の主要キャラクター1人です。彼女はシュカ高校初年で修化高学年の1年生でありバンドケソクバントのギター&歌詞を担当している。
beam size: 15 ⁇ 知ちゃんとしても知られるボッチちゃんとしても知られるボッチチちゃんとしても知られるボッチチ・ヒトリ・ヒトリ・ヒトリ・ヒトリ・ゴトー(ボッチチ・ヒトリ・ヒトリ・ヒトリ・ゴトー)は、漫画・アニメシリーズ「ボッチ・ザ・ロック!ボッチ・ザ・ロック!ボッチ・ザ・ロック!」の主人公の1人である漫画・アニメシリーズ「ボッチ・ザ・ザ・ロック!ボッチ・ザ・ボッチ・ザ・ロック!」の主人公の主人公の1人である漫画・アニメシリーズの主人公で、漫画・アニメシリーズ「ボッチ・ザ・ロック!ボッチ・ボッチ・ザ・ロック!
beam size: 15, repetition_penalty=1.2 ⁇ 知ちゃんとしても知られるボッチ・ヒトリひとり日鳥ゴトー(ボッチチャン)の1人で、漫画やアニメシリーズ「Bocchi the Rock!BOCCHI THEロック!」の主要キャラクターの一人である。シュカ高校初年度に就き、酒華高等学校の1年生でありバンドケソクバントのギター&歌詞を担当しているKecsoku Band.をギターおよび作詞担当する。

jkawamoto commented 5 months ago

Thank you for sharing the data. I also tested the models you mentioned and discovered that the sentencepiece tokenizer does not append </s> to the end of the token list. When I manually add </s>, the output no longer contains repetitive text. Additionally, I have implemented a modification that omits the target prefix if the model does not require it (#40). With these updates, the outputs are more similar to those produced by Python.

Here is my newest code, btw:

use sentencepiece::SentencePieceProcessor;

use ct2rs::config::Config;
use ct2rs::translator::Translator;

let t = Translator::new("./data/opus-mt-en-jap", Config::default())?;
let encoder = SentencePieceProcessor::open("./data/opus-mt-en-jap/source.spm")?;
let decoder = SentencePieceProcessor::open("./data/opus-mt-en-jap/target.spm")?;

let mut source: Vec<String> = encoder.encode(
    "Hello world! This library provides Rust bindings for CTranslate2.",
)?.iter().map(|v| v.piece.to_string()).collect();
source.push("</s>".to_string());

let res = t.translate_batch(vec![source], &Default::default())?;
for r in res {
    if let Some(h) = r.hypotheses.get(0) {
        println!("{:?}", decoder.decode_pieces(h)?);
    }
}
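The `</s>` handling in the code above can be isolated into a small helper that also avoids appending the marker twice if a tokenizer already emits it. This is a sketch under that assumption; `with_eos` is a hypothetical name, not a ct2rs function.

```rust
/// End-of-sentence piece that has to close the source token list.
const EOS: &str = "</s>";

/// Returns `tokens` with `</s>` appended, unless the list already ends with it.
fn with_eos(mut tokens: Vec<String>) -> Vec<String> {
    // Compare the last piece (if any) against the marker before pushing.
    if tokens.last().map(String::as_str) != Some(EOS) {
        tokens.push(EOS.to_string());
    }
    tokens
}
```

With this helper, the tokenized pieces would be wrapped as `with_eos(pieces)` before being passed to the translator, and calling it a second time leaves the list unchanged.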

solaoi commented 5 months ago

Thank you! Everything worked perfectly with the main branch. The code you provided had just one typo (see the diff at the end of this comment), but everything else was perfect!

I really appreciate your prompt response. I'm looking forward to future updates!

- let res = t.translate_batch(vec![source], &Default::default())?;
+ let res = t.translate_batch(&vec![source], &Default::default())?;
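The one-character fix in the diff above makes sense if the batch parameter is a borrowed slice rather than an owned vector. The sketch below illustrates the borrow with a stand-in function; `count_tokens` and `demo` are hypothetical and do not reproduce the ct2rs signature.

```rust
/// Stand-in for a batch API that borrows its input as a slice
/// (hypothetical; used only to illustrate the borrow).
fn count_tokens(batch: &[Vec<String>]) -> usize {
    batch.iter().map(Vec::len).sum()
}

fn demo() -> usize {
    let source = vec!["▁Hello".to_string(), "</s>".to_string()];
    // `count_tokens(vec![source])` would be rejected by the compiler:
    // the function expects `&[Vec<String>]` but would receive `Vec<Vec<String>>`.
    // Borrowing yields `&Vec<Vec<String>>`, which deref-coerces to the slice type.
    count_tokens(&vec![source])
}
```

The same reasoning applies to any function that takes its batch by reference: the caller keeps ownership, and the temporary vector lives until the call returns.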

solaoi commented 4 months ago

@jkawamoto Thank you for adding the sample document below. https://github.com/jkawamoto/ctranslate2-rs/blob/main/examples/marian-mt.rs

I tried using version 0.7.3, but it seems that Translator::with_tokenizer does not exist.

It works when I use new as shown below. Is there something wrong with my implementation?

use ct2rs::config::Config;
use ct2rs::sentencepiece::Tokenizer;
use ct2rs::TranslationOptions;
use ct2rs::Translator;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let text = std::env::args()
        .nth(1)
        .unwrap_or("Hello world! This library provides Rust bindings for CTranslate2.".to_string());
    let model_path = "./mymodel";
    let t = Translator::new(
        &model_path,
        Tokenizer::new(&model_path)?,
        &Config::default(),
    )?;
    let sources: Vec<String> = text.lines().map(String::from).collect();

    let res = t.translate_batch(
        &sources,
        &TranslationOptions {
            beam_size: 5,
            ..Default::default()
        },
    )?;
    for (r, _) in res {
        print!("{}", r);
    }
    Ok(())
}

jkawamoto commented 4 months ago

Your code looks good with v0.7.3.

Please refer to the example at v0.7.3 instead of the one on the main branch. The main branch is currently aimed at v0.8.0, which includes some breaking changes.

solaoi commented 4 months ago

Thank you for your quick response! I’ll check the v0.7.3 example as suggested.

jkawamoto / ctranslate2-rs

Add support for MarianMT Model #38
