winstxnhdw / nllb-api

A performant high-throughput CPU-based API for Meta's No Language Left Behind (NLLB) using CTranslate2, hosted on Hugging Face Spaces.
https://huggingface.co/spaces/winstxnhdw/nllb-api

Translation truncated below 1024 tokens #190

Closed: rahul08946 closed this issue 1 month ago

rahul08946 commented 2 months ago

I am using the self-hosting option with the most recent version of the Docker image, with the following config:

    image: 'ghcr.io/winstxnhdw/nllb-api:main'
    container_name: 'nllb'
    ports:
      - '7860:7860'
    environment:
      - APP_PORT=7860 
      - OMP_NUM_THREADS=4
      - WORKER_COUNT=2
      - CT2_USE_EXPERIMENTAL_PACKED_GEMM=1
      - CT2_FORCE_CPU_ISA=AVX2

I've tested it and it works as expected, but it only generates a full response for inputs of around 300 tokens.
As far as I've read, it should be able to handle text inputs of up to 1024 tokens, but the translated text is always incomplete, missing information or stopping mid-sentence.

I have seen in other issues that setting the MAX_INPUT_LENGTH environment variable should help, but that should only matter for inputs of more than 1024 tokens. I've tried it anyway, with no difference in the result. What am I missing?

winstxnhdw commented 2 months ago

huh, that’s strange. can you check if you can replicate this issue with the original model? there’s a bunch of spaces that host the original model.

rahul08946 commented 2 months ago

Seems like it, yes. Just to be sure I am not doing something wrong: isn't this how you would check the number of tokens in the text? (using transformers)

from transformers import NllbTokenizer

tokenizer = NllbTokenizer.from_pretrained("winstxnhdw/nllb-200-distilled-1.3B-ct2-int8")
print(len(tokenizer.tokenize(text)))

winstxnhdw commented 2 months ago

Not sure, but I'd do it like this.

from transformers import NllbTokenizerFast

tokeniser = NllbTokenizerFast.from_pretrained("winstxnhdw/nllb-200-distilled-1.3B-ct2-int8")
len(tokeniser(text).tokens())

rahul08946 commented 2 months ago

Still the same number of tokens.

winstxnhdw commented 2 months ago

Can you send me your input? I'll test it out on my end.

rahul08946 commented 2 months ago

This bigger text misses out on information: "Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people’s hats off—then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me. There now is your insular city of the Manhattoes, belted round by wharves as Indian isles by coral reefs—commerce surrounds it with her surf. Right and left, the streets take you waterward. Its extreme downtown is the battery, where that noble mole is washed by waves, and cooled by breezes, which a few hours previous were out of sight of land. Look at the crowds of water-gazers there. Circumambulate the city of a dreamy Sabbath afternoon. Go from Corlears Hook to Coenties Slip, and from thence, by Whitehall, northward. What do you see?—Posted like silent sentinels all around the town, stand thousands upon thousands of mortal men fixed in ocean reveries.
Some leaning against the spiles; some seated upon the pier-heads; some looking over the bulwarks of ships from China; some high aloft in the rigging, as if striving to get a still better seaward peep. But these are all landsmen; of week days pent up in lath and plaster—tied to counters, nailed to benches, clinched to desks. How then is this? Are the green fields gone? What do they here?"

while the smaller one works as expected: "Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people’s hats off—then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball."

winstxnhdw commented 2 months ago

Can you test it here too? https://huggingface.co/spaces/Geonmo/nllb-translation-demo

If you see the same problem there, it means the issue lies with the original model. In that case, it might be better to split your text into smaller chunks for processing instead.
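Splitting before translation can be done with a small helper. As a minimal sketch (the naive sentence splitter and the word budget are illustrative choices, not something nllb-api provides), accumulate whole sentences until a budget is reached, then start a new chunk:

```python
import re


def chunk_text(text: str, max_words: int = 200) -> list[str]:
    """Split text into chunks of whole sentences, each within max_words words."""
    # Naive split on ., ! or ? followed by whitespace; good enough for prose.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        words = len(sentence.split())
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be sent to the API as a separate request and the translations concatenated afterwards.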

rahul08946 commented 2 months ago

Yeah, same problem there. I've read about a max_length argument which is supposedly for limiting the number of output tokens: https://huggingface.co/docs/transformers/en/model_doc/nllb#transformers.NllbTokenizer

I've seen that in CTranslate2 the equivalent is max_decoding_length.
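For anyone who wants to reproduce this against CTranslate2 directly rather than through the API, a call with that option might look roughly like the sketch below. The model path and language codes are placeholders, and this is not how nllb-api invokes the model internally:

```python
import ctranslate2
from transformers import NllbTokenizerFast

# Placeholder path to a converted CTranslate2 model directory.
translator = ctranslate2.Translator("nllb-200-distilled-1.3B-ct2-int8", device="cpu")
tokeniser = NllbTokenizerFast.from_pretrained(
    "winstxnhdw/nllb-200-distilled-1.3B-ct2-int8", src_lang="eng_Latn"
)

text = "Call me Ishmael. Some years ago, I thought I would sail about a little."
source_tokens = tokeniser(text).tokens()

results = translator.translate_batch(
    [source_tokens],
    target_prefix=[["deu_Latn"]],  # placeholder target language
    max_decoding_length=1024,  # raise the output-length cap before decoding
)

# Drop the leading target-language token before detokenising.
target_tokens = results[0].hypotheses[0][1:]
print(tokeniser.convert_tokens_to_string(target_tokens))
```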

rahul08946 commented 1 month ago

Hello?

winstxnhdw commented 1 month ago

Well, have you tried with those options? I don’t have an available machine to test right now.

winstxnhdw commented 1 month ago

Yeah, I tried setting max_decoding_length to 4096 and there's no difference. Closing this as a model issue.