pragmatrix / tnt

Command line tool for organizing translation strings extracted from .NET assemblies. Supports Excel, XLIFF roundtrips and machine translations.
MIT License

Too many text segments error #90

Closed: theolivenbaum closed this issue 4 years ago

theolivenbaum commented 4 years ago

Hi @pragmatrix (and thanks for the awesome tool!)

Just to let you know: after adding TNT to one of our projects, I'm hitting the following error when trying to use the tnt translate command:

Google.Apis.Requests.RequestError
Too many text segments [400]
Errors [
        Message[Too many text segments] Location[ - ] Reason[invalid] Domain[global]
]

This seems to be caused by a limit of 128 text segments per request to the API at https://translation.googleapis.com/language/translate/v2.

Is it possible to change the implementation to batch calls with a few segments at a time?
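A minimal sketch of that kind of batching, written in Python for illustration rather than in TNT's F#. It assumes the 128-segment limit inferred above and simply slices the list of strings into consecutive chunks, each of which would be sent as a separate request. The name batch_segments and the constant are hypothetical, not part of TNT.

```python
from typing import Iterator, List, Sequence

# Assumed per-request limit, inferred from the v2 error above; not an official constant.
MAX_SEGMENTS_PER_REQUEST = 128


def batch_segments(
    segments: Sequence[str], size: int = MAX_SEGMENTS_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive slices of `segments`, each holding at most `size` items.

    Order is preserved, so translations returned per batch can be zipped back
    onto the source strings.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(segments), size):
        yield list(segments[start:start + size])


if __name__ == "__main__":
    texts = [f"string {i}" for i in range(300)]
    # Each batch would become one call to the translate endpoint.
    print([len(b) for b in batch_segments(texts)])  # [128, 128, 44]
```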

theolivenbaum commented 4 years ago

@pragmatrix I managed to get my head around the F# code (always an interesting cognitive experience, switching between C# and F# 😅), tested it locally, and it works fine with this change!

pragmatrix commented 4 years ago

Thank you for using TNT and thank you for the PR.

I was able to reproduce the problem, but then tried another route and ported the code to the Google Cloud Translation API V3, which does not seem to impose these strict limits so far.

Could you check whether #94 works for you? If so, I'd like to merge my PR and, if the limits reappear, use the batching support that is available in the V3 API.

pragmatrix commented 4 years ago

I've found a regression with the V3 API: it seems to remove newlines from the resulting translations. As long as this is not resolved, we can't use it.

pragmatrix commented 4 years ago

The default format was HTML; setting the MimeType to text/plain fixes that.
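For reference, a sketch of what such a request looks like at the REST level, assuming the v3 translateText method. Its mimeType field is documented to default to text/html when omitted, so it is set explicitly here. Authentication and the HTTP call itself are left out; the helper name and the PROJECT_ID placeholder are assumptions for illustration only.

```python
import json
from typing import Dict, List


def build_translate_text_body(contents: List[str], source: str, target: str) -> Dict[str, object]:
    """Build a JSON body for the Cloud Translation v3 translateText REST method.

    mimeType is set explicitly because v3 treats input as text/html when the
    field is left out, which is where the lost newlines came from.
    """
    return {
        "contents": contents,
        "mimeType": "text/plain",
        "sourceLanguageCode": source,
        "targetLanguageCode": target,
    }


if __name__ == "__main__":
    body = build_translate_text_body(["Line one\nLine two"], "en", "de")
    # Hypothetical target, PROJECT_ID is a placeholder:
    # POST https://translation.googleapis.com/v3/projects/PROJECT_ID/locations/global:translateText
    print(json.dumps(body, indent=2))
```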

pragmatrix commented 4 years ago

Found something concrete about the limits:

From https://cloud.google.com/translate/quotas

The Cloud Translation API is optimized for translating of smaller requests. The recommended maximum length for each request is 5K characters (code points). The maximum number of code points for a single request is 30K. However, the more characters that you include, the higher the response latency. The Cloud Translation API rejects requests larger than the maximum and gives a 400 INVALID_ARGUMENT error regardless of the available quota.
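One way to honor those numbers is a greedy packer that fills each request up to the recommended 5K code points and starts a new one when the next string would overflow it. This is a Python sketch under assumptions: the function name is hypothetical, the optional 128-segment cap carries over the v2 limit reported at the start of this thread, and an oversized single string is sent on its own as long as it stays under the 30K hard maximum. Python's len() counts code points, which matches how the quota is stated.

```python
from typing import Iterator, List, Sequence

RECOMMENDED_CODE_POINTS = 5_000  # recommended maximum per request (quota page)
HARD_LIMIT_CODE_POINTS = 30_000  # larger requests are rejected with 400 INVALID_ARGUMENT


def batch_by_code_points(
    segments: Sequence[str],
    budget: int = RECOMMENDED_CODE_POINTS,
    max_segments: int = 128,  # assumption: the v2 segment limit from this thread
) -> Iterator[List[str]]:
    """Greedily pack consecutive segments into batches of at most `budget`
    code points and at most `max_segments` strings.

    A single segment longer than `budget` is emitted alone; it is only
    refused if it exceeds HARD_LIMIT_CODE_POINTS on its own.
    """
    batch: List[str] = []
    size = 0
    for seg in segments:
        n = len(seg)  # number of Unicode code points
        if n > HARD_LIMIT_CODE_POINTS:
            raise ValueError(f"segment of {n} code points exceeds the per-request maximum")
        # Close the current batch if adding this segment would break a limit.
        if batch and (size + n > budget or len(batch) >= max_segments):
            yield batch
            batch, size = [], 0
        batch.append(seg)
        size += n
    if batch:
        yield batch


if __name__ == "__main__":
    segs = ["a" * 3000, "b" * 3000, "c" * 1000, "d" * 6000]
    print([sum(map(len, b)) for b in batch_by_code_points(segs)])  # [3000, 4000, 6000]
```

An F# port can't rely on String.Length for this, since .NET strings count UTF-16 code units; String.EnumerateRunes() (available since .NET Core 3.0) counts code points instead.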

theolivenbaum commented 4 years ago

Hi @pragmatrix

I think it is probably good to keep batching anyway, as I doubt it adds any significant runtime (on our code base, translating into 8 languages takes only a few seconds), and price-wise I think the API charges per character, so it doesn't matter whether we make one call or several.

I have to say it feels like magic to get our UI translated in less than a day of work 😂!

Btw, I took the liberty of implementing something similar to the TNT.T NuGet package in our UI front-end library, because of the restrictions on using normal NuGet packages with the C#-to-JavaScript compiler we use.

Cheers,

Rafael

pragmatrix commented 4 years ago

> I think it is probably good to keep batching it anyway

Yes, indeed, see #95 for what I came up with. It's a bit convoluted, but it should comply with Google's recommended maximum of 5000 code points per request and also allows partial translations. I did some testing, also simulated a few error situations, and so far it seems to work as expected.
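The code in #95 isn't reproduced here, so the following is only a sketch of one way partial translations can work, assuming each batch is an independent request: results from batches that succeed are kept, while the strings from a failing batch are collected for a later retry instead of aborting the whole run. translate_batch stands in for the actual API call and is hypothetical, as is translate_partially.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signature: takes a batch of source strings and returns their
# translations in the same order, or raises on an API error.
TranslateBatch = Callable[[List[str]], List[str]]


def translate_partially(
    batches: Sequence[List[str]],
    translate_batch: TranslateBatch,
) -> Tuple[Dict[str, str], List[str]]:
    """Translate batch by batch, keeping the results of batches that succeed.

    A failing batch does not abort the run: its segments are collected in
    `failed` so they can be retried later.
    """
    translated: Dict[str, str] = {}
    failed: List[str] = []
    for batch in batches:
        try:
            results = translate_batch(batch)
        except Exception:  # broad on purpose in this sketch: any API error fails the batch
            failed.extend(batch)
            continue
        if len(results) != len(batch):
            # A mismatched response can't be paired up safely; treat it as a failure.
            failed.extend(batch)
            continue
        translated.update(zip(batch, results))
    return translated, failed


if __name__ == "__main__":
    def fake_translate(batch: List[str]) -> List[str]:
        # Stand-in for a real API call; fails on any batch containing "boom".
        if "boom" in batch:
            raise RuntimeError("400 INVALID_ARGUMENT")
        return [s.upper() for s in batch]

    done, todo = translate_partially([["a", "b"], ["boom", "c"], ["d"]], fake_translate)
    print(done)  # {'a': 'A', 'b': 'B', 'd': 'D'}
    print(todo)  # ['boom', 'c']
```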