An API call is made for each item in translate_batch, even if the same item is repeated a hundred times. This is slow, and my pull request aims to fix that:
When a batch item is translated for the first time, the result is stored in an already_translated dictionary, keyed by the original, untranslated sentence.
Each subsequent item in the batch is first checked against the keys of already_translated; if an identical match is found, the existing translation is reused.
If there is no identical match, an API call is made for the translation.
Even with hundreds of items in the dictionary, a key lookup is significantly faster than an API call.
The best part is that the already_translated dictionary carries over to the next big batch because it is initialized in the BaseTranslator class. In short, only unique items require an API call, while repeated items are translated swiftly; see the sketch below.
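A minimal sketch of the caching logic (simplified and hypothetical, not the exact PR code; self.translate stands in for the real per-item API call):

def translate_batch(self, batch):
    results = []
    for text in batch:
        if text in self._already_translated:
            # Cache hit: reuse the stored translation, no API call
            results.append(self._already_translated[text])
        else:
            # Cache miss: one API call, then remember the result
            translation = self.translate(text)
            self._already_translated[text] = translation
            results.append(translation)
    return results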
Example:
translator = LibreTranslator(source='auto', target='es')
batch = ['Hello world', 'I am a python', 'Hello world', 'Hello world', 'I am a python']
translated_batch = translator.translate_batch(batch)
# API call (slow)
# API call (slow)
# Already translated (fast)
# Already translated (fast)
# Already translated (fast)
# Batch translation finished
# translator._already_translated = {
# 'Hello world' : 'Hola mundo',
# 'I am a python' : 'Soy un pitón'
# }
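Since _already_translated lives on the translator instance, a second batch (a hypothetical continuation of the example above) reuses the cache:

second_batch = ['Hello world', 'How are you?']
translated_second_batch = translator.translate_batch(second_batch)
# Already translated (fast)   <- 'Hello world' was cached by the first batch
# API call (slow)             <- 'How are you?' is new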