nidhaloff / deep-translator

A flexible free and unlimited python tool to translate between different languages in a simple way using multiple translators.
https://deep-translator.readthedocs.io/en/latest/?badge=latest
Apache License 2.0

Reducing the number of API calls on large batch translations #198

Open NorthOC opened 1 year ago

NorthOC commented 1 year ago

An API call is made for each item in translate_batch, even if the same item is repeated a hundred times. This is slow. My pull request aims to fix that:

  1. When a new batch item is translated, the result is stored in a dictionary of already_translated items, keyed by the original (untranslated) sentence.
  2. The next item in the batch is then checked against the keys of already_translated and, if an identical match is found, the existing translation is reused.
  3. If there is no identical match, an API call is made for the translation.
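
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `translate_batch_cached` and `fake_translate` are hypothetical names, and `fake_translate` stands in for the real (slow) API call so the sketch runs offline.

```python
from typing import Callable, Dict, List


def translate_batch_cached(
    batch: List[str], translate_one: Callable[[str], str]
) -> List[str]:
    """Translate a batch, calling translate_one only once per unique item."""
    already_translated: Dict[str, str] = {}  # original sentence -> translation
    results: List[str] = []
    for text in batch:
        if text not in already_translated:
            # Step 3: no identical match, so make the (slow) call and store it.
            already_translated[text] = translate_one(text)
        # Steps 1-2: reuse whatever is stored under this sentence.
        results.append(already_translated[text])
    return results


calls: List[str] = []  # records every simulated API call


def fake_translate(text: str) -> str:
    """Hypothetical stand-in for an API request; it just upper-cases the text."""
    calls.append(text)
    return text.upper()


batch = ["Hello world", "I am a python", "Hello world", "Hello world", "I am a python"]
translated = translate_batch_cached(batch, fake_translate)
```

With this batch of five items, only the two unique sentences reach `fake_translate`; the output order still matches the input order.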

Even with hundreds of items in a dictionary, checking against its keys is significantly faster than an API call.

The best part is that the already_translated dictionary will be used for the next big batch because it is initialized with the BaseTranslator class. In short, this means that only unique items will require an API call, while repeated items will be translated swiftly.
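
To illustrate how a cache stored on the translator object carries over between batches, here is a small sketch. `CachingTranslator`, `_translate`, and `api_calls` are hypothetical names chosen for this example and are not part of deep-translator; `_translate` is a placeholder for the real network request.

```python
from typing import Dict, List


class CachingTranslator:
    """Hypothetical stand-in for a translator whose cache outlives one batch."""

    def __init__(self) -> None:
        self._already_translated: Dict[str, str] = {}  # persists across batches
        self.api_calls = 0  # counts simulated API requests

    def _translate(self, text: str) -> str:
        # Placeholder for the slow API call; wraps the text in angle brackets.
        self.api_calls += 1
        return f"<{text}>"

    def translate_batch(self, batch: List[str]) -> List[str]:
        results: List[str] = []
        for text in batch:
            if text not in self._already_translated:
                self._already_translated[text] = self._translate(text)
            results.append(self._already_translated[text])
        return results


translator = CachingTranslator()
first = translator.translate_batch(["Hello world", "I am a python", "Hello world"])
calls_after_first = translator.api_calls

# 'Hello world' is already cached, so only 'Good morning' needs a new call.
second = translator.translate_batch(["Hello world", "Good morning"])
calls_after_second = translator.api_calls
```

One trade-off of keeping the dictionary on the instance is that it grows with every unique sentence and implicitly assumes the source and target languages stay the same for the lifetime of the object.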

Example:

from deep_translator import LibreTranslator

translator = LibreTranslator(source='auto', target='es')
batch = ['Hello world', 'I am a python', 'Hello world', 'Hello world', 'I am a python']

translated_batch = translator.translate_batch(batch)
# API call (slow)
# API call (slow)
# Already translated (fast)
# Already translated (fast)
# Already translated (fast)
# Batch translation finished

# translator._already_translated = {
#   'Hello world' : 'Hola mundo',
#   'I am a python' : 'Soy un pitón'
# }

NorthOC commented 1 year ago

@nidhaloff Moved already_translated into the function for batch translations. I ran pytest, seems to pass all checks.