xhluca / dl-translate

Library for translating between 200 languages. Built on 🤗 transformers.

https://xhluca.github.io/dl-translate/

MIT License

451 stars, 47 forks

Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) #27

Kouuh closed this issue 3 years ago

Kouuh commented 3 years ago

When I use dl_translate, the following warning appears. How do I set TOKENIZERS_PARALLELISM?

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

xhluca commented 3 years ago

Thanks for reporting this! I'm not sure I understand the context; could you share some code (preferably runnable on Colab) that reproduces the problem? And have you tried setting the environment variable in bash:

export TOKENIZERS_PARALLELISM=true

Or inside python:

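import os

# os.environ only accepts strings; assigning the bool True raises a TypeError.
# Set the variable before any tokenizer is used.
os.environ['TOKENIZERS_PARALLELISM'] = "true"  # or "false"

import dl_translate as dlt

...

mt = dlt.TranslationModel()
...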
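
For reference, here is a minimal sketch of the Python route on its own, showing why the value has to be a string. The helper name `set_tokenizers_parallelism` is hypothetical and is not part of dl_translate or tokenizers; it only wraps the standard `os.environ` assignment.

```python
import os


def set_tokenizers_parallelism(enabled: bool) -> str:
    """Hypothetical helper: store the flag as the string "true" or "false"."""
    value = "true" if enabled else "false"
    os.environ["TOKENIZERS_PARALLELISM"] = value
    return value


# Assigning a bool directly fails, because os.environ only accepts str values.
try:
    os.environ["TOKENIZERS_PARALLELISM"] = True
except TypeError as err:
    bool_error = err
else:
    bool_error = None

# Setting the variable explicitly, to either value, is what the warning asks for.
set_tokenizers_parallelism(False)
```

Call the helper before the first translation so the variable is already in place when the tokenizer runs.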

Kouuh commented 3 years ago

Thank you!
After setting os.environ['TOKENIZERS_PARALLELISM'] = "true", it runs.