Closed pacospace closed 3 years ago
/kind feature /priority important-soon
"All three ML frameworks in the same image? Or different images NLP-pytorch, NLP-tensorflow, NLP-scikit-learn" --> In my experience, people often have different Python virtual environments for pytorch, tensorflow, etc., so it would probably make sense to have these as different images.
That was my thought too, @ViitasaariVille, thanks for answering. We will proceed to create three NLP overlays, one for each of the three images:
nlp-pytorch
nlp-tensorflow
nlp-scikit-learn
If more combinations are required, that is not a problem with the architecture we have for the builds.
Hi @ViitasaariVille, for spacy and nltk, do you need some trained language models and data already available in the image?
For example, for spacy, include the English trained model, and for nltk include the different models for chunkers, grammars, misc, sentiment, taggers, corpora, help, models, stemmers, tokenizers, etc.
Hi @pacospace, I'm not an expert on NLP stuff but I asked my colleague and he thinks this one would be generally useful: https://www.nltk.org/_modules/nltk/tokenize/punkt.html. And NLTK's SnowballStemmer has been useful, at least for me, in the past when doing tf-idf, text classification (I've been using SnowballStemmer(language='finnish')), topic analysis, etc. I'm guessing small language models for all possible languages would be generally useful, as you describe above: "for nltk include the different models for chunkers, grammars, misc, sentiment, taggers, corpora, help, models, stemmers, tokenizers, etc..". Then again, e.g. BERT models for several languages are just too large to include in an image. We will probably be using https://github.com/TurkuNLP/FinBERT, which is a BERT model for the Finnish language (not financial BERT :) ), but we'll be uploading these separately into an OCS S3 bucket.
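For illustration, here is a minimal sketch of the stemming step mentioned above, as it might be used before tf-idf. The helper names `stem_tokens` and `make_finnish_stemmer` are hypothetical and not part of NLTK; only `SnowballStemmer(language="finnish")` comes from the comment itself. The stemmer is passed in as a plain callable, so the tokenizing logic does not depend on NLTK being installed.

```python
import re
from typing import Callable, List


def stem_tokens(text: str, stem: Callable[[str], str]) -> List[str]:
    """Lowercase the text, split it into word tokens, and stem each token."""
    # \w+ matches runs of Unicode word characters, so letters like ä and ö survive.
    tokens = re.findall(r"\w+", text.lower())
    return [stem(token) for token in tokens]


def make_finnish_stemmer() -> Callable[[str], str]:
    """Return the stem method of NLTK's Finnish Snowball stemmer (hypothetical helper)."""
    # Imported lazily so this module loads even where NLTK is not installed.
    from nltk.stem.snowball import SnowballStemmer

    return SnowballStemmer(language="finnish").stem
```

With NLTK available in the image, `stem_tokens("Kissat istuvat talossa", make_finnish_stemmer())` would return the stemmed tokens. A function like this could presumably also be wrapped and handed to a tf-idf vectorizer as its tokenizer, though that wiring is an assumption and not something described in this thread.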
NLP images created: ps-nlp (basic NLP image), ps-nlp-pytorch and ps-nlp-tensorflow. README: https://github.com/thoth-station/ps-nlp.
In the README, you can find descriptions of the packages in each image, as well as links to the Quay images.
Feel free to test them and please let us know if they meet all requirements. Otherwise, feel free to open more issues/features in this repo and we will improve them 🙂
Is your feature request related to a problem? Please describe.
As a Data Scientist working on NLP, I want to have an image with some specific libraries for my NLP project.
Describe the solution you'd like
nice to have:
Describe alternatives you've considered
Additional context
Question: All three ML frameworks in the same image? Or different images: NLP-pytorch, NLP-tensorflow, NLP-scikit-learn?
cc @harshad16