yandexdataschool / nlp_course

YSDA course in Natural Language Processing
https://lena-voita.github.io/nlp_course.html
MIT License

Some code is not working due to new versions of libraries #71

Closed Extremesarova closed 2 months ago

Extremesarova commented 3 years ago

Hi! Regarding nlp_course/week01_embeddings/seminar.ipynb: the line "Requirements: pip install --upgrade nltk gensim bokeh, but only if you're running locally." installs the latest versions of these libraries, because no exact versions are specified. I suggest pinning the exact library versions you intended to use in your notebooks. As of May 2021, gensim is at version 4.0.1, which means that

words = sorted(model.vocab.keys(), 
               key=lambda word: model.vocab[word].count,
               reverse=True)[:1000]

will not work. It is better to replace it with

words = sorted(model.key_to_index.keys(), 
               key=lambda word: model.get_vecattr(word, "count"),
               reverse=True)[:1000]
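
If the notebook has to run under both older and newer gensim releases, one option is to branch on which attribute the model exposes. The following is only a sketch: the helper name top_words_by_count is hypothetical (it is not part of gensim or the course code), and it assumes that pre-4.0 models expose vocab while 4.x models expose key_to_index and get_vecattr, as in the two snippets above.

```python
def top_words_by_count(model, n=1000):
    """Return the n most frequent words of a gensim KeyedVectors-like model.

    Hypothetical helper, not part of gensim or the course notebooks.
    """
    if hasattr(model, "key_to_index"):
        # Assumed gensim >= 4.0 API: counts are read via get_vecattr.
        keys = model.key_to_index.keys()

        def count(word):
            return model.get_vecattr(word, "count")
    else:
        # Assumed gensim < 4.0 API: counts live on the vocab entries.
        keys = model.vocab.keys()

        def count(word):
            return model.vocab[word].count

    # Most frequent words first, truncated to the first n.
    return sorted(keys, key=count, reverse=True)[:n]
```

With this in place, the notebook line would become words = top_words_by_count(model, 1000).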

Regarding nlp_course/week01_embeddings/homework.ipynb:

precision_top1 = precision(uk_ru_test, mapping.predict(X_test), 1)
precision_top5 = precision(uk_ru_test, mapping.predict(X_test), 5)

assert precision_top1 >= 0.635
assert precision_top5 >= 0.813

Here the assertions pass for me only if the second threshold is relaxed to precision_top5 >= 0.811 (probably also due to the new gensim version).
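
When a result differs like this, it can help to record which library versions are actually installed. A minimal sketch using only the standard library (Python 3.8+); the function name installed_versions is hypothetical, and the package list is simply taken from the pip command quoted above:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("nltk", "gensim", "bokeh")):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            found[name] = None
    return found


print(installed_versions())
```

The printed versions could then be pinned in the requirements line, for example as gensim==<version>, so that later runs use the same releases.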

P.S. I will update this issue with new problems as I go through the course.

poedator commented 2 years ago

1) The outdated call to model.vocab should indeed be updated. See PR here.

2) 0.813 is actually achievable. Keep trying or see the course chats. )