yandexdataschool / nlp_course

YSDA course in Natural Language Processing
https://lena-voita.github.io/nlp_course.html
MIT License
9.79k stars 2.59k forks

fix: Added additional assert on week01 seminar data_vectors to avoid NaNs #137

Closed by bogdansalyp 10 months ago

bogdansalyp commented 10 months ago

In week01, the data contains two lines of gibberish:

If they are not replaced with zeros (as the function description explicitly requires), averaging in get_phrase_embedding divides by zero and produces NaN. However, that requirement is easy to miss and is not mentioned in the seminar video recording. The NaN surfaces only later, in the dot product inside find_nearest.
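To make the failure mode concrete, here is a minimal sketch of how it can arise. It does not use the seminar's actual code: `TOY_VECTORS`, `DIM`, `known_token_vectors`, `naive_phrase_embedding` and `safe_phrase_embedding` are hypothetical names, and the three-dimensional vectors are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the seminar's embedding model: a tiny token -> vector lookup.
TOY_VECTORS = {
    "hello": np.array([1.0, 0.0, 2.0]),
    "world": np.array([3.0, 2.0, 0.0]),
}
DIM = 3


def known_token_vectors(phrase):
    """Return the vectors of the tokens that exist in the toy vocabulary."""
    return [TOY_VECTORS[t] for t in phrase.lower().split() if t in TOY_VECTORS]


def naive_phrase_embedding(phrase):
    """Average token vectors with no guard for phrases without known tokens."""
    vectors = known_token_vectors(phrase)
    total = np.sum(vectors, axis=0) if vectors else np.zeros(DIM)
    with np.errstate(invalid="ignore"):  # silence the 0/0 RuntimeWarning
        return total / len(vectors)      # 0/0 gives NaN when no token is known


def safe_phrase_embedding(phrase):
    """Same average, but fall back to a zero vector when no token is known."""
    vectors = known_token_vectors(phrase)
    if not vectors:
        return np.zeros(DIM)
    return np.mean(vectors, axis=0)


phrases = ["hello world", "asdf qwer"]  # the second phrase is gibberish
bad = np.stack([naive_phrase_embedding(p) for p in phrases])
good = np.stack([safe_phrase_embedding(p) for p in phrases])
query = safe_phrase_embedding("hello")

print(bad @ query)   # the score for the gibberish phrase is NaN
print(good @ query)  # every score is finite
```

The NaN is created silently at embedding time and only becomes visible once the dot product is computed, which is why the cause is hard to trace back.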

I'd suggest adding an assert before find_nearest. It saves students' time and hints that data_vectors was built incorrectly.
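A check along these lines might look like the sketch below. The exact assertion added in the notebook is not shown in this thread, so `assert_finite_vectors` is a hypothetical helper, and the array is a toy stand-in for the seminar's data_vectors.

```python
import numpy as np


def assert_finite_vectors(data_vectors):
    """Fail fast if any phrase embedding contains NaN or infinity."""
    bad_rows = np.flatnonzero(~np.isfinite(data_vectors).all(axis=1))
    assert bad_rows.size == 0, (
        f"data_vectors has non-finite values in rows {bad_rows.tolist()}; "
        "phrases without known tokens should map to zero vectors"
    )


# Toy 2D array standing in for the seminar's data_vectors.
data_vectors = np.array([[2.0, 1.0, 1.0], [0.0, 0.0, 0.0]])
assert_finite_vectors(data_vectors)  # passes silently
```

Reporting the offending row indices in the message points students at the phrases that produced the bad embeddings rather than at find_nearest.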

bogdansalyp commented 10 months ago

Replaced by https://github.com/yandexdataschool/nlp_course/pull/138