reynoldsnlp / udar

UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.
GNU General Public License v3.0

don’t attempt to re-download punkt tokenizer #50

Closed NSBum closed 2 years ago

NSBum commented 3 years ago

Previously, every request for the in-use tokenizer triggered an attempt to re-download punkt, which caused nltk to emit warnings that punkt was already downloaded. This change first tries to find punkt locally and downloads it only if it is not present.
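The find-then-download pattern described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `ensure_punkt` and its injectable `find`/`download` parameters are hypothetical, added only so the logic can be exercised without network access. It assumes the standard `nltk.data.find` (which raises `LookupError` when a resource is missing) and `nltk.download`.

```python
def ensure_punkt(find=None, download=None):
    """Download the punkt tokenizer only if it is not already available.

    `find` and `download` are hypothetical injection points for testing;
    by default they resolve to nltk.data.find and nltk.download.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested without nltk

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is missing
        find("tokenizers/punkt")
        return False
    except LookupError:
        download("punkt")
        return True
```

Calling `ensure_punkt()` before loading the tokenizer would then contact the download server only on the first run, so repeated tokenizer requests should no longer produce the "already downloaded" warnings.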

codecov-commenter commented 3 years ago

Codecov Report

Merging #50 (20ced69) into main (75927ae) will decrease coverage by 0.10%. The diff coverage is 0.00%.


@@            Coverage Diff             @@
##             main      #50      +/-   ##
==========================================
- Coverage   78.78%   78.68%   -0.11%     
==========================================
  Files          33       33              
  Lines        2659     2660       +1     
==========================================
- Hits         2095     2093       -2     
- Misses        564      567       +3     
Impacted Files                            Coverage Δ
src/udar/sentence.py                      74.30% <0.00%> (-0.63%) ↓
src/udar/features/feature_extractor.py    91.42% <0.00%> (-0.13%) ↓
src/udar/features/features.py             98.59% <0.00%> (-0.02%) ↓
src/udar/fsts.py                          96.00% <0.00%> (ø)
src/udar/convenience.py                   97.18% <0.00%> (ø)
src/udar/conversion/OC_conflicts.py       100.00% <0.00%> (ø)
src/udar/conversion/UD_conflicts.py       100.00% <0.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 75927ae...20ced69.