FloweryScythe13 closed this issue 2 years ago
I just ran this on my Linux desktop without the pandas, and it worked fine. Obviously I'll need to try it on Windows without the pandas, but if you would confirm that it fails when you replace `docs_list` with something like `"Stop making new bugs Luffa"`, that would be helpful.
Ah, I can recreate it by not having the expected word vectors file for the NER in the expected location. Both on Windows and Linux, and the input format doesn't matter, as expected.
Basically, originally the NER models had their own separate copy of the embeddings in the model itself, whereas the other models all downloaded the embeddings separately. I separated the NER models into pretrained embeddings and everything else, kind of like when Capt. Janeway murdered Tuvix. The benefit was that most languages would have smaller downloads, since now the NER models would be much smaller and reuse the same embeddings as the POS models. This isn't true for English, though, which uses a different embedding for NER and POS/depparse.
Unfortunately, while I set the single-language Pipeline to download needed models when they are missing, I apparently didn't do that for the MultilingualPipeline. Also, that embedding isn't included in the default.zip for whatever reason.
Simple fix for now: add this to your script:

```python
stanza.download("en", processors="ner")
```
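If you'd rather not trigger that download unconditionally on every run, you can guard it with a quick filesystem check. This is a stdlib-only sketch: the directory layout is assumed from the paths mentioned in this thread, and `needs_ner_download` is a hypothetical helper, not a Stanza API:

```python
# Stdlib-only sketch (not Stanza API): decide whether the English NER
# embedding still needs to be fetched, based on what is present in the
# <resources>/en/pretrain directory. The layout is assumed from this thread.
from pathlib import Path

def needs_ner_download(resources_dir, lang="en"):
    """True when the pretrain directory holds nothing beyond combined.pt."""
    pretrain_dir = Path(resources_dir) / lang / "pretrain"
    if not pretrain_dir.exists():
        return True
    names = {p.name for p in pretrain_dir.glob("*.pt")}
    # combined.pt alone is the symptom reported below: the POS/depparse
    # embedding is present, but the separate NER embedding is not.
    return names <= {"combined.pt"}
```

When this returns True, run the `stanza.download("en", processors="ner")` workaround before constructing the pipeline.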
Long term, I'll fix both of those issues above. The .zip building script is updated to include the extra embeddings when needed, and I pushed those zips to the repo (I hoped to do it all sneaky style, but someone tried to download the Russian models in the middle of the download: https://github.com/stanfordnlp/stanza-resources/issues/10#issuecomment-1277311428)
https://github.com/stanfordnlp/stanza/commit/435685f875766e0b9b2b9b1d4792db1c452f9722
As for the downloading, I just noticed that you have `download_method` set to None. If you switch that to `"download_method": stanza.DownloadMethod.REUSE_RESOURCES` in your configs, it will download missing pieces without re-downloading the .json every time.
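To illustrate what REUSE_RESOURCES-style behavior amounts to, here is a stdlib-only sketch: reuse the cached resources file when it already exists instead of fetching it again on every pipeline construction. `load_resources` and `fetch_resources` are hypothetical names standing in for internals, not Stanza's real API:

```python
# Stdlib-only sketch of REUSE_RESOURCES-style caching (hypothetical names,
# not Stanza internals): fetch the resource index once, then reuse the
# local copy on later calls instead of hitting the network each time.
import json
from pathlib import Path

def load_resources(cache_dir, fetch_resources):
    cache = Path(cache_dir) / "resources.json"
    if cache.exists():
        # REUSE_RESOURCES behavior: trust the local copy
        return json.loads(cache.read_text())
    # First run (or forced re-download): go to the network, then cache
    data = fetch_resources()
    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_text(json.dumps(data))
    return data
```

The first call pays the network cost; later constructions reuse the cached index, which is why missing models can still be fetched without re-downloading the .json each time.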
This got me going again. Thank you very much!
Hi,
I am experiencing a blocking issue with some multilingual pipeline code. My code is as follows:
This is the error I am getting:
After looking at several other closed issues referencing the FileNotFoundError exception, I did double-check and rerun `stanza.download("en")`. No effect. The only file present in the above-referenced `\stanza_resources\en\pretrain\` directory is `combined.pt`.

Also, as a potentially important note: I first wrote this code back in late June/early July, and the above pipeline code ran successfully at that time (if not quite in the ways I wanted from a multilingual standpoint, but that's another matter). It is only now that I am returning to it (and after creating a new replacement conda environment) that this FileNotFoundError exception is being thrown. Perhaps a change in the last two minor releases is the reason for this exception?
Environment (please complete the following information):

- OS: Windows
- Python version: 3.7.11
- Stanza version: 1.4.1 and 1.4.2 (tried both, installed from both Miniconda and pip)