Open linlinloo opened 9 months ago
That's not an error, though. It should work just fine with that warning.
On Sun, Jan 21, 2024, 11:41 PM linlinloo wrote:
I want to run the following code, but an error occurred.
    import stanza
    pipe = stanza.Pipeline("en", processors="tokenize,ner", package={"ner": ["ncbi_disease", "ontonotes"]})
    doc = pipe("John Bauer works at Stanford and has hip arthritis. He works for Chris Manning")
    print(doc.ents)
WARNING: Language en package default expects mwt, which has been added
I have downloaded ncbi_disease.pt and placed it in site-packages\stanza\stanza_resources\en\ner. What's the problem, and why does it happen?
If it's giving a timeout error, I would guess the most likely culprit is that it's trying to download missing resources and isn't able to connect. You can add download_method=None to the Pipeline to stop it from downloading.
On Mon, Jan 22, 2024 at 12:13 AM linlinloo wrote:
However, the run did not produce any results, and a series of errors appeared: ConnectTimeout, MaxRetryError, ... When I run other code, ncbi_disease does not appear under ner. Did I put the wrong package there?
Loading these models for language: en (English):
| Processor | Package |
| tokenize | combined |
| mwt | combined |
| pos | combined_charlm |
| lemma | combined_nocharlm |
| constituency | ptb3-revised_charlm |
| depparse | combined_charlm |
| sentiment | sstplus |
| ner | ontonotes-ww-multi_charlm |
Also, I should note that for version 1.7.0, the default NER model is now "ontonotes-ww-multi_charlm". There's also "ontonotes_charlm". They are named this way so that you can get "nocharlm" models if you want faster processing. If there's some stale documentation, please let me know and I'll update it.
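To illustrate the naming scheme described above, here is a minimal sketch. The helper names ner_package and build_ner_pipeline are hypothetical, and the "_nocharlm" variants are assumed to follow the same pattern as the "_charlm" names quoted in this thread; which packages actually exist depends on the installed Stanza version and its resources.

```python
def ner_package(dataset: str, use_charlm: bool = True) -> str:
    """Build an NER package name of the form <dataset>_charlm or <dataset>_nocharlm.

    Hypothetical helper: it only formats the name, it does not check
    that such a package exists for your Stanza version.
    """
    suffix = "charlm" if use_charlm else "nocharlm"
    return f"{dataset}_{suffix}"


def build_ner_pipeline(dataset: str = "ontonotes", use_charlm: bool = True):
    """Create an English tokenize+ner pipeline using the chosen NER package."""
    import stanza  # imported lazily so the name helper works without stanza installed

    return stanza.Pipeline(
        "en",
        processors="tokenize,ner",
        package={"ner": ner_package(dataset, use_charlm)},
    )
```

For example, ner_package("ontonotes-ww-multi") gives the 1.7.0 default name mentioned above, while passing use_charlm=False asks for the presumably faster variant.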
I found ontonotes_charlm.pt, and I can download it. Do you mean that I should use it to replace ontonotes-ww-multi_charlm? And sorry, how do I add download_method=None? Like this? pipe = stanza.Pipeline("en", download_method=None)
I found ontonotes_charlm.pt, and I can download it. Do you mean that I should use it to replace ontonotes-ww-multi_charlm?
You can do whatever you like, of course. The ww-multi model was trained on both OntoNotes and the dataset described in this paper
And sorry, how do I add download_method=None? Like this? pipe = stanza.Pipeline("en", download_method=None)
Yes, exactly. I suggest that because it's the most likely reason you're getting timeouts. If the problem is somewhere else, please include the complete stack trace.
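Putting the suggestion together with the original snippet, a minimal sketch might look like the following. The helper names offline_pipeline_kwargs and build_offline_pipeline are hypothetical, and the default NER package names are simply the ones from the original post; they may need adjusting (for example to "ontonotes_charlm") depending on the Stanza version. With download_method=None, the models must already be present locally where Stanza looks for them.

```python
def offline_pipeline_kwargs(ner_packages=("ncbi_disease", "ontonotes")):
    """Collect Pipeline arguments that disable resource downloads.

    The package names are taken from the original post and are an
    assumption about what is installed locally.
    """
    return {
        "lang": "en",
        "processors": "tokenize,ner",
        "package": {"ner": list(ner_packages)},
        # None tells the Pipeline not to try downloading anything,
        # which avoids ConnectTimeout / MaxRetryError when offline.
        "download_method": None,
    }


def build_offline_pipeline(ner_packages=("ncbi_disease", "ontonotes")):
    """Build the pipeline without touching the network."""
    import stanza  # imported lazily so the kwargs helper works without stanza installed

    return stanza.Pipeline(**offline_pipeline_kwargs(ner_packages))
```

If a model file is missing, this should fail at load time with a local error instead of a timeout, which makes the real problem easier to see in the stack trace.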