EmanuelFaria opened 1 year ago
Update: upon running "en_core_sci_scibert" using:
docanalysis --project_name phytomed200 --make_section --spacy_model en_core_sci_scibert --entities ALL --output en_core_sci_scibert.csv
It appeared to run without printing any errors in red, but it did not output the CSV. Here are the final lines of output:
INFO: Found 35044 sentences in the section(s).
INFO: Loading en_core_sci_scibert
0%| | 0/35044 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/bin/docanalysis", line 8, in <module>
sys.exit(main())
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/docanalysis/docanalysis.py", line 196, in main
calldocanalysis.handlecli()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/docanalysis/docanalysis.py", line 188, in handlecli
self.entity_extraction.extract_entities_from_papers(args.project_name, args.dictionary, search_sections=args.search_section, entities=args.entities, query=args.query, hits=args.hits,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/docanalysis/entity_extraction.py", line 165, in extract_entities_from_papers
self.run_spacy_over_sections(self.sentence_dictionary, entities)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/docanalysis/entity_extraction.py", line 362, in run_spacy_over_sections
doc = self.nlp(dict_with_parsed_xml[paragraph]['sentence'])
TypeError: 'NoneType' object is not callable
Also, I'm not sure what, if anything, it did to the sections that already existed because I had previously run:
docanalysis --project_name phytomed200 --make_section --spacy_model spacy --entities ALL --output entities.csv
Is it possible to run these two commands, one after the other? If not, that should be noted in the instructions because I would have duplicated my corpus before running any sectioning.
Please let me know
Thanks
Running the two commands you pasted one after the other shouldn't be a problem.
We haven't tested scispacy in a long time. It may be that it found no entities in the sentences. I will have to look into your corpus to understand better.
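For what it's worth, the traceback ends with self.nlp being None at the moment it is called, so no model object was set before the loop started. One cheap thing to rule out is whether the model package is visible to the Python that runs docanalysis. The sketch below only checks that; missing_models is a hypothetical helper written for this comment, not part of docanalysis, and it assumes the model was installed as a package with the same name as the model.

```python
import importlib.util


def missing_models(names):
    """Return the package names that cannot be found in this environment."""
    # find_spec returns None when no importable top-level package has that name.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Model name taken from the command above; an empty list means it was found.
    print(missing_models(["en_core_sci_scibert"]))
```

If the list comes back empty, the package is installed and the problem is more likely in how docanalysis selects or loads the model.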
@mannyrules @ShweataNHegde I have also updated docanalysis with new requirements and pushed it as a new version because of the conflicts Manny mentioned.
Awesome!!! Thanks @ayush4921 !
I'll try some of the other scispacy models too, and let you know if there are any conflicts. I remember that during the hackathon at least one of the attendees also asked about the biomedicine models for scispacy, so it's not just me. :)
@ayush4921 I ran pip install docanalysis to upgrade (is this the correct command?) and got this:
Successfully uninstalled spacy-3.4.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy-transformers 1.1.9 requires spacy<4.0.0,>=3.4.0, but you have spacy 3.0.7 which is incompatible.
en-core-sci-lg 0.5.1 requires spacy<3.5.0,>=3.4.1, but you have spacy 3.0.7 which is incompatible.
I then ran pip install spacy, then ran pip install docanalysis again, and got no errors.
@ayush4921 by the way, the version in the help menu says Welcome to docanalysis version 0.2.0. -h or --help for help
but on github mainpage it says "Publication release v0.1.9 [Latest]"
Is it possible to add a prompt like "You have version x of docanalysis installed. The latest version is x.1. Would you like to update now? [Y/N]"? If so, that would be a good addition for all the programs.
It's definitely not common in the Python world, if I'm not wrong, since I haven't seen it before, but it seems like a neat addition. I can look into that.
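A rough sketch of how such a prompt could work, in case it helps. It compares the installed version with the latest release listed on PyPI's JSON endpoint. The function names are made up for illustration and are not part of docanalysis, and the version parsing assumes plain dotted numbers such as 0.2.0 (no pre-release suffixes).

```python
import json
import urllib.request
from importlib import metadata


def parse_version(text):
    """Turn '0.2.0' into (0, 2, 0); assumes plain dotted-integer versions."""
    return tuple(int(part) for part in text.split("."))


def update_message(package, installed, latest):
    """Return the prompt text if `latest` is newer than `installed`, else None."""
    if parse_version(latest) > parse_version(installed):
        return (
            f"You have version {installed} of {package} installed. "
            f"The latest version is {latest}. Would you like to update now? [Y/N]"
        )
    return None


def latest_on_pypi(package):
    """Ask PyPI's JSON endpoint for the latest released version (needs network)."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["info"]["version"]


if __name__ == "__main__":
    name = "docanalysis"
    # metadata.version raises PackageNotFoundError if the package is not installed.
    print(update_message(name, metadata.version(name), latest_on_pypi(name)))
```

Keeping the comparison separate from the network call means the prompt logic can be tested offline, and the check can be skipped quietly when there is no connection.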
@ayush4921 by the way, the version in the help menu says
Welcome to docanalysis version 0.2.0. -h or --help for help
but on github mainpage it says "Publication release v0.1.9 [Latest]"
Fixed
Ok, I'm trying out installing the other scispacy models on this page: https://allenai.github.io/scispacy/
the first one is en_core_sci_sm
I ran pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_sm-0.5.1.tar.gz
and got this error:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
scispacy 0.4.0 requires spacy<3.1.0,>=3.0.0, but you have spacy 3.4.4 which is incompatible.
en-core-web-sm 3.0.0 requires spacy<3.1.0,>=3.0.0, but you have spacy 3.4.4 which is incompatible.
docanalysis 0.2.0 requires spacy==3.0.7, but you have spacy 3.4.4 which is incompatible.
next is en_core_sci_md
I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz
and got this result: no error
next is en_core_sci_scibert
I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_scibert-0.5.1.tar.gz
and got this result: no error
next is en_core_sci_lg
I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz
and got this result: no error
next is en_ner_craft_md
I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_craft_md-0.5.1.tar.gz
and got this result: no error
next is en_ner_jnlpba_md
I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_jnlpba_md-0.5.1.tar.gz
and got this result: no error
next is en_ner_bc5cdr_md
I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bc5cdr_md-0.5.1.tar.gz
and got this result: no error
last is en_ner_bionlp13cg_md
I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bionlp13cg_md-0.5.1.tar.gz
and got this result: no error
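For anyone repeating this, the eight commands above all follow the same URL pattern, so they can be generated rather than pasted one by one. This is only a convenience sketch: install_command is a hypothetical helper, and the base URL and version are copied from the commands in this comment, not looked up from scispacy.

```python
# Base URL and version copied from the install commands in this comment.
BASE = "https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases"
VERSION = "0.5.1"

# The eight model names tried above, in the same order.
MODELS = [
    "en_core_sci_sm",
    "en_core_sci_md",
    "en_core_sci_scibert",
    "en_core_sci_lg",
    "en_ner_craft_md",
    "en_ner_jnlpba_md",
    "en_ner_bc5cdr_md",
    "en_ner_bionlp13cg_md",
]


def install_command(model, version=VERSION):
    """Build the pip command for one model release tarball."""
    return f"pip install {BASE}/v{version}/{model}-{version}.tar.gz"


if __name__ == "__main__":
    # Print the commands instead of running them, so each can be reviewed first.
    for model in MODELS:
        print(install_command(model))
```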
I installed the en_core_sci_scibert model from https://allenai.github.io/scispacy/ using the following command: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_scibert-0.5.1.tar.gz
and got the following errors:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
scispacy 0.4.0 requires spacy<3.1.0,>=3.0.0, but you have spacy 3.4.4 which is incompatible.
en-core-web-sm 3.0.0 requires spacy<3.1.0,>=3.0.0, but you have spacy 3.4.4 which is incompatible.
docanalysis 0.2.0 requires spacy==3.0.7, but you have spacy 3.4.4 which is incompatible.
I then installed the en_core_sci_scibert module using: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz and got no errors. Yipee!!
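To see why pip keeps flipping between the two sets of warnings, here is a small sketch that checks a spaCy version against the requirement strings quoted in the pip output in this thread. The satisfies and conflicts helpers are written just for this comment (they are not pip's resolver), and they only handle simple three-part version numbers with the operators shown.

```python
import operator
import re

OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt}

# Requirement strings copied from the pip messages in this thread.
REQUIREMENTS = {
    "docanalysis 0.2.0": "==3.0.7",
    "scispacy 0.4.0": "<3.1.0,>=3.0.0",
    "en-core-web-sm 3.0.0": "<3.1.0,>=3.0.0",
    "en-core-sci-lg 0.5.1": "<3.5.0,>=3.4.1",
    "spacy-transformers 1.1.9": "<4.0.0,>=3.4.0",
}


def parse_version(text):
    """Turn '3.4.4' into (3, 4, 4); assumes plain dotted-integer versions."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, specifier):
    """True if `version` meets every comma-separated clause in `specifier`."""
    installed = parse_version(version)
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(==|>=|<=|>|<)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, wanted = match.groups()
        if not OPS[op](installed, parse_version(wanted)):
            return False
    return True


def conflicts(spacy_version):
    """List the packages whose spaCy requirement this version does not meet."""
    return [pkg for pkg, spec in REQUIREMENTS.items()
            if not satisfies(spacy_version, spec)]


if __name__ == "__main__":
    for candidate in ("3.0.7", "3.4.4"):
        print(candidate, conflicts(candidate))
```

With these requirement strings, neither 3.0.7 nor 3.4.4 comes back with an empty list, which matches the two different sets of warnings seen above: whichever spaCy version is installed, some package in the environment asks for the other one.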