petermr / docanalysis

Semantic analysis of text documents including sentence and paragraph splitting
Apache License 2.0

dependency conflicts prohibit use of some other sciSpacy models #28

Open EmanuelFaria opened 1 year ago

EmanuelFaria commented 1 year ago

I installed the en_core_sci_scibert model from https://allenai.github.io/scispacy/ using the following command: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_scibert-0.5.1.tar.gz

and got the following errors:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
scispacy 0.4.0 requires spacy<3.1.0,>=3.0.0, but you have spacy 3.4.4 which is incompatible.
en-core-web-sm 3.0.0 requires spacy<3.1.0,>=3.0.0, but you have spacy 3.4.4 which is incompatible.
docanalysis 0.2.0 requires spacy==3.0.7, but you have spacy 3.4.4 which is incompatible.


I then installed the en_core_sci_lg model using: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz

and got no errors. Yippee!!
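A quick way to see the conflict pip is describing is to list every installed distribution that declares a requirement on spaCy, next to the spaCy version actually installed. The sketch below uses only the standard library's `importlib.metadata` (Python 3.8+); the helper names `requires_spacy` and `spacy_dependents` are hypothetical and not part of docanalysis or scispaCy.

```python
import re
from importlib import metadata
from typing import Dict, List

# Matches requirement strings for the "spacy" distribution itself, e.g.
# "spacy<3.1.0,>=3.0.0" or "spacy (==3.0.7)", but not "spacy-transformers"
# or "scispacy".
_SPACY_REQ = re.compile(r"spacy(?![\w.-])", re.IGNORECASE)


def requires_spacy(requirement: str) -> bool:
    """Return True if a requirement string refers to the spacy package."""
    return _SPACY_REQ.match(requirement.strip()) is not None


def spacy_dependents() -> Dict[str, List[str]]:
    """Map each installed distribution to the spacy requirements it declares."""
    found: Dict[str, List[str]] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or "<unknown>"
        for req in dist.requires or []:
            if requires_spacy(req):
                found.setdefault(name, []).append(req)
    return found


if __name__ == "__main__":
    try:
        print("installed spacy:", metadata.version("spacy"))
    except metadata.PackageNotFoundError:
        print("spacy is not installed")
    for name, reqs in sorted(spacy_dependents().items()):
        print(f"{name}: {'; '.join(reqs)}")
```

Any line whose requirement excludes the installed spaCy version corresponds to one of pip's "which is incompatible" messages.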


EmanuelFaria commented 1 year ago

Update: upon running "en_core_sci_scibert" using:

docanalysis --project_name phytomed200 --make_section --spacy_model en_core_sci_scibert --entities ALL --output en_core_sci_scibert.csv

It seems to have run without printing errors in red, but it didn't output the CSV. Here are the final lines of output:


INFO: Found 35044 sentences in the section(s).
INFO: Loading en_core_sci_scibert
  0%|          | 0/35044 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/bin/docanalysis", line 8, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/docanalysis/docanalysis.py", line 196, in main
    calldocanalysis.handlecli()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/docanalysis/docanalysis.py", line 188, in handlecli
    self.entity_extraction.extract_entities_from_papers(args.project_name, args.dictionary, search_sections=args.search_section, entities=args.entities, query=args.query, hits=args.hits,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/docanalysis/entity_extraction.py", line 165, in extract_entities_from_papers
    self.run_spacy_over_sections(self.sentence_dictionary, entities)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/docanalysis/entity_extraction.py", line 362, in run_spacy_over_sections
    doc = self.nlp(dict_with_parsed_xml[paragraph]['sentence'])
TypeError: 'NoneType' object is not callable

Also, I'm not sure what, if anything, it did to the sections that already existed because I had previously run:

docanalysis --project_name phytomed200 --make_section --spacy_model spacy --entities ALL --output entities.csv

Is it possible to run these two commands, one after the other? If not, that should be noted in the instructions because I would have duplicated my corpus before running any sectioning.

Please let me know

Thanks
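The traceback ends at `doc = self.nlp(...)` with `TypeError: 'NoneType' object is not callable`, so `self.nlp` was still `None` when the first sentence was processed. One plausible reading, not confirmed from the docanalysis source, is that the `--spacy_model` value was never mapped to a loaded pipeline. The sketch below reproduces that failure mode with a hypothetical model registry (`KNOWN_MODELS`, `load_pipeline` and `load_pipeline_strict` are illustrative names, not docanalysis's API) and shows how failing at load time gives a clearer error.

```python
from typing import Callable, Dict, List, Optional

Pipeline = Callable[[str], List[str]]

# Hypothetical registry standing in for however a tool maps a --spacy_model
# value to a loaded pipeline. The lambda is a placeholder "pipeline" that only
# splits on whitespace; it is not spaCy.
KNOWN_MODELS: Dict[str, Pipeline] = {
    "spacy": lambda text: text.split(),
}


def load_pipeline(name: str) -> Optional[Pipeline]:
    # Silently returns None for an unrecognized name; calling the result later
    # raises "TypeError: 'NoneType' object is not callable".
    return KNOWN_MODELS.get(name)


def load_pipeline_strict(name: str) -> Pipeline:
    # Fails immediately with a message that names the bad model.
    nlp = load_pipeline(name)
    if nlp is None:
        raise ValueError(
            f"Unknown spaCy model {name!r}; expected one of {sorted(KNOWN_MODELS)}"
        )
    return nlp


if __name__ == "__main__":
    nlp = load_pipeline("en_core_sci_scibert")
    try:
        nlp("Curcumin inhibits NF-kB.")  # type: ignore[misc]
    except TypeError as err:
        print("late failure:", err)
    try:
        load_pipeline_strict("en_core_sci_scibert")
    except ValueError as err:
        print("early failure:", err)
```

The strict variant surfaces the problem before the 35044-sentence loop starts instead of at its first iteration.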

ShweataNHegde commented 1 year ago

Running the two commands you have pasted one after the other shouldn't be a problem.

We haven't tested scispaCy in a long time. It may be that it found no entities in the sentences. I will have to look into your corpus to understand better.

ayush4921 commented 1 year ago

@mannyrules @ShweataNHegde I have also updated docanalysis with new requirements and pushed it as a new version because of the conflicts Manny mentioned.

EmanuelFaria commented 1 year ago

Awesome!!! Thanks @ayush4921 !

I'll try some of the other scispacy models too, and let you know if there are any conflicts. I remember that during the hackathon at least one of the attendees also asked about the biomedical models for scispacy, so it's not just me. :)

EmanuelFaria commented 1 year ago

@ayush4921 I ran pip install docanalysis to upgrade (is this the correct command?) and got this:


 Successfully uninstalled spacy-3.4.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy-transformers 1.1.9 requires spacy<4.0.0,>=3.4.0, but you have spacy 3.0.7 which is incompatible.
en-core-sci-lg 0.5.1 requires spacy<3.5.0,>=3.4.1, but you have spacy 3.0.7 which is incompatible.

I then ran pip install spacy followed by pip install docanalysis, and got no errors.
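The two rounds of pip errors point to a hard version conflict: docanalysis 0.2.0 pins `spacy==3.0.7`, while en-core-sci-lg 0.5.1 requires `spacy<3.5.0,>=3.4.1`, so no single spaCy version satisfies both in one environment. The sketch below checks that with a deliberately simplified specifier matcher that only understands plain numeric versions and the six comparison operators; for real use, the `packaging` library's `SpecifierSet` implements the full PEP 440 rules. The requirement strings are copied from the pip output in this thread; the function names are hypothetical.

```python
import operator
import re
from typing import Callable, Dict, Tuple

Version = Tuple[int, ...]

# Comparison operators supported by this simplified matcher.
OPS: Dict[str, Callable[[Version, Version], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}
_CLAUSE = re.compile(r"\s*(==|!=|<=|>=|<|>)\s*(\d+(?:\.\d+)*)\s*")


def parse_version(version: str) -> Version:
    # Plain numeric versions only, e.g. "3.4.4"; no pre-release or local tags.
    return tuple(int(part) for part in version.split("."))


def _padded(a: Version, b: Version) -> Tuple[Version, Version]:
    # Pad with zeros so "3.0" and "3.0.0" compare as equal.
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(version: str, spec: str) -> bool:
    """True if version meets every comma-separated clause in spec."""
    v = parse_version(version)
    for clause in spec.split(","):
        match = _CLAUSE.fullmatch(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, target = match.groups()
        left, right = _padded(v, parse_version(target))
        if not OPS[op](left, right):
            return False
    return True


# spaCy requirements as reported by pip earlier in this thread.
REQUIREMENTS = {
    "docanalysis 0.2.0": "==3.0.7",
    "en-core-sci-lg 0.5.1": "<3.5.0,>=3.4.1",
    "spacy-transformers 1.1.9": "<4.0.0,>=3.4.0",
}

if __name__ == "__main__":
    for candidate in ("3.0.7", "3.4.4"):
        unmet = [pkg for pkg, spec in REQUIREMENTS.items() if not satisfies(candidate, spec)]
        print(f"spacy {candidate} breaks: {unmet}")
```

Run as a script, it reports that spaCy 3.0.7 leaves en-core-sci-lg 0.5.1 and spacy-transformers 1.1.9 unsatisfied, while 3.4.4 leaves docanalysis 0.2.0 unsatisfied, which mirrors the two sets of pip errors in this thread.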

EmanuelFaria commented 1 year ago

@ayush4921 by the way, the help menu says "Welcome to docanalysis version 0.2.0. -h or --help for help", but the GitHub main page says "Publication release v0.1.9 [Latest]".

[Screenshot 2022-12-21 at 1 21 35 PM]

EmanuelFaria commented 1 year ago

Is it possible to add a prompt like "You have version x of docanalysis installed. The latest version is x.1. Would you like to update now? [Y/N]"? If so, that would be a good addition for all the programs.
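A prompt like this could be built from two pieces: the installed version, which `importlib.metadata.version` reports, and the latest release, which PyPI publishes through its JSON API at `https://pypi.org/pypi/<package>/json` (field `info.version`). The sketch below is a hypothetical helper, not an existing docanalysis feature; it compares plain numeric versions only, skips the check silently on any network or parsing error, and prints the upgrade command instead of running pip itself.

```python
import json
import urllib.request
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    # Plain numeric versions only, e.g. "0.2.0"; pre-releases raise ValueError.
    return tuple(int(part) for part in version.split("."))


def is_newer(latest: str, installed: str) -> bool:
    a, b = parse_version(latest), parse_version(installed)
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)) > b + (0,) * (n - len(b))


def latest_on_pypi(package: str, timeout: float = 5.0) -> str:
    # PyPI JSON API; "info" -> "version" holds the latest release.
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["info"]["version"]


def update_message(package: str, installed: str, latest: str) -> Optional[str]:
    if not is_newer(latest, installed):
        return None
    return (
        f"You have version {installed} of {package} installed. "
        f"The latest version is {latest}. Would you like to update now? [Y/N]"
    )


def check_for_update(package: str) -> None:
    try:
        installed = metadata.version(package)
        message = update_message(package, installed, latest_on_pypi(package))
    except (metadata.PackageNotFoundError, OSError, KeyError, ValueError):
        return  # Never block the tool because the check itself failed.
    if message is None:
        return
    try:
        answer = input(message + " ")
    except EOFError:
        return  # Non-interactive run: no one to answer the prompt.
    if answer.strip().lower() == "y":
        print(f"Run: pip install --upgrade {package}")


if __name__ == "__main__":
    check_for_update("docanalysis")
```

Because the same helper takes the package name as a parameter, it could be shared across the other programs mentioned above rather than copied into each one.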

ayush4921 commented 1 year ago

It's definitely not common in the Python world, if I'm not wrong, since I haven't seen it before, but it seems like a neat addition. I can look into that.

ayush4921 commented 1 year ago

> @ayush4921 by the way, the version in the help menu says Welcome to docanalysis version 0.2.0. -h or --help for help but on github mainpage it says "Publication release v0.1.9 [Latest]"

Fixed

EmanuelFaria commented 1 year ago

OK, I'm trying to install the other scispacy models listed on this page: https://allenai.github.io/scispacy/

  1. The first one is en_core_sci_sm. I ran pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_sm-0.5.1.tar.gz and got this error:

    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    scispacy 0.4.0 requires spacy<3.1.0,>=3.0.0, but you have spacy 3.4.4 which is incompatible.
    en-core-web-sm 3.0.0 requires spacy<3.1.0,>=3.0.0, but you have spacy 3.4.4 which is incompatible.
    docanalysis 0.2.0 requires spacy==3.0.7, but you have spacy 3.4.4 which is incompatible.
  2. Next is en_core_sci_md. I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz and got no error.

  3. Next is en_core_sci_scibert. I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_scibert-0.5.1.tar.gz and got no error.

  4. Next is en_core_sci_lg. I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gz and got no error.

  5. Next is en_ner_craft_md. I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_craft_md-0.5.1.tar.gz and got no error.

  6. Next is en_ner_jnlpba_md. I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_jnlpba_md-0.5.1.tar.gz and got no error.

  7. Next is en_ner_bc5cdr_md. I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bc5cdr_md-0.5.1.tar.gz and got no error.

  8. Last is en_ner_bionlp13cg_md. I ran: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bionlp13cg_md-0.5.1.tar.gz and got no error.
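To double-check which of the eight models actually ended up installed, and at what version, after a series of installs like this, a short standard-library script can query the installed distributions. This is a sketch, assuming the model packages register under either their underscore names (as listed on the scispaCy page) or the hyphenated form pip prints (e.g. `en-core-sci-lg`); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional

# The eight scispaCy models tried in the list above.
SCISPACY_MODELS = [
    "en_core_sci_sm",
    "en_core_sci_md",
    "en_core_sci_scibert",
    "en_core_sci_lg",
    "en_ner_craft_md",
    "en_ner_jnlpba_md",
    "en_ner_bc5cdr_md",
    "en_ner_bionlp13cg_md",
]


def installed_version(name: str) -> Optional[str]:
    # Try both spellings, since distribution-name normalization differs
    # across Python versions.
    for candidate in (name, name.replace("_", "-")):
        try:
            return metadata.version(candidate)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    for model in SCISPACY_MODELS:
        print(f"{model:<22} {installed_version(model) or 'not installed'}")
```

This only confirms that pip put each package in place; a model that is installed can still fail at load time if its spaCy requirement is not met by the spaCy version in the environment.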