ncbi-nlp / NegBio

:newspaper: High-performance tool for negation and uncertainty detection in radiology reports

Problem loading bllipparser. #40

Closed kaushikepi closed 4 years ago

kaushikepi commented 4 years ago
{'--bllip-model': '~/.local/share/bllipparser/GENIA+PubMed',
 '--mention_phrases_dir': 'negbio/chexpert/phrases/mention',
 '--neg-patterns': 'negbio/chexpert/patterns/negation.txt',
 '--newline_is_sentence_break': False,
 '--output': '/content/NegBio/examples/test.neg.xml',
 '--post-negation-uncertainty-patterns': 'negbio/chexpert/patterns/post_negation_uncertainty.txt',
 '--pre-negation-uncertainty-patterns': 'negbio/chexpert/patterns/pre_negation_uncertainty.txt',
 '--split-document': False,
 '--unmention_phrases_dir': 'negbio/chexpert/phrases/unmention',
 '--verbose': False,
 'SOURCE': None,
 'SOURCES': ['/content/NegBio/negbio/examples/00000086.txt', '/content/NegBio/examples/00019248.txt'],
 'bioc': False,
 'text': True}
/usr/local/lib/python3.6/dist-packages/jpype/_core.py:217: UserWarning: 
-------------------------------------------------------------------------------
Deprecated: convertStrings was not specified when starting the JVM. The default
behavior in JPype will be False starting in JPype 0.8. The recommended setting
for new code is convertStrings=False.  The legacy value of True was assumed for
this session. If you are a user of an application that reported this warning,
please file a ticket with the developer.
-------------------------------------------------------------------------------

  """)
Traceback (most recent call last):
  File "/content/NegBio/negbio/main_chexpert.py", line 132, in <module>
    main()
  File "/content/NegBio/negbio/main_chexpert.py", line 88, in main
    parser = NegBioParser(model_dir=argv['--bllip-model'])
  File "/usr/local/lib/python3.6/dist-packages/negbio/pipeline/parse.py", line 20, in __init__
    self.rrp = RerankingParser.from_unified_model_dir(self.model_dir)
  File "/usr/local/lib/python3.6/dist-packages/bllipparser/RerankingParser.py", line 864, in from_unified_model_dir
    reranker_weights_filename) = get_unified_model_parameters(model_dir)
  File "/usr/local/lib/python3.6/dist-packages/bllipparser/RerankingParser.py", line 931, in get_unified_model_parameters
    raise IOError("Model directory '%s' does not exist" % model_dir)
OSError: Model directory '/root/.local/share/bllipparser/GENIA+PubMed' does not exist
kaushikacharya commented 4 years ago

@kaushikepi Have a look at the code https://github.com/ncbi-nlp/NegBio/blob/master/negbio/pipeline/parse.py#L13

if model_dir is None:
    logging.debug("downloading GENIA+PubMed model if necessary ...")
    model_dir = ModelFetcher.download_and_install_model(
        'GENIA+PubMed', os.path.join(tempfile.gettempdir(), 'models'))

If you pass a model directory via --bllip-model, the parser expects the model to already be present in that directory. If it is not there, it is better to omit this argument so that ModelFetcher can download the model for you.

If ModelFetcher is not able to download the data (e.g. due to a proxy issue), download it manually from the source mentioned in https://github.com/BLLIP/bllip-parser/blob/master/python/bllipparser/ModelFetcher.py#L39
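A minimal sketch of that decision, assuming a hypothetical helper named resolve_model_dir (it is not part of NegBio or bllipparser). It expands ~ and returns None when the directory is missing, so the caller can fall back to the ModelFetcher download instead of failing with "Model directory ... does not exist".

```python
import os


def resolve_model_dir(model_dir):
    """Hypothetical helper: return an existing model directory, or None.

    Returning None signals the caller to let ModelFetcher download the model.
    """
    if model_dir is None:
        return None
    expanded = os.path.abspath(os.path.expanduser(model_dir))
    return expanded if os.path.isdir(expanded) else None


if __name__ == "__main__":
    # Path taken from the argument dump at the top of this thread.
    found = resolve_model_dir("~/.local/share/bllipparser/GENIA+PubMed")
    print("use existing model" if found else "model missing, let ModelFetcher download it")
```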

kaushikepi commented 4 years ago

@kaushikacharya Thanks for helping, but after assigning None to the model_dir path, I am facing this issue:

TypeError: Package edu.stanford.nlp.util.Filters.acceptFilter is not Callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "negbio/main_chexpert.py", line 132, in <module>
    main()
  File "negbio/main_chexpert.py", line 86, in main
    ptb2dep = NegBioPtb2DepConverter(lemmatizer, universal=True)
  File "/Users/kaushikjaiswal/.local/lib/python3.7/site-packages/negbio/pipeline/ptb2ud.py", line 103, in __init__
    lemmatizer, representation, universal)
  File "/Users/kaushikjaiswal/.local/lib/python3.7/site-packages/negbio/pipeline/ptb2ud.py", line 70, in __init__
    self.__sd = StanfordDependencies.get_instance(backend=self._backend)
  File "/Users/kaushikjaiswal/anaconda3/envs/negbio3.7/lib/python3.7/site-packages/StanfordDependencies/StanfordDependencies.py", line 243, in get_instance
    return JPypeBackend(**extra_args)
  File "/Users/kaushikjaiswal/anaconda3/envs/negbio3.7/lib/python3.7/site-packages/StanfordDependencies/JPypeBackend.py", line 51, in __init__
    self._report_version_error(version)
  File "/Users/kaushikjaiswal/anaconda3/envs/negbio3.7/lib/python3.7/site-packages/StanfordDependencies/JPypeBackend.py", line 202, in _report_version_error
    raise JavaRuntimeVersionError()
StanfordDependencies.StanfordDependencies.JavaRuntimeVersionError: Your Java runtime is too old (must be 1.8+ to use CoreNLP version 3.5.0 or later and 1.6+ to use CoreNLP version 1.3.1 or later)
kaushikepi commented 4 years ago

java version "1.8.0_131" Java(TM) SE Runtime Environment (build 1.8.0_131-b11) Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

@kaushikacharya @yfpeng @alistairewj

kaushikepi commented 4 years ago

Steps followed:

  1. Cloned the repo.
  2. Created an environment using environment3.7.yml file.
  3. Run the script using python negbio/main_chexpert.py text --output=examples/test.neg.xml examples/00000086.txt examples/00019248.txt
kaushikacharya commented 4 years ago

@kaushikepi

You can try using the subprocess backend instead of jpype.

In https://github.com/ncbi-nlp/NegBio/blob/master/negbio/pipeline/ptb2ud.py#L70, in the call self.__sd = StanfordDependencies.get_instance(backend=self._backend), pass subprocess as a hard-coded value for the backend parameter. (Another option is to uninstall JPype1.)

This will also give you the output, as discussed in https://github.com/ncbi-nlp/NegBio/issues/13. The only difference is that jpype is the faster option.
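As a rough sketch of the backend choice, assuming a hypothetical helper named choose_backend with a hypothetical force_subprocess flag (neither exists in NegBio), the selection could look like the following. It only checks whether the jpype module is importable; it does not verify that the JVM actually starts.

```python
import importlib.util


def choose_backend(force_subprocess=False):
    """Hypothetical helper: pick the StanfordDependencies backend name."""
    if force_subprocess:
        return "subprocess"
    # find_spec returns None when the top-level module is not installed.
    if importlib.util.find_spec("jpype") is None:
        return "subprocess"
    return "jpype"


if __name__ == "__main__":
    backend = choose_backend(force_subprocess=True)
    print(backend)
    # The name would then be passed on, for example:
    #   StanfordDependencies.get_instance(backend=backend)
```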

It seems the error you are getting is thrown by Stanford CoreNLP because of an unexpected Java version. I am not sure why. Can you check whether multiple Java versions are present on your machine and whether the Python code is picking up a different one?

kaushikacharya commented 4 years ago

@kaushikepi Regarding the JVM, have a look at

https://github.com/dmcc/PyStanfordDependencies/blob/master/StanfordDependencies/JPypeBackend.py#L39

jpype.startJVM(jvm_path or jpype.getDefaultJVMPath(),
                           '-ea',
                           '-Djava.class.path=' + self.jar_filename,
                           *(extra_jvm_args or []))

Check jpype.getDefaultJVMPath() to confirm whether it is starting the JVM with Java 1.8.

kaushikepi commented 4 years ago

@kaushikacharya Thanks for replying. I will follow the steps suggested by you and will post the issue accordingly

kaushikepi commented 4 years ago
(negbio3.7) kaushikjaiswal (master *) NegBio $ python negbio/main_chexpert.py text --output=examples/test.neg.xml examples/00000086.txt examples/00019248.txt
{'--bllip-model': 'None',
 '--mention_phrases_dir': 'negbio/chexpert/phrases/mention',
 '--neg-patterns': 'negbio/chexpert/patterns/negation.txt',
 '--newline_is_sentence_break': False,
 '--output': 'examples/test.neg.xml',
 '--post-negation-uncertainty-patterns': 'negbio/chexpert/patterns/post_negation_uncertainty.txt',
 '--pre-negation-uncertainty-patterns': 'negbio/chexpert/patterns/pre_negation_uncertainty.txt',
 '--split-document': False,
 '--unmention_phrases_dir': 'negbio/chexpert/phrases/unmention',
 '--verbose': False,
 'SOURCE': None,
 'SOURCES': ['examples/00000086.txt', 'examples/00019248.txt'],
 'bioc': False,
 'text': True}
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
Segmentation fault: 11

I am getting a segmentation fault.

kaushikacharya commented 4 years ago

'--bllip-model': 'None'

I think it should be '--bllip-model': None. Or just omit this input parameter.
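The difference matters because the string 'None' is not the same object as None, so a check such as `model_dir is None` would not trigger. A small sketch, assuming a hypothetical normalize_optional helper (not part of NegBio), that maps the literal string back to a real None:

```python
def normalize_optional(value):
    """Hypothetical helper: map None, '' and the string 'None' to None."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text in ("", "None") else value


if __name__ == "__main__":
    # Mirrors the argument dump above, where the value is the string 'None'.
    argv = {"--bllip-model": "None"}
    model_dir = normalize_optional(argv["--bllip-model"])
    print(model_dir is None)
```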

kaushikepi commented 4 years ago
{'--mention_phrases_dir': 'negbio/chexpert/phrases/mention',
 '--neg-patterns': 'negbio/chexpert/patterns/negation.txt',
 '--newline_is_sentence_break': False,
 '--output': 'examples/test.neg.xml',
 '--post-negation-uncertainty-patterns': 'negbio/chexpert/patterns/post_negation_uncertainty.txt',
 '--pre-negation-uncertainty-patterns': 'negbio/chexpert/patterns/pre_negation_uncertainty.txt',
 '--split-document': False,
 '--unmention_phrases_dir': 'negbio/chexpert/phrases/unmention',
 '--verbose': False,
 'SOURCE': None,
 'SOURCES': ['examples/00000086.txt', 'examples/00019248.txt'],
 'bioc': False,
 'text': True}
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
Segmentation fault: 11

It doesn't make any difference. Is it necessary to run the setup.py file?

kaushikacharya commented 4 years ago

@kaushikepi Regarding the JVM, have a look at

https://github.com/dmcc/PyStanfordDependencies/blob/master/StanfordDependencies/JPypeBackend.py#L39

jpype.startJVM(jvm_path or jpype.getDefaultJVMPath(),
                           '-ea',
                           '-Djava.class.path=' + self.jar_filename,
                           *(extra_jvm_args or []))

Check jpype.getDefaultJVMPath() to confirm whether it is starting the JVM with Java 1.8.

In the other issue you had raised: https://github.com/ncbi-nlp/NegBio/issues/41 your error stack mentions

Your Java version: 14

Whereas in my local m/c https://github.com/dmcc/PyStanfordDependencies/blob/master/StanfordDependencies/JPypeBackend.py#L50 prints

java.version: 1.8.0_131

So my guess is that you haven't set up Java properly.
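The two version strings quoted here use different schemes ("1.8.0_131" versus "14"), which makes them awkward to compare by eye. A minimal sketch, assuming a hypothetical java_major helper, that normalizes both to a major release number. It says nothing about which releases the bundled CoreNLP jar actually supports.

```python
import re


def java_major(version):
    """Hypothetical helper: major Java release for '1.8.0_131', '14', etc."""
    parts = re.split(r"[._+-]", version.strip())
    major = int(parts[0])
    # Legacy scheme: '1.8.x' means Java 8.
    if major == 1 and len(parts) > 1:
        return int(parts[1])
    return major


if __name__ == "__main__":
    for reported in ("1.8.0_131", "14"):
        print(reported, "->", java_major(reported))
```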

You can read about JPype here: https://jpype.readthedocs.io/en/latest/quickguide.html

kaushikepi commented 4 years ago

I ran the module after uninstalling JPype so that it uses subprocess as the hard-coded value for the backend parameter.

kaushikacharya commented 4 years ago

is it necessary to run the setup.py file?

I think so, as it is using the installed NegBio, as shown in your error stack:

/Users/kaushikjaiswal/.local/lib/python3.7/site-packages

kaushikacharya commented 4 years ago

I ran the module after uninstalling JPype so that it uses subprocess as the hard-coded value for the backend parameter.

Did this succeed?

Also note that during the first run, the system will download the BLLIP model. So from the next run onwards you need to pass the path of the BLLIP model.

kaushikepi commented 4 years ago
{'--mention_phrases_dir': 'negbio/chexpert/phrases/mention',
 '--neg-patterns': 'negbio/chexpert/patterns/negation.txt',
 '--newline_is_sentence_break': False,
 '--output': 'examples/test.neg.xml',
 '--post-negation-uncertainty-patterns': 'negbio/chexpert/patterns/post_negation_uncertainty.txt',
 '--pre-negation-uncertainty-patterns': 'negbio/chexpert/patterns/pre_negation_uncertainty.txt',
 '--split-document': False,
 '--unmention_phrases_dir': 'negbio/chexpert/phrases/unmention',
 '--verbose': False,
 'SOURCE': None,
 'SOURCES': ['examples/00000086.txt', 'examples/00019248.txt'],
 'bioc': False,
 'text': True}
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
/Users/kaushikjaiswal/.local/lib/python3.7/site-packages
Segmentation fault: 11

It doesn't make any difference. Is it necessary to run the setup.py file?

It failed. I followed these steps:

  1. Created a conda environment using the .yml file.
  2. Deleted the JPype package and ran the module.
kaushikepi commented 4 years ago

I ran the module after uninstalling JPype so that it uses subprocess as the hard-coded value for the backend parameter.

Did this succeed?

Also note that during the first run, the system will download the BLLIP model. So from the next run onwards you need to pass the path of the BLLIP model.

I removed the argument so that it creates the bllip object with model_dir = None, but it didn't print any download information.

kaushikacharya commented 4 years ago

But it didn't print any downloading information

In my first answer on this thread, I mentioned that you can also download the bllip model manually.

kaushikepi commented 4 years ago

Ok then,

  1. I will download bllip manually and give the absolute path.
  2. Should I keep jpype or uninstall it?
  3. The Java runtime should be 1.8. Is there anything else you would like to suggest?
jpype.getDefaultJVMPath()
'/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre/lib/jli/libjli.dylib'

The above is the output when I checked the version.

kaushikacharya commented 4 years ago

Also note that during the first run, the system will download the BLLIP model. So from the next run onwards you need to pass the path of the BLLIP model.

Sorry, I was wrong in assuming that the bllip model will be downloaded every time if bllip-model is passed as None. https://github.com/dmcc/PyStanfordDependencies/blob/master/StanfordDependencies/StanfordDependencies.py#L147

def download_if_missing(self, version=None, verbose=True):
    ......
    if os.path.exists(self.jar_filename):
        return
kaushikacharya commented 4 years ago

Ok then,

  1. I will download bllip manually and give the absolute path.
  2. Should I keep jpype or uninstall it?
  3. The Java runtime should be 1.8. Is there anything else you would like to suggest?
jpype.getDefaultJVMPath()
'/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre/lib/jli/libjli.dylib'

The above is the output when I checked the version.

First try executing the script without jpype, i.e. with the subprocess backend. Only when this is successful, attempt it with jpype. In both cases, you will need the bllip model.

Check this too: jpype.java.lang.System.getProperty("java.version"). Note that you need to start the JVM before that: jpype.startJVM(jpype.getDefaultJVMPath())

kaushikepi commented 4 years ago

(negbio3.7) kaushikjaiswal (master *) NegBio $ python
Python 3.7.7 (default, Mar 26 2020, 10:32:53)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> jpype.isJVMStarted()
True
>>> jpype.java.lang.System.getProperty("java.version")
'1.8.0_131'

Is it fine?

kaushikacharya commented 4 years ago

is it fine?

Seems fine to me.

Segmentation fault: 11

It's difficult to figure out where you are getting the error.

You will need to do some debugging to find out where the error is coming from.

https://github.com/ncbi-nlp/NegBio/blob/master/negbio/main_chexpert.py#L50 Find out whether your script is entering pipeline(). If yes, then find out which step of the pipeline is failing.

Honestly speaking, I haven't used main_chexpert.py; I have only used main_mm.py.
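One way to narrow this down, offered as a sketch rather than as something NegBio provides: Python's standard faulthandler module can print the Python traceback when the process receives a segmentation fault, and logging around each step shows the last step that was entered. The run_step wrapper and the step name below are hypothetical.

```python
import faulthandler
import logging


def run_step(name, func, *args, **kwargs):
    """Hypothetical wrapper: log before and after a pipeline step."""
    logging.debug("entering %s", name)
    result = func(*args, **kwargs)
    logging.debug("finished %s", name)
    return result


if __name__ == "__main__":
    # On SIGSEGV, dump the Python traceback of all threads to stderr.
    faulthandler.enable()
    logging.basicConfig(level=logging.DEBUG)
    # Stand-in step; in practice each pipeline call would be wrapped like this.
    run_step("example step", len, "some report text")
```

The same traceback dump can be enabled without editing any code by running the script with python -X faulthandler.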

KaushikJais commented 4 years ago

Also note that during the first run, the system will download the BLLIP model. So from the next run onwards you need to pass the path of the BLLIP model.

Sorry, I was wrong in assuming that the bllip model will be downloaded every time if bllip-model is passed as None. https://github.com/dmcc/PyStanfordDependencies/blob/master/StanfordDependencies/StanfordDependencies.py#L147

def download_if_missing(self, version=None, verbose=True):
    ......
    if os.path.exists(self.jar_filename):
        return

So if the bllip model parameter is None in the first run, do I need to pass the model path in the second run, or will it download the model every time the parameter is set to None?

KaushikJais commented 4 years ago

@kaushikacharya To use main_mm.py, do I need to install MetaMap from the official UMLS website?

kaushikacharya commented 4 years ago

@kaushikacharya To use main_mm.py, do I need to install MetaMap from the official UMLS website?

Yes, you need to install MetaMap. Also, don't forget to start

  1. SKR/Medpost Part-of-Speech Tagger Server
  2. Word Sense Disambiguation (WSD) Server

as explained in https://metamap.nlm.nih.gov/Installation.shtml

The link to this URL is also mentioned in https://github.com/AnthonyMRios/pymetamap

Note: Since you are using Python 3, install pymetamap locally from the repository, because the pip package of pymetamap is an old one and doesn't work with Python 3. https://github.com/AnthonyMRios/pymetamap/pull/44 is the pull request that added Python 3 handling.

KaushikJais commented 4 years ago

@kaushikacharya Thanks for helping, I really appreciate it. I have one more query: which algorithm is better, given that there is no comparison between CheXpert and MetaMap? Can I use my own medical databases to get the results? While going through the codebase, I found that CheXpert uses a phrases directory containing various .txt files to tag the observations.

kaushikacharya commented 4 years ago

I have one more query: which algorithm is better, given that there is no comparison between CheXpert and MetaMap?

Honestly speaking, I have only used MetaMap, so I can't make any judgement in comparison to CheXpert. I would suggest reading the papers to get an idea of how these algorithms work. You can (and should) also evaluate the performance of both on your data.

If your current issue has been resolved, you can close this issue thread (with comments, if any, on how you resolved the issue, which might help other users in the future).

KaushikJais commented 4 years ago

Steps taken to run the CheXpert algorithm on a local machine:

  1. Cloned the repo.
  2. Created a new conda environment using the environment3.7.yml file.
  3. Ran setup.py using the python setup.py install command.
  4. Ran main_chexpert.py with the --bllip-model argument set to None for the first run, and from the second run onward set the --bllip-model argument to the path of the downloaded model.

Command used: `python negbio/main_chexpert.py text --output=examples/test.neg.xml examples/00019248.txt examples/00000086.txt`

Note: One can set the path of the bllip_parser model by replacing tempfile.gettempdir() with any directory they want: https://github.com/ncbi-nlp/NegBio/blob/c04cdb9cee08204707ab3fa3cbcb0c2c89f9e468/negbio/pipeline/parse.py#L16
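To illustrate the note about tempfile.gettempdir(), here is a small sketch assuming a hypothetical model_install_dir helper (not part of NegBio). It mirrors the os.path.join(tempfile.gettempdir(), 'models') expression quoted earlier in this thread, but lets the base directory be overridden so the model lands somewhere that is not cleaned up like a temp directory.

```python
import os
import tempfile


def model_install_dir(base_dir=None):
    """Hypothetical helper: directory in which the parser model is installed.

    Falls back to the system temp directory when no base is given.
    """
    base = base_dir if base_dir is not None else tempfile.gettempdir()
    return os.path.join(os.path.expanduser(base), "models")


if __name__ == "__main__":
    print(model_install_dir())              # default location under the temp dir
    print(model_install_dir("~/nlp-data"))  # example persistent location
```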