kdpsingh / clinspacy

Clinical Natural Language Processing using spaCy, scispacy, and medspacy

clinspacy error with linker #16

Open emma-wilson opened 1 year ago

emma-wilson commented 1 year ago

Hello, I've recently been using the clinspacy package. I found that the `clinspacy()` function runs normally without the linker but throws an error when the UMLS linker is enabled.

Code (using the example from the README file):

```r
clinspacy_init(use_linker = TRUE)
clinspacy('This patient has diabetes and CKD stage 3 but no HTN.')
```

Error message:

```
Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. 
The detected shape was (5,) + inhomogeneous part.
```

Is this something you've encountered before, or something you can help with? I'm using R version 4.2.1.
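The error text itself comes from NumPy, not from clinspacy. Since NumPy 1.24, building an array from nested sequences of unequal length ("ragged" input) raises this `ValueError` unless `dtype=object` is passed. Earlier releases only emitted a deprecation warning. That would fit an older dependency running against a newer NumPy. However, the logs in this thread do not show where in the linker's Python code the array is built, so this is an assumption.

Below is a minimal reproduction, independent of clinspacy. The data is made up for illustration.

```python
import numpy as np

# Rows of different lengths, e.g. a varying number of candidates per mention.
# Illustrative data only; the linker's actual inputs are not shown in this thread.
ragged = [[0.1, 0.2], [0.3]]

try:
    np.array(ragged)  # NumPy >= 1.24 raises ValueError for ragged input
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")

# Explicitly requesting dtype=object still works: a 1-D array holding Python lists.
arr = np.array(ragged, dtype=object)
print(arr.shape)  # (2,)
```

If this is indeed the cause, there are two directions to try in the clinspacy environment, though neither has been confirmed here:

- pin NumPy below 1.24, or
- move to newer, mutually compatible versions of the Python packages.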

Josefmo commented 1 year ago

I've experienced something quite similar.

```
> clinspacy_init(use_linker = TRUE) 
Initializing clinspacy using clinspacy_init()...
Checking if miniconda is installed...
Importing spaCy...
Importing scispaCy...
Importing medspaCy...
Loading the en_core_sci_lg language model...
Loading the UMLS entity linker... (this may take a while)
Adding the UMLS entity linker to the spacy pipeline...
/Users/__/Library/r-miniconda/envs/clinspacy/lib/python3.8/site-packages/sklearn/base.py:347: InconsistentVersionWarning: Trying to unpickle estimator TfidfTransformer from version 0.20.3 when using version 1.3.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/Users/__/Library/r-miniconda/envs/clinspacy/lib/python3.8/site-packages/sklearn/base.py:347: InconsistentVersionWarning: Trying to unpickle estimator TfidfVectorizer from version 0.20.3 when using version 1.3.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
```

Then I ran the same code as above:

```
> clinspacy('This patient has diabetes and CKD stage 3 but no HTN.')
  |=============================================================================================| 100%
Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. 
The detected shape was (5,) + inhomogeneous part.
Run `reticulate::py_last_error()` for details.
```

I'm running R 4.3. Can this be fixed?

cha-petersumm commented 1 year ago

I have the same issue.

I can't add anything to what's been said above. However, I'm happy to answer any questions about my environment, and I'd like to know once it's fixed.

Minlei0201 commented 6 months ago

Same experience on R 4.3.1. Could anyone share how to fix this issue?

kdpsingh commented 6 months ago

Hi all, I'm belatedly coming to the realization that I don't have the bandwidth to maintain this package. Most of my dev efforts are currently focused on Julia packages.

The main issue here is that the various Python packages need to be updated to the latest versions that work together, along with a compatible scispacy language model. Some minor tweaks may also be needed in case the spaCy API has changed recently.
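A first step toward a compatible set is knowing exactly which versions are installed. The warnings in the log above already point to one mismatch: the linker's pickled TF-IDF objects were saved with scikit-learn 0.20.3 but loaded under 1.3.0.

Below is a minimal sketch that reports installed versions using only the standard library (`importlib.metadata`, Python 3.8+). It assumes you run it with the clinspacy environment's Python, for example the `python3.8` under `.../envs/clinspacy/` shown in the log, or from R via `reticulate::py_run_file()`. The package list and the `installed_versions` helper are hypothetical choices for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list: the libraries named in the clinspacy log, the scispaCy model,
# and numpy/scikit-learn, which appear in the error and the warnings.
PACKAGES = ["spacy", "scispacy", "medspacy", "en_core_sci_lg", "numpy", "scikit-learn"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Posting this output alongside the R version would make it easier to compare environments that fail with ones that work.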

If anyone wants to undertake this as a PR, I will be happy to review and merge.

If anyone is interested in taking over package maintenance, I'm happy to discuss it. Thank you.