kdpsingh / clinspacy

Clinical Natural Language Processing using spaCy, scispacy, and medspacy

bind_clinspacy_embeddings() results in error when using cui2vec embeddings and no entities found #4

Closed kdpsingh closed 4 years ago

kdpsingh commented 4 years ago

The following example from README.Rmd currently causes an error.

The example requests the scispacy embeddings with the linker turned on (the linker is on because of a change in default settings). With the linker turned off, it seems to work fine.

library(clinspacy)

# Errors when no entity in the text matches the requested semantic type
bind_clinspacy_embeddings(mtsamples[1:5, 1:2],
                          text = 'description',
                          num_embeddings = 5,
                          semantic_types = 'Diagnostic Procedure')

kdpsingh commented 4 years ago

Fixed in https://github.com/ML4LHS/clinspacy/commit/c3cd3a00d27ec9b9239b5df0421f8555d0039e09. Note that, as pointed out in the README, the scispacy embeddings may be slightly different with use_linker set to TRUE because of duplicates. The advantage of the current approach is that you can limit results by semantic_types; the cost is the slightly different embeddings. If we disable semantic_types in a future version, we may need to dynamically add and remove the linker step to ensure the same result.
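
For anyone hitting a similar failure, here is a minimal sketch of how the "no entities found" case might be guarded when averaging entity vectors. It is not the code from the commit above: mean_embedding() and the toy matrices are hypothetical names made up for illustration, and it assumes the embeddings for one document arrive as a numeric matrix with one row per entity.

```r
# Hypothetical helper, NOT part of clinspacy: average the entity vectors for
# one document, returning NAs when no entity survived the semantic-type filter.
mean_embedding <- function(entity_vectors, num_embeddings) {
  if (is.null(entity_vectors) || nrow(entity_vectors) == 0) {
    # No entities found: return a row of NAs instead of failing downstream
    return(rep(NA_real_, num_embeddings))
  }
  colMeans(entity_vectors[, seq_len(num_embeddings), drop = FALSE])
}

# Toy data (made up): two entities with 5-dimensional vectors,
# and an empty match set with the same number of columns.
found <- matrix(c(1, 3, 2, 4, 0, 2, 1, 1, 5, 7), nrow = 2)
none  <- found[0, , drop = FALSE]

mean_embedding(found, 5)
mean_embedding(none, 5)
```

Returning NA values keeps one row per input document, so the result can still be bound back onto the original data frame column-wise.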