kdpsingh / clinspacy

Clinical Natural Language Processing using spaCy, scispacy, and medspacy

Rewrite bind_* functions to rely on output from clinspacy() #5

Closed kdpsingh closed 4 years ago

kdpsingh commented 4 years ago

Practically, the bind_* functions take a while to run (and can produce large files). Thus, if you want to experiment with including/excluding them, you need to process the same text multiple times.

Tasks:

- [ ] Vectorize clinspacy() so it can run on vectors of length > 1
- [ ] Make clinspacy() able to write output directly to file (csv/fst/parquet?)
- [ ] Rewrite the bind_* functions to rely on the output of clinspacy(), so that any filtering (on semantic_types, etc.) happens after the fact
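To make the intent of these tasks concrete, here is a minimal sketch of the "process once, filter afterwards" pattern. It is written in Python purely for illustration and is not clinspacy's API: `LEXICON`, `process_texts`, `write_rows`, `read_rows`, and `bind_counts` are all hypothetical names, and the toy dictionary lookup only stands in for the real (slow) NLP pipeline.

```python
import csv
from collections import Counter

# Hypothetical toy lexicon standing in for a real entity linker.
LEXICON = {
    "diabetes": "Disease or Syndrome",
    "metformin": "Pharmacologic Substance",
    "headache": "Sign or Symptom",
}

FIELDS = ["doc_id", "entity", "semantic_type"]


def process_texts(texts):
    """Expensive step (run once): one unfiltered row per entity mention."""
    rows = []
    for doc_id, text in enumerate(texts, start=1):
        for token in text.lower().split():
            word = token.strip(".,;:")
            if word in LEXICON:
                rows.append(
                    {"doc_id": doc_id, "entity": word, "semantic_type": LEXICON[word]}
                )
    return rows


def write_rows(rows, path):
    """Persist the unfiltered rows so later steps never re-run the NLP."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path):
    """Load previously saved rows (all values come back as strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def bind_counts(rows, semantic_types=None):
    """Cheap step (rerun freely): filter saved rows, then count entities per document."""
    counts = {}
    for row in rows:
        if semantic_types is not None and row["semantic_type"] not in semantic_types:
            continue
        counts.setdefault(int(row["doc_id"]), Counter())[row["entity"]] += 1
    return counts
```

With this split, `process_texts` and `write_rows` run once per corpus, while `bind_counts(read_rows(path), semantic_types={...})` can be called repeatedly with different filters without touching the original text again.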

kdpsingh commented 4 years ago

Fixed in https://github.com/ML4LHS/clinspacy/commit/ace41f248c9aed20903ecd57b2cf31994ca64ea0.