Practically, the bind_* functions take a while to run (and can produce large files). Thus, if you want to experiment with including or excluding particular filters (semantic types, etc.), you currently need to process the same text multiple times.
Tasks:
[ ] vectorize clinspacy() so it can run on vectors of length > 1
[ ] make it so that clinspacy() can write output directly to file (csv/fst/parquet?)
[ ] rewrite bind_* functions to rely on the output from the clinspacy() function so that any filtering (on semantic_types, etc.) happens after the fact.
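
A rough sketch of how the three tasks could fit together, in base R only. Everything here is an assumption for illustration: `extract_one()` and `extract_many()` are hypothetical stand-ins, not the clinspacy API, and the `semantic_type` values are made up. The idea is to run the slow step once over a whole vector, write the unfiltered result to a file, and apply filters to the cached output afterwards.

```r
# Hypothetical stand-in for a single-document extractor; NOT the clinspacy API.
# Returns one row per "entity" (here, simply each word) with a made-up semantic type.
extract_one <- function(text) {
  words <- strsplit(text, "\\s+")[[1]]
  data.frame(
    entity = words,
    semantic_type = ifelse(nchar(words) > 4, "long_word", "short_word"),
    stringsAsFactors = FALSE
  )
}

# Vectorized wrapper: run the extractor on each element of the input vector
# and tag every row with the index of the text it came from.
extract_many <- function(texts) {
  per_text <- lapply(seq_along(texts), function(i) {
    out <- extract_one(texts[[i]])
    cbind(id = rep(i, nrow(out)), out)
  })
  do.call(rbind, per_text)
}

texts <- c("patient denies chest pain", "history of diabetes")
entities <- extract_many(texts)

# Write the unfiltered output once so later experiments can reuse it.
out_file <- tempfile(fileext = ".csv")
write.csv(entities, out_file, row.names = FALSE)

# Filter after the fact, without re-processing the text.
cached <- read.csv(out_file, stringsAsFactors = FALSE)
long_only <- cached[cached$semantic_type == "long_word", ]
```

CSV is used here only because it needs no extra packages; fst or parquet would presumably be better for large outputs, but that choice is still open in the task above.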