shon-otmazgin / fastcoref

MIT License

Prevent fastcoref from downloading en_core_web_sm for already tokenized text #24

Closed aryehgigi closed 1 year ago

aryehgigi commented 1 year ago

By default FCoref downloads and loads the en_core_web_sm model. As I understand, FCoref needs to have a tokenizer ready in case it gets non-tokenized text, so if no nlp object is passed you retrieve the nlp object from the web.

Since we (and maybe other future users) don't intend to pass non-tokenized text to the model object, there is no need to download this model from the web. Can you please allow this option? For now we use the following workaround: FCoref(device=device, nlp="DUMMY"), which prevents fastcoref from downloading the model.
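To make the request concrete, here is a minimal sketch of the behaviour I have in mind: the tokenizer is loaded only the first time non-tokenized text arrives. This is not fastcoref's actual code. `LazyTokenizerModel`, `prepare`, `tokenizer_loader`, and `whitespace_loader` are hypothetical names used for illustration, and the whitespace tokenizer just stands in for whatever would really load en_core_web_sm.

```python
from typing import Callable, List, Optional, Sequence, Union

Tokens = List[str]
Tokenizer = Callable[[str], Tokens]


class LazyTokenizerModel:
    """Hypothetical stand-in for a model wrapper; NOT fastcoref's implementation."""

    def __init__(self, tokenizer_loader: Callable[[], Tokenizer]) -> None:
        # Keep only the loader; nothing is loaded or downloaded at construction.
        self._tokenizer_loader = tokenizer_loader
        self._tokenizer: Optional[Tokenizer] = None

    def _ensure_tokenizer(self) -> Tokenizer:
        # Load the tokenizer on first use and cache it for later calls.
        if self._tokenizer is None:
            self._tokenizer = self._tokenizer_loader()
        return self._tokenizer

    def prepare(
        self,
        texts: Sequence[Union[str, Sequence[str]]],
        is_split_into_words: bool = False,
    ) -> List[Tokens]:
        if is_split_into_words:
            # Input is already tokenized: the tokenizer is never touched.
            return [list(tokens) for tokens in texts]
        tokenizer = self._ensure_tokenizer()
        return [tokenizer(text) for text in texts]


# Assumption for illustration: a cheap whitespace tokenizer replaces the
# real (expensive) model load so the sketch runs without any downloads.
load_calls: List[int] = []


def whitespace_loader() -> Tokenizer:
    load_calls.append(1)  # record each time the "model" is actually loaded
    return str.split


model = LazyTokenizerModel(whitespace_loader)
pretokenized = model.prepare([["We", "are", "here"]], is_split_into_words=True)
```

With this shape, users who only ever pass pre-tokenized input never trigger the load, and everyone else pays for it once, on the first raw-text call.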

thanks! @shon-otmazgin @ariecattan

shon-otmazgin commented 1 year ago

@aryehgigi makes sense. I will try to find time to implement it.

aryehgigi commented 1 year ago

thanks so much! if you don't find the time, let me know and I'll try to prepare a PR for you.

shon-otmazgin commented 1 year ago

sure!