shon-otmazgin / fastcoref

MIT License

`predict` performance when using spacy #57

Open p8a opened 1 month ago

p8a commented 1 month ago

Hello,

I'm seeing slow processing speeds when using fastcoref with spaCy. About half of the time seems to be spent in the datasets `_save_spacyLanguage` function.

Any advice on how to optimize this? The code seems to be busy serializing something that I don't think the model itself needs.

The call stack looks something like this:

spacy_component.py: FastCorefResolver.__call__
modeling.py: CorefModel.predict
modeling.py: CorefModel._create_dataset 
datasets/arrow_dataset.py: Dataset.from_dict
datasets/dill/_dill.py: Pickler.save
datasets/dill/_dill.py: _save_spacyLanguage
....

This is a profiler snapshot: [image: Screenshot 2024-07-10 at 13 42 29]
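For anyone who wants to check the same thing without the screenshot, here is a minimal sketch of how the share of time spent in a serialization step can be measured with `cProfile`. It does not use fastcoref, spaCy, or datasets: `FakePipeline`, `serialize_pipeline`, and `predict` are hypothetical stand-ins I made up to mimic a predict call that pickles a large object each time. The real code path is only what the call stack above shows.

```python
import cProfile
import pickle
import pstats


class FakePipeline:
    """Hypothetical stand-in for a large object such as a spaCy Language."""

    def __init__(self, size=200_000):
        # A big dict so that pickling it takes a measurable amount of time.
        self.vocab = {i: str(i) for i in range(size)}


def serialize_pipeline(pipeline):
    # Stand-in for the serialization step (assumption: plain pickle here).
    return pickle.dumps(pipeline)


def predict(texts, pipeline):
    # Stand-in for a predict call that serializes the pipeline on every call
    # before doing the (cheap) actual work.
    serialize_pipeline(pipeline)
    return [len(text.split()) for text in texts]


def profile_predict(texts, pipeline, repeats=5):
    """Run predict() under cProfile and return the collected pstats.Stats."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeats):
        predict(texts, pipeline)
    profiler.disable()
    return pstats.Stats(profiler)


def cumulative_time(stats, func_name):
    """Sum the cumulative time of every profiled function named func_name."""
    # stats.stats maps (file, line, name) -> (cc, nc, tt, ct, callers);
    # index 3 is the cumulative time, including time spent in callees.
    return sum(
        entry[3] for key, entry in stats.stats.items() if key[2] == func_name
    )


if __name__ == "__main__":
    pipeline = FakePipeline()
    stats = profile_predict(["some text to resolve", "another one"], pipeline)
    total = cumulative_time(stats, "predict")
    serializing = cumulative_time(stats, "serialize_pipeline")
    print(f"predict: {total:.3f}s, of which serialization: {serializing:.3f}s")
    print(f"share spent serializing: {serializing / total:.0%}")
```

With the real pipeline, the same `cumulative_time` helper could be pointed at `_save_spacyLanguage` and `predict` to get the ratio as a number instead of reading it off a flame graph.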

Thanks