richardpaulhudson / coreferee

Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages
MIT License
102 stars 16 forks

Support for Spacy 3.7? #29

Open YannZeRookie opened 8 months ago

YannZeRookie commented 8 months ago

Hello, I am unable to test Coreferee with spaCy 3.7:

✘ spaCy model fr_core_news_lg version 3.7.0 is not supported by
Coreferee. Please examine /coreferee/lang/fr/config.cfg to see the supported
models/versions.

Are there any plans to support spaCy 3.7 with the en_core_web_lg and fr_core_news_lg models? Thanks so much, Yann

bradley-erickson commented 5 months ago

I'm encountering a similar issue and would appreciate the addition of support for a newer version of spaCy.

The specific problem I'm facing is a downstream dependency conflict with Pydantic. Currently, spaCy 3.5, the newest spaCy version supported by coreferee, relies on Pydantic v1, while spaCy introduced support for Pydantic v2 in v3.6.1. Given that other packages I use now depend on Pydantic v2, upgrading spaCy would resolve this compatibility issue.

richardpaulhudson commented 5 months ago

Unfortunately, adding Coreferee support for a new version of spaCy is quite a laborious process. New standard spaCy models are trained with each release and the training involves a certain amount of randomness; Coreferee, on the other hand, is a rule-based system running on top of these models. This makes it necessary to rerun the regression tests each time, investigate the discrepancies that inevitably occur, and either mark failing tests to be skipped or add rules to cover the new, changed behaviour.

With past spaCy releases I have carried out this process for English, German and Polish, but unfortunately I am no longer working at Explosion and realistically am unlikely to find time to do the necessary testing in the foreseeable future. For French the model was contributed by another user and I lack the knowledge of the language necessary to assess the behaviour of the tests, which is why support for the French model is considerably more out of date than the models for the other three languages.

That said, the lack of support of a Coreferee model for a given spaCy version and the accompanying spaCy models only implies that no rigorous testing of the combination has been carried out. The combination will almost certainly work technically and the results are very likely to be usable too: I did a quick check and for example the latest English coreferee model appears to work quite well with version 3.7.3 of en_core_web_trf. To use the current version of Coreferee unofficially with a later, unsupported spaCy version, perhaps in a branch or a fork:

1) Clone the Coreferee repository.
2) Alter the root setup.cfg to increase the range of supported spaCy versions.
3) Alter the config.cfg for the language you wish to use to increase the range of spaCy models supported by the most recent Coreferee model.
4) Reinstall Coreferee from the changed local code, i.e. pip install . from the root directory.
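For anyone who would rather script step 3 and then run a quick check after step 4, here is a minimal sketch. It rests on assumptions that should be checked against the real files: that the language's config.cfg is an INI-style file whose sections carry an upper version bound in a key called to_version, and that the section passed in exists. The helper names raise_to_version and print_coref_chains are made up for illustration and are not part of Coreferee. Only the calls inside print_coref_chains (spacy.load, nlp.add_pipe('coreferee'), doc._.coref_chains.print()) are taken from Coreferee's documented usage.

```python
import re


def raise_to_version(cfg_text: str, section: str, new_version: str) -> str:
    """Return cfg_text with the `to_version` value in [section] replaced.

    ASSUMPTION: the key name `to_version` and the section layout are
    illustrative; inspect the real config.cfg before relying on this.
    The edit is line-based so comments and formatting are left untouched.
    """
    lines = cfg_text.splitlines()
    in_section = False
    changed = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track whether we are inside the requested section.
            in_section = stripped[1:-1] == section
        elif in_section:
            match = re.match(r"^(\s*to_version\s*[:=]\s*)\S+(.*)$", line)
            if match:
                lines[i] = match.group(1) + new_version + match.group(2)
                changed = True
    if not changed:
        raise KeyError(f"no to_version entry found in section [{section}]")
    trailing_newline = "\n" if cfg_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline


def print_coref_chains(model_name: str, text: str) -> None:
    """Load a spaCy model, add Coreferee and print the chains it finds.

    Imports are local so the helper above can be used without spaCy
    installed. Run this only after reinstalling the modified Coreferee.
    """
    import coreferee  # noqa: F401  (registers the 'coreferee' pipe factory)
    import spacy

    nlp = spacy.load(model_name)
    nlp.add_pipe("coreferee")
    doc = nlp(text)
    doc._.coref_chains.print()
```

After the reinstall, calling print_coref_chains("en_core_web_trf", "Peter lost his keys. He found them later.") gives a rough impression of whether the unsupported combination behaves sensibly. It is no substitute for the regression tests described above.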

Sorry not to be able to give you better news, but I hope this helps nonetheless.