richardpaulhudson / coreferee

Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages
MIT License

Make English model downloadable through .yml file #5

jenghub opened this issue 1 year ago (status: open)

jenghub commented 1 year ago

Is it possible to host the English model on conda or PyPI so that I can install it from an environment .yml file, like the spaCy models? Basically, I'm trying to do this:

name: dev
channels:
  - conda-forge
  - defaults
dependencies:
  - pip:
    - spacy
    - coreferee
  - spacy-model-en_core_web_lg
  - spacy-model-en_core_web_trf
  - coreferee-model-en

I can't run the command-line install in my setup. Thank you!

richardpaulhudson commented 1 year ago

We make both the spaCy and the Coreferee models available via GitHub and avoid uploading them to package managers such as pip and conda. This is mainly because package managers are designed primarily to handle source code rather than large binary models, and also because offering the models in two places would risk inconsistencies creeping in. The conda spaCy models you mention above were not uploaded by Explosion, and in fact they are out of date (v3.3 rather than the current v3.4).

It is, however, possible to install the official spaCy models with conda by pointing the pip section of the environment file at the wheel released on GitHub:

# run: conda env create --file environment.yml
name: test-env-conda
channels:
  - conda-forge
dependencies:
  - python>=3.8
  - spacy==3.4.0
  - pip
  - pip:
      - en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.0/en_core_web_sm-3.4.0-py3-none-any.whl
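The pip line above is a PEP 508 "direct reference" (`name @ URL`). If you need several models or versions, a small helper can generate these lines. This is a minimal sketch: `spacy_model_requirement` is a hypothetical helper, and the URL layout is assumed from the single example above rather than from any documented guarantee.

```python
# Hypothetical helper: builds a PEP 508 direct-reference requirement for a spaCy model wheel.
# Assumption: release URLs follow the pattern of the example above:
#   .../releases/download/<name>-<version>/<name>-<version>-py3-none-any.whl
BASE_URL = "https://github.com/explosion/spacy-models/releases/download"


def spacy_model_requirement(name: str, version: str) -> str:
    """Return a pip requirement line such as 'en-core-web-sm @ https://...whl'."""
    release = f"{name}-{version}"
    wheel = f"{release}-py3-none-any.whl"
    # pip normalises '_' and '-' in project names, so the dashed form is equivalent
    project = name.replace("_", "-")
    return f"{project} @ {BASE_URL}/{release}/{wheel}"


if __name__ == "__main__":
    # Prints the same requirement line used in the environment file above
    print(spacy_model_requirement("en_core_web_sm", "3.4.0"))
```

The generated line can go under `pip:` in the environment file; as in the example, keep the model version in step with the pinned spaCy version.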

Unfortunately, this doesn't work at present for the Coreferee models, because they are distributed as .zip files rather than as wheels, and for some reason conda doesn't accept them as valid .zip files, although as far as I can see they actually are. Making them available as wheels would be a better solution anyway, so I plan to change this. However, it's a breaking change (I don't want to end up hosting both wheels and .zips in the repo because of the size), so I'll put it on hold until the next major release.
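A wheel is itself a zip archive, but it also has a standardised filename (PEP 427) and a `.dist-info` metadata directory. So "valid zip" and "installable package" are separate checks. The following minimal sketch, using only the standard library, inspects a downloaded archive on both counts. `inspect_archive` is a hypothetical helper. It does not reproduce conda's or pip's own validation logic.

```python
import os
import re
import zipfile

# PEP 427 filename: {dist}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
WHEEL_NAME = re.compile(
    r"^(?P<dist>[A-Za-z0-9_.]+)-(?P<version>[A-Za-z0-9_.!+]+)"
    r"(-(?P<build>\d[A-Za-z0-9_]*))?-(?P<py>[^-]+)-(?P<abi>[^-]+)-(?P<plat>[^-]+)\.whl$"
)


def inspect_archive(path: str) -> dict:
    """Report whether `path` is an intact zip and whether it looks like a wheel."""
    is_zip = zipfile.is_zipfile(path)
    intact = False
    has_dist_info = False
    if is_zip:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the first member with a bad CRC, or None if all are fine
            intact = zf.testzip() is None
            # Wheels carry metadata in <dist>-<version>.dist-info/WHEEL
            has_dist_info = any(n.endswith(".dist-info/WHEEL") for n in zf.namelist())
    wheel_name = bool(WHEEL_NAME.match(os.path.basename(path)))
    return {"is_zip": is_zip, "intact": intact,
            "wheel_name": wheel_name, "has_dist_info": has_dist_info}
```

A plain model .zip can pass the first two checks and still fail the last two. That distinction is one reason publishing wheels is the more robust option.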