inception-project / inception-external-recommender

Get annotation suggestions for the INCEpTION text annotation platform from spaCy, Sentence BERT, scikit-learn and more. Runs as a web-service compatible with the external recommender API of INCEpTION.
Apache License 2.0

Add FlairNLP Sequence Tagging #55

raykyn closed this pull request 6 months ago

raykyn commented 6 months ago

This pull request adds a script to the contribs that enables use of the FlairNLP (https://flairnlp.github.io/) sequence tagger (not necessarily only for NER). The class can either split the document into sentences with SegTok or pass the whole document to the tagger as a single Sentence object (not recommended for very long documents).
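When the document is split into sentences, a per-sentence tagger reports span offsets relative to each sentence, so they have to be shifted by the sentence's start offset before they can be written back into the document. The following is a minimal sketch of that bookkeeping, not the PR's code: the naive regex splitter stands in for SegTok, and `Span`, `split_sentences` and `to_document_offsets` are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Span:
    """Hypothetical stand-in for a predicted span: end-exclusive offsets plus a label."""
    start: int
    end: int
    label: str


def split_sentences(text: str) -> List[Tuple[int, str]]:
    """Naive stand-in for SegTok: split after '.', '!' or '?'.

    Returns (document_offset, sentence_text) pairs.
    """
    sentences = []
    for m in re.finditer(r"[^.!?]+[.!?]*", text):
        chunk = m.group()
        stripped = chunk.lstrip()
        if not stripped.strip():
            continue  # skip whitespace-only chunks
        # Move the offset past any leading whitespace that was stripped.
        offset = m.start() + len(chunk) - len(stripped)
        sentences.append((offset, stripped.rstrip()))
    return sentences


def to_document_offsets(sentence_offset: int, spans: List[Span]) -> List[Span]:
    """Shift sentence-relative spans so they index into the full document."""
    return [Span(s.start + sentence_offset, s.end + sentence_offset, s.label) for s in spans]


text = "Obama visited Paris. Merkel met him in Berlin."
sentences = split_sentences(text)

# Pretend a tagger found "Berlin" at offsets 18-24 of the second sentence.
offset, sentence = sentences[1]
(doc_span,) = to_document_offsets(offset, [Span(18, 24, "LOC")])
print(text[doc_span.start:doc_span.end])  # Berlin
```

In the whole-document mode the single "sentence" starts at offset 0, so no shifting is needed, which is part of why that mode is simpler but only suitable for shorter documents.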

I had to implement a workaround for the case where the CAS sentence nodes are not used, because INCEpTION performs its own internal tokenization in which punctuation marks are separate tokens, even when they are not separated from the preceding word by whitespace.
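To illustrate the mismatch: a whitespace tokenizer keeps "Paris," as one token, while a tokenizer that splits off punctuation yields "Paris" and "," as two tokens with adjacent character offsets. A minimal sketch, assuming a simple word/punctuation regex is a reasonable approximation of INCEpTION's tokenization (it is not the PR's actual workaround, and both function names are hypothetical):

```python
import re
from typing import List, Tuple

# Assumption: a token is either a run of word characters or a single
# punctuation character. This only approximates INCEpTION's tokenizer.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Punctuation-aware tokens as (token, start, end) with end-exclusive offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_PATTERN.finditer(text)]


def whitespace_tokens(text: str) -> List[Tuple[str, int, int]]:
    """Whitespace-only tokens for comparison: punctuation stays attached."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


text = "He lives in Paris, France."
print(whitespace_tokens(text))
print(tokenize_with_offsets(text))
```

Because the punctuation-aware tokens carry character offsets, predictions can be mapped back to the same token boundaries that INCEpTION uses rather than to whitespace-delimited chunks.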

I tested it with and without sentence splitting, and with both local and remote models. It works well on my server (still on version 26.8, but I assume it should work on newer versions as well).

If this script is added, the package requirements will also need updating; I tested it with flair version 0.13.1.

reckart commented 6 months ago

Thanks for the PR. Could you please add the same license header that we also use in the other files?

It would also be nice if you could add this to the table here: https://github.com/inception-project/inception-external-recommender?tab=readme-ov-file#contrib-models

It would be best to also update the requirements directly as necessary, so that the PR can be merged "as is".

raykyn commented 6 months ago

I added the content you asked for, but the requirements show a conflict: Flair 0.13.1 needs more-itertools >=8.13.0, but dkpro-cassis strictly requires version 8.12.*. I didn't run into any problems when using the newer (flair-compatible) version of more-itertools, though. How should I proceed, @reckart?
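The two constraints cannot both be satisfied: `>=8.13.0` excludes every 8.12 release, and `==8.12.*` excludes everything else. A small check makes this concrete. It is a sketch that uses a simplified stand-in for PEP 440 version matching (plain numeric versions only, no pre/post/dev releases); the helper names and the candidate versions are illustrative assumptions.

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Parse a plain numeric version like '8.12.0' into a comparable tuple.

    Simplified stand-in for PEP 440: pads to three components so that
    '8.13' compares equal to '8.13.0'.
    """
    parts = tuple(int(p) for p in v.split("."))
    return parts + (0,) * (3 - len(parts))


def satisfies_flair(v: str) -> bool:
    # Flair 0.13.1: more-itertools >=8.13.0
    return parse_version(v) >= (8, 13, 0)


def satisfies_cassis(v: str) -> bool:
    # dkpro-cassis before the relaxation: more-itertools ==8.12.*
    return parse_version(v)[:2] == (8, 12)


candidates = ["8.12.0", "8.13.0", "10.2.0"]
compatible = [v for v in candidates if satisfies_flair(v) and satisfies_cassis(v)]
print(compatible)  # []
```

An empty result for any candidate means a resolver has no version to pick, which is why the restriction on the cassis side had to be relaxed.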

reckart commented 6 months ago

@raykyn I have relaxed the version restriction on more-itertools in cassis; it looks like the tests all pass with the new range:

https://github.com/dkpro/dkpro-cassis/issues/305

I guess we need a release of cassis now, right?

raykyn commented 6 months ago

I believe so, otherwise the dependency won't be updated for anyone using pip install to get the dependencies.

reckart commented 6 months ago

Roger, I'll run a release tonight probably.

reckart commented 6 months ago

Cassis 0.9.1 is available

raykyn commented 6 months ago

Perfect!

Now, while it works, there's one remaining issue: if someone already has flair installed, installing dkpro-cassis will still show a warning, because the requirement is still set to a version below 0.9 (and more-itertools is now above version 0.10). But I don't think that's too big of a problem?

I also tried adding a test, but I can't get the tests to run (not only my flair test, but also the spaCy one). I always get this error:

Traceback (most recent call last):
  File "(my path)/inception-external-recommender/tests/test_spacy_recommender.py", line 21, in <module>
    from tests.util import load_obama, PREDICTED_TYPE, PREDICTED_FEATURE, PROJECT_ID, USER
ImportError: cannot import name 'load_obama' from 'tests.util' (/home/iprada/anaconda3/lib/python3.11/site-packages/tests/util.py)

I did install the test dependencies as required in the README.
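The path in the traceback (`.../site-packages/tests/util.py`) shows that `tests` is being resolved to a package installed in site-packages rather than to the repository's own `tests` directory, which suggests that some other installed distribution ships a top-level `tests` package that shadows the local one. A quick way to check which file an import would load is a standard-library lookup; `module_origin` below is a hypothetical helper written for this sketch.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file or directory a top-level import of `name` would use, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # nothing importable under this name
    # Regular modules and packages report the file they load from.
    if spec.origin not in (None, "namespace"):
        return spec.origin
    # Namespace packages have no single file; report their first directory instead.
    locations = spec.submodule_search_locations
    return str(list(locations)[0]) if locations else None


# Run from the repository root with the same interpreter used for the tests;
# a site-packages path here would confirm the shadowing.
print(module_origin("tests"))
```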

raykyn commented 6 months ago

By the way, I'm already using the flair recommender on my INCEpTION instance, and it really speeds up annotation. Thank you for your efforts, @reckart! (There's a bug that I can work around, but my INCEpTION instance is a few versions behind, so before I open an issue I'll update and see whether it's resolved.)

codecov[bot] commented 6 months ago

Codecov Report

Attention: Patch coverage is 0%, with 40 lines in your changes missing coverage. Please review.

Project coverage is 52.61%. Comparing base (d4fea9a) to head (5fe369f).

| File | Patch % | Lines |
|---|---|---|
| ariadne/contrib/flair.py | 0.00% | 40 Missing :warning: |
Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main      #55      +/-   ##
==========================================
- Coverage   55.13%   52.61%   -2.52%
==========================================
  Files          22       23       +1
  Lines         838      878      +40
==========================================
  Hits          462      462
- Misses        376      416      +40
```
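The figures in the report follow from covered lines divided by tracked lines: the 40 new lines in `ariadne/contrib/flair.py` add to the total without adding any hits, so project coverage drops. The sketch below reproduces the numbers, assuming coverage is truncated (rounded down) to two decimal places; integer arithmetic avoids floating-point rounding surprises, and the helper names are hypothetical.

```python
def coverage_basis_points(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated toward zero."""
    if lines <= 0:
        raise ValueError("lines must be positive")
    return hits * 10_000 // lines


def fmt(bp: int) -> str:
    """Format basis points as a percentage string with two decimals."""
    sign = "-" if bp < 0 else ""
    bp = abs(bp)
    return f"{sign}{bp // 100}.{bp % 100:02d}"


base = coverage_basis_points(462, 838)   # main: 462 hits of 838 lines
head = coverage_basis_points(462, 878)   # PR: same hits, 40 more lines
patch = coverage_basis_points(0, 40)     # new lines: none covered
print(fmt(base), fmt(head), fmt(head - base), fmt(patch))
# 55.13 52.61 -2.52 0.00
```

The misses column is simply lines minus hits: 838 - 462 = 376 and 878 - 462 = 416.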
