openminted / uc-tdm-socialsciences

Social sciences literature Text Mining software collection

https://builds.openminted.eu/job/uc-socialsciences/ws/ss-doc/target/generated-docs/user-guide.html

Deploy our custom NER models to maven #31

Closed: maxxkia closed this issue 7 years ago

maxxkia commented 7 years ago

Our custom NER models can't be accessed from the jar because the file path to the models is hard-coded in the model properties files. The models should therefore be deployed to a Maven repository.

[x] Create an Ant script for deploying our models to the DKPro public model releases repository.
[x] Publish our models to the DKPro model releases repository.

neumannm commented 7 years ago

#23 is basically the same issue ;)

maxxkia commented 7 years ago

Right. I decided to publish the model to a Maven repository, which makes things a lot easier.

neumannm commented 7 years ago

Can you briefly explain how this works?

maxxkia commented 7 years ago

Sure. I uploaded our models to the UKP public data server: https://public.ukp.informatik.tu-darmstadt.de/kiaeeha/omtd-ner-model/

Then I added an Ant script to our project that downloads the models from an external server and installs them in a Maven repository, in this case https://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-ukp-staging-local (see the ant-macros.xml file, which is imported at the beginning of the script).

After the model is published to the ukp-staging Maven repository, we check that the files are correct. Then we publish the model to the UKP public model releases repository: http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-model-releases-local/

This way our models can be retrieved automatically from the Maven repository, like the standard Stanford models that ship with the DKPro Core framework.
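To illustrate what "retrieved from the Maven repository" means in practice, here is a minimal sketch of how an artifact's download URL is derived in the standard Maven 2 repository layout. The group ID, artifact ID, and version below are hypothetical placeholders, not the actual coordinates of our models, and `maven_artifact_url` is just a helper name chosen for this example.

```python
def maven_artifact_url(repo_url, group_id, artifact_id, version, ext="jar"):
    """Build an artifact URL following the standard Maven 2 repository layout:
    <repo>/<group path>/<artifactId>/<version>/<artifactId>-<version>.<ext>
    """
    group_path = group_id.replace(".", "/")  # dots in the groupId become directories
    filename = f"{artifact_id}-{version}.{ext}"
    return f"{repo_url.rstrip('/')}/{group_path}/{artifact_id}/{version}/{filename}"


# Hypothetical coordinates, for illustration only.
url = maven_artifact_url(
    "http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-model-releases-local/",
    "org.example.models",
    "example-ner-model-en",
    "1.0.0",
)
print(url)
```

A build tool resolves a declared dependency to a path of this shape, which is why nothing in the model properties needs to point at a local file any more.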

reckart commented 7 years ago

I see you have added tag-mapping metadata to the build.xml. However, the mapping does not correspond to the tags actually used in the model.

en Model: B-LOC B-ORGgov B-ORGoth B-ORGsci B-OTHevt B-OTHmed B-OTHoff B-PERgrp B-PERind I-LOC I-ORGoth I-ORGsci I-OTHevt I-OTHmed I-OTHoff I-PERind
build.xml: 0 o per loc org sub oth

The tag mappings are meant for cases where one needs to fix minor differences between the tags produced by the model and the tags that one would expect, e.g. changing the case or fixing a tag typo. In your case, you should not use them. Instead, you should set up a variant-specific mapping. See: https://github.com/dkpro/dkpro-core/tree/master/dkpro-core-stanfordnlp-gpl/src/main/resources/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib
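The mismatch can be checked mechanically. The sketch below compares the two tag lists quoted above, assuming an exact, case-sensitive match is what counts as "corresponding"; the function name `unmatched` is made up for this example and is not part of DKPro Core.

```python
# Tags produced by the "en" model, as listed in this comment.
model_tags = (
    "B-LOC B-ORGgov B-ORGoth B-ORGsci B-OTHevt B-OTHmed B-OTHoff B-PERgrp B-PERind "
    "I-LOC I-ORGoth I-ORGsci I-OTHevt I-OTHmed I-OTHoff I-PERind"
).split()

# Tags listed in the build.xml tag-mapping metadata, as listed in this comment.
build_xml_tags = "0 o per loc org sub oth".split()


def unmatched(tags, known):
    """Return the tags that have no exact (case-sensitive) counterpart in `known`."""
    known_set = set(known)
    return sorted(t for t in tags if t not in known_set)


model_only = unmatched(model_tags, build_xml_tags)
build_only = unmatched(build_xml_tags, model_tags)

print(f"{len(model_only)} of {len(model_tags)} model tags are not covered by build.xml")
print(f"{len(build_only)} of {len(build_xml_tags)} build.xml tags never occur in the model")
```

With these two lists there is no overlap at all, which is a sign that the difference is more than a case or typo fix.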

maxxkia commented 7 years ago

Thanks for pointing that out. I removed those extra lines from build.xml. We actually already have those mapping files.

maxxkia commented 7 years ago

The currently trained model includes B- and I- tags, which are no longer compatible with the current mapping. I'll upload new models and let you know once I have updated the build.xml.
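For reference, this is roughly what the B-/I- prefixes add on top of the base labels. The sketch is only an illustration of the BIO scheme using the tag list reported earlier in this thread. It is not how the models or the mapping files were actually changed, and `split_bio` is a hypothetical helper.

```python
def split_bio(tag):
    """Split a BIO-style tag such as 'B-PERind' into (prefix, label).

    Tags without a 'B-' or 'I-' prefix are returned with an empty prefix.
    """
    prefix, sep, label = tag.partition("-")
    if sep and prefix in ("B", "I"):
        return prefix, label
    return "", tag


# Tag list of the "en" model as reported earlier in this thread.
model_tags = (
    "B-LOC B-ORGgov B-ORGoth B-ORGsci B-OTHevt B-OTHmed B-OTHoff B-PERgrp B-PERind "
    "I-LOC I-ORGoth I-ORGsci I-OTHevt I-OTHmed I-OTHoff I-PERind"
).split()

# Base labels that a mapping without B-/I- prefixes would have to cover.
base_labels = sorted({split_bio(tag)[1] for tag in model_tags})
print(base_labels)
```

A mapping keyed on the base labels alone will not match the prefixed tags, so either the mapping or the model output has to change.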

maxxkia commented 7 years ago

@reckart I assume I should move the mapping files to the dkpro-core stanfordnlp module, since people might want to use our variant without using our module. Right?

maxxkia commented 7 years ago

@reckart I uploaded new models and updated build.xml. Please deploy the new models to the repository.

reckart commented 7 years ago

I'm considering replicating the build.xml parts from this project in DKPro Core, and then I'd also move the mappings over there, yes.