kermitt2 / datastet

Finding mentions and citations to named and implicit research datasets from within the academic literature
Apache License 2.0

Update docker build #3

Closed lfoppiano closed 6 months ago

lfoppiano commented 6 months ago

This PR updates the Docker build so that it can be automated with GitHub Actions.

The Dockerfile has been rewritten to base the image on a GROBID image, rather than rebuilding everything from scratch on top of a TensorFlow image.

The word2vec embeddings (needed to load the GRU architecture of the dataseer-ML classifier models) are not preprocessed, because disk space on GitHub Actions is limited and those models are currently not used, in favour of the BERT-based classifiers.

lfoppiano commented 6 months ago

Sorry, I opened this PR against the wrong repository.