Closed tobiasdiez closed 7 years ago
Hello @tobiasdiez
The best and simplest way to use GROBID is by far the web service: it manages multithreading, scales very well, keeps the models warm, etc. If you can integrate GROBID this way (it is how it was finally integrated in Apache Tika), you should go for it. It is of course also possible to integrate it in a Java application, as explained in http://grobid.readthedocs.io/en/latest/Grobid-java-library/
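For illustration, a call to the web service from Java might look roughly like the sketch below. It is only a sketch under assumptions: the host, the port 8070, the `/api/processCitation` path and the `citations` form field match recent GROBID releases as far as I know, but they differ between versions and deployments, so check them against your own running service. The class and method names are made up for this example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class GrobidServiceSketch {
    // Assumption: a GROBID service runs locally; adjust host, port and path to your setup.
    static final String ENDPOINT = "http://localhost:8070/api/processCitation";

    // Builds the URL-encoded form body. "citations" is the assumed form field name.
    static String formBody(String rawCitation) {
        return "citations=" + URLEncoder.encode(rawCitation, StandardCharsets.UTF_8);
    }

    // Builds a POST request carrying one raw reference string.
    static HttpRequest buildRequest(String rawCitation) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(rawCitation)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: pass a raw bibliographic reference as the first argument.
        String citation = args.length > 0 ? args[0] : "A raw bibliographic reference string";
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(citation), HttpResponse.BodyHandlers.ofString());
        // Print the HTTP status and whatever the service returned.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

With this approach the client application needs no GROBID jar and no `grobid-home` of its own, only network access to the service.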
Don't use the Maven Central repo: it is not updated, and `grobid-home` is required anyway and will not be part of any Maven artifact. I think it's easier to use a local artifact, as illustrated by http://grobid.readthedocs.io/en/latest/Grobid-java-library/
`grobid-home` is the folder that comes with GROBID under grobid/grobid-home, so it is downloaded with GROBID and is there when you build the project.
`grobid-home` cannot be included in a jar, because the JNI lib used in GROBID cannot use resources inside a jar; it has to be available in the file system of the deployment machine (see #245).
In addition, there is by default a tmp/ directory in `grobid-home` which holds the temporary files used during PDF parsing (this tmp/ path can be changed, but since a `grobid-home` is required anyway, it's simpler to keep them together).
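Since the folder has to live on the real file system, an embedding application could check it at startup before initialising GROBID. The following is a minimal sketch of such a check, not part of GROBID itself: the class and method names are hypothetical, and the default path is simply the grobid/grobid-home location mentioned above.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class GrobidHomeCheck {
    /** Returns null if the folder looks usable, otherwise a short description of the problem. */
    static String problemWith(Path grobidHome) {
        // grobid-home must be a real directory on disk, not a resource inside a jar.
        if (!Files.isDirectory(grobidHome)) {
            return "not a directory on the file system: " + grobidHome;
        }
        // By default the temporary files for PDF parsing go to grobid-home/tmp.
        Path tmp = grobidHome.resolve("tmp");
        if (!Files.isDirectory(tmp)) {
            return "missing tmp/ directory: " + tmp;
        }
        if (!Files.isWritable(tmp)) {
            return "tmp/ is not writable: " + tmp;
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumption: the path is passed as an argument or defaults to the checkout location.
        Path home = Path.of(args.length > 0 ? args[0] : "grobid/grobid-home");
        String problem = problemWith(home);
        System.out.println(problem == null ? "grobid-home looks usable: " + home : problem);
    }
}
```

If the tmp/ path has been changed in the configuration, the second and third checks would have to point at the configured location instead.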
"roughly how big is the resulting file?": it depends on the models you need in your application; you can keep only the required models (see #242). For processing bibliographical references only, you need the citation, date, author and name/citation models and the lexicon, probably around 100MB in total (the largest models are the ones for analysing the full text).
@kermitt2 Thanks for your detailed answer! With this information we should be able to evaluate whether and in which way we could integrate GROBID.
We are thinking about integrating GROBID into the bibliographic manager JabRef, but I had problems figuring out how to do that and how easy it would be. I hope you don't mind too much if I use this issue tracker as a place to ask a few questions:
`grobid-home` folder. What is supposed to be in this folder and where can I download it? Is the end user supposed to download it separately, or can it be included as a dependency in the `jar` file?
Thanks!