kermitt2 / grobid

A machine learning software for extracting information from scholarly documents
https://grobid.readthedocs.io
Apache License 2.0

Set the Context Path #1104

Open rigsbyjt opened 7 months ago

rigsbyjt commented 7 months ago

I want to run GROBID via docker with a context path that is not simply '/'

I have reviewed the configuration files in grobid-home/config. I see how to change the ports and other settings.

I have searched through the code for an option to set the context path and cannot find one.

The changes to grobid.yaml can be applied by mounting the file into the container, with something like

docker run --rm --gpus all --init -p 8080:8070 -p 8081:8071 -v /home/lopez/grobid/grobid-home/config/grobid.yaml:/opt/grobid/grobid-home/config/grobid.yaml:ro grobid/grobid:0.7.1-SNAPSHOT

Basically I want to access GROBID through

https://localhost:8080/grobid/

instead of

https://localhost:8070/

I was able to get the port part to work.

lfoppiano commented 7 months ago

Hi @rigsbyjt, in theory you can change the context root programmatically (https://github.com/kermitt2/grobid/blob/d98129f2953fd5a595bf88890b97a546fe763384/grobid-service/src/main/java/org/grobid/service/main/GrobidServiceApplication.java#L33). However, this is not recommended: you would need to either rebuild the Docker image or run GROBID natively, and the service might break the convention that other projects, such as the Python client, rely on.

If what you need is to make GROBID accessible behind an Apache server, you can configure a reverse proxy in the Apache configuration, since GROBID is a stateless service. The user interface works nicely under any context root.

Given a standard Apache configuration, you would need something like:

        RewriteRule       ^/grobid$ /grobid/ [R]
        ProxyPass         /grobid http://localhost:8070
        ProxyPassReverse  /grobid http://localhost:8070

If you haven't done so already, you might need to enable the rewrite and HTTP proxy modules (mod_rewrite, mod_proxy and mod_proxy_http) in the Apache configuration.

In this way the URL of the service will be http://apache_service/grobid, and the API can be reached as http://apache_service/grobid/api/service_to_call.
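
As a rough illustration of calling the proxied service from a client, here is a minimal Python sketch. It assumes the reverse proxy above is in place and that the service answers on its `isalive` endpoint. The helper names `grobid_api_url` and `is_alive` are hypothetical and not part of any GROBID client.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def grobid_api_url(base_url: str, service: str) -> str:
    """Build the URL of a GROBID API service under a proxied context path.

    Hypothetical helper: base_url is the proxy URL including the context
    path, e.g. "http://apache_service/grobid".
    """
    # A trailing slash is needed so urljoin keeps the /grobid path segment.
    base = base_url.rstrip("/") + "/"
    return urljoin(base, "api/" + service.lstrip("/"))


def is_alive(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the proxied service answers HTTP 200 on api/isalive."""
    # Assumption: the isalive endpoint is reachable through the proxy.
    with urlopen(grobid_api_url(base_url, "isalive"), timeout=timeout) as resp:
        return resp.status == 200


if __name__ == "__main__":
    # Only builds and prints the URL; no network call is made here.
    print(grobid_api_url("http://apache_service/grobid", "isalive"))
```

Keeping the context path in a single base URL means a client only has to change one setting if the proxy location moves.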