kermitt2 / grobid_client_python

Python client for GROBID Web services
Apache License 2.0

Error 500: too many tokens #60

Closed bryanyzhu closed 1 year ago

bryanyzhu commented 1 year ago

Hi, I encountered the following error when running grobid_client_python. Does anyone know how to solve it? Thank you.

[TOO_MANY_TOKENS] The document has 1185785 tokens, but the limit is 1000000

bryanyzhu commented 1 year ago

I see that grobid sets a token limit of 1M. Is there an easy way to change this setting from grobid_client_python, without changing code in grobid?
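While the limit itself lives on the server, the client side can at least recognise this failure and report the numbers. Below is a minimal sketch that parses the message quoted above. The function name `parse_too_many_tokens` is hypothetical, and the pattern is modelled only on the single message in this thread, so other GROBID versions may word it differently.

```python
import re
from typing import Optional, Tuple

# Pattern modelled on the one error message quoted in this thread (assumption:
# the wording is stable; adjust it if your server phrases the error differently).
_TOO_MANY_TOKENS = re.compile(
    r"\[TOO_MANY_TOKENS\]\s+The document has (\d+) tokens, but the limit is (\d+)"
)


def parse_too_many_tokens(message: str) -> Optional[Tuple[int, int]]:
    """Return (token_count, limit) for a TOO_MANY_TOKENS message, else None."""
    match = _TOO_MANY_TOKENS.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    msg = "[TOO_MANY_TOKENS] The document has 1185785 tokens, but the limit is 1000000"
    print(parse_too_many_tokens(msg))
```

A caller could use the returned pair to skip or log oversized documents instead of retrying them.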

kermitt2 commented 1 year ago

Hi @bryanyzhu !

There is a maximum token limit set in the server-side config. It is a "circuit breaker" to prevent problems with memory, the JVM crashing, etc. The limit is specific to the server and depends on its resources, so it can only be set on the server.

You can change this config value as documented here, and supply the updated config file when starting a docker container as documented here.
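As an illustration of editing the config value, here is a small sketch that rewrites the limit in the text of a YAML config file. It assumes the setting is a line of the form `tokensMax: <number>` (to my recollection it sits under the PDF section of `grobid-home/config/grobid.yaml`, but please check the key name and path against your GROBID version). The helper `set_tokens_max` is hypothetical and not part of GROBID or the client.

```python
import re

# Assumption: the limit is stored as a single "tokensMax: <int>" line.
_TOKENS_MAX_LINE = re.compile(r"^([ \t]*tokensMax:[ \t]*)\d+", re.MULTILINE)


def set_tokens_max(config_text: str, new_limit: int) -> str:
    """Return config_text with the tokensMax value replaced by new_limit.

    Indentation and anything after the number (e.g. a comment) are kept.
    Raises ValueError unless exactly one tokensMax line is found.
    """
    # A function replacement avoids ambiguity between "\1" and the digits
    # of the new value in a replacement string.
    updated, count = _TOKENS_MAX_LINE.subn(
        lambda m: f"{m.group(1)}{new_limit}", config_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one tokensMax entry, found {count}")
    return updated
```

To apply it, read the config file into a string, pass it through the function, write the result back, and restart the server. Raising the limit also raises memory use on large documents, which is what the circuit breaker is there to guard against.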

bryanyzhu commented 1 year ago

Hi @kermitt2, thanks a lot for your reply. I'm not using a docker container, I install and start grobid in the simplest way,

wget https://github.com/kermitt2/grobid/archive/0.7.2.zip
unzip 0.7.2.zip
cd grobid-0.7.2
./gradlew clean install
./gradlew run

So how to move forward with the updated config file? Maybe modify the config first, then install and run the server again?

Modify the config file first, then
./gradlew clean install
./gradlew run

kermitt2 commented 1 year ago

> I'm not using a docker container, I install and start grobid in the simplest way,

I would say using the docker image is the simplest way. If you don't plan to develop on the current master, I would encourage you to use the docker image(s) and exploit the configuration to tune the tool to your environment and documents.

> So how to move forward with the updated config file?

Just restart the service with ./gradlew run
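After a restart, it can be handy to wait until the service answers before sending documents again. A minimal sketch, assuming the server listens on the default `http://localhost:8070` and exposes the `/api/isalive` health endpoint. Adjust the URL to your setup; the function names `is_alive` and `wait_until_alive` are hypothetical helpers, not part of grobid_client_python.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_alive(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on url answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_alive(
    url: str,
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], bool] = is_alive,
) -> bool:
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Assumed default address of a locally started GROBID server.
    print(wait_until_alive("http://localhost:8070/api/isalive"))
```

The `probe` parameter is there only so the polling loop can be exercised without a running server.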

bryanyzhu commented 1 year ago

Cool, thanks a lot for your response, it really helps! I will close the issue.