jgontrum / spacy-api-docker

spaCy REST API, wrapped in a Docker container.
https://hub.docker.com/r/jgontrum/spacyapi/
MIT License
265 stars, 99 forks

How can I download en_core_web_lg? #28

Closed (nitinthewiz closed this issue 4 years ago)

nitinthewiz commented 5 years ago

Hey Johannes!

Thanks for this awesome work here!

I was testing out the English Docker image and noticed that it downloads en_core_web_sm by default.

I tried several ways to get it to download en_core_web_lg, including changing the code as shown here and setting the ENV variable languages to en_core_web_lg-2.1.0: https://github.com/nitinthewiz/spacy-api-docker/blob/master/displacy_service/scripts/download.py#L10

But this fails: the API server doesn't start, because the call spacy.load("en_core_web_lg-2.1.0") fails.

spacy.load seems to expect "en" only, and doesn't work with anything else.
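
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in spaCy 2.x, download(..., direct=True) takes the full release name including the version (en_core_web_lg-2.1.0), while spacy.load takes the installed package name without the version (en_core_web_lg) or a shortcut link such as en. The helper below, package_name_for_load, is hypothetical and not part of this repository; it only shows how the two names relate.

```python
import re

# Hypothetical helper (not part of spacy-api-docker): turn a versioned
# release name such as "en_core_web_lg-2.1.0" into the package name that
# spacy.load() is expected to accept, "en_core_web_lg".
_VERSION_SUFFIX = re.compile(r"-\d[\w.]*$")


def package_name_for_load(release_name: str) -> str:
    """Strip a trailing "-<version>" suffix, if there is one."""
    return _VERSION_SUFFIX.sub("", release_name)


print(package_name_for_load("en_core_web_lg-2.1.0"))
```

Names without a version suffix, such as en or en_core_web_lg, pass through unchanged.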

Could you tell me if there's a way to convince the docker image to run with en_core_web_lg?

Thanks a lot!

bigman73 commented 5 years ago

Please add support for loading this large model. I'm facing the exact same need, as the default small model is not returning good enough results.

nitinthewiz commented 5 years ago

@bigman73 I was able to follow the hint given in https://github.com/jgontrum/spacy-api-docker/issues/15 and make this change in server.py:

import spacy

# Original: MODELS = os.getenv("languages", "").split()
# Load each model by its package name (no version suffix).
MODELS = {
    "en_core_web_sm": spacy.load("en_core_web_sm"),
    "en_core_web_lg": spacy.load("en_core_web_lg"),
}

and change download.py to:

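import os
from spacy.cli import download

def download_models():
    languages = os.getenv("languages", "en").split()
    for lang in languages:
        # Original call: download(model=lang, direct=False)
        # direct=True expects the full release name, including the version.
        # Note: lang is unused, so both models are fetched on every iteration.
        download(model="en_core_web_lg-2.1.0", direct=True)
        download(model="en_core_web_sm-2.1.0", direct=True)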

With this code, provided you build both images yourself (the base image and the English-specific one), the model doesn't show up in the /ui, but a call that specifies the custom model works.
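
As an illustration of such a call, here is a minimal sketch that builds a request for the /dep endpoint with the model named explicitly. The text and model fields follow the request format described in the project README; the base URL, the port, and the helper name build_dep_request are assumptions made for this example.

```python
import json
import urllib.request


def build_dep_request(text, model, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for the /dep endpoint."""
    payload = json.dumps({"text": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/dep",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_dep_request("Hello from the large model.", "en_core_web_lg")
# To send it against a running container:
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```

The request is only constructed here, so the sketch can be inspected without a running container.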

It would be better if @jgontrum made this easier, but so it goes.

bigman73 commented 5 years ago

Thanks. It makes sense as a temporary solution, but I'd like to see this going into the code and docker image.

bigman73 commented 5 years ago

I found a different and simpler solution that requires no code changes.

I created a new custom Dockerfile that sets languages to 'en_core_web_lg' instead of 'en':

FROM jgontrum/spacyapi:base_v2

ENV languages "en_core_web_lg"
RUN cd /app && env/bin/download_models
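
One way to see why this works without code changes: the download script shown earlier in this thread splits the languages variable on whitespace and, in its original form, passes each entry to spaCy's download. Assuming a full package name is accepted there just like the en shortcut, setting the variable is enough. The sketch below mirrors only the parsing step; models_from_env is a hypothetical name.

```python
import os


def models_from_env(environ=os.environ):
    """Return model names from the space-separated "languages" variable,
    falling back to "en" as the download script in this thread does."""
    return environ.get("languages", "en").split()


# A plain dict stands in for the container environment set by ENV.
print(models_from_env({"languages": "en_core_web_lg"}))
```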

Build the local Docker image (I used my user name to distinguish it from jgontrum's):

docker build . -t bigman73/spacyapi

Then run the container:

docker run --name spacyapi-en-lg -p 127.0.0.1:8080:80 -d bigman73/spacyapi

bigman73 commented 4 years ago

@babyhuey - the large image works very well.

docker pull jgontrum/spacyapi:en_v2_lg

bigman73 commented 4 years ago

@nitinthewiz I think the issue can be closed now

nitinthewiz commented 4 years ago

I agree! Thanks @bigman73 and @babyhuey!

faridelya commented 1 year ago

Error while installing en_core_web_lg

[7/8] RUN pip install --no-cache-dir --upgrade -r requirements.txt:
0 2.427 Collecting urllib3
0 4.353 Downloading urllib3-1.26.12-py2.py3-none-any.whl (140 kB)
0 5.751 Collecting pip==22.3
0 5.891 Downloading pip-22.3-py3-none-any.whl (2.1 MB)
0 8.280 Collecting wheel==0.37.1
0 8.392 Downloading wheel-0.37.1-py2.py3-none-any.whl (35 kB)
0 9.975 Collecting setuptools==65.4.1
0 10.09 Downloading setuptools-65.4.1-py3-none-any.whl (1.2 MB)
0 12.37 Collecting spacy==3.4.1
0 12.59 Downloading spacy-3.4.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB)
0 15.83 Collecting en-core-web-lg@ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.4.0/en_core_web_lg-3.4.0-py3-none-any.whl
0 18.47 Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.4.0/en_core_web_lg-3.4.0-py3-none-any.whl (587.7 MB)
0 71.08 ERROR: Exception:
0 71.08 Traceback (most recent call last):
0 71.08 File "/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/response.py", line 425, in _error_catcher
0 71.08 yield
0 71.08 File "/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/response.py", line 507, in read
0 71.08 data = self._fp.read(amt) if not fp_closed else b""
0 71.08 File "/usr/lib/python3.8/http/client.py", line 459, in read
0 71.08 n = self.readinto(b)
0 71.08 File "/usr/lib/python3.8/http/client.py", line 503, in readinto
0 71.08 n = self.fp.readinto(b)
0 71.08 File "/usr/lib/python3.8/socket.py", line 669, in readinto
0 71.08 return self._sock.recv_into(b)
0 71.08 File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
0 71.08 return self.read(nbytes, buffer)
0 71.08 File "/usr/lib/python3.8/ssl.py", line 1099, in read
0 71.08 return self._sslobj.read(len, buffer)
0 71.08 socket.timeout: The read operation timed out
0 71.08
0 71.08 During handling of the above exception, another exception occurred:
0 71.08
0 71.08 Traceback (most recent call last):
0 71.08 File "/usr/lib/python3/dist-packages/pip/_internal/cli/base_command.py", line 186, in _main
0 71.08 status = self.run(options, args)
0 71.08 File "/usr/lib/python3/dist-packages/pip/_internal/commands/install.py", line 357, in run
0 71.08 resolver.resolve(requirement_set)
0 71.08 File "/usr/lib/python3/dist-packages/pip/_internal/legacy_resolve.py", line 177, in resolve
0 71.08 discovered_reqs.extend(self._resolve_one(requirement_set, req))
0 71.08 File "/usr/lib/python3/dist-packages/pip/_internal/legacy_resolve.py", line 333, in _resolve_one
0 71.08 abstract_dist = self._get_abstract_dist_for(req_to_install)
0 71.08 File "/usr/lib/python3/dist-packages/pip/_internal/legacy_resolve.py", line 282, in _get_abstract_dist_for
0 71.08 abstract_dist = self.preparer.prepare_linked_requirement(req)
0 71.08 File "/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 480, in prepare_linked_requirement
0 71.08 local_path = unpack_url(
0 71.08 File "/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 282, in unpack_url
0 71.08 return unpack_http_url(
0 71.08 File "/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 158, in unpack_http_url
0 71.08 from_path, content_type = _download_http_url(
0 71.08 File "/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 303, in _download_http_url
0 71.08 for chunk in download.chunks:
0 71.08 File "/usr/lib/python3/dist-packages/pip/_internal/utils/ui.py", line 160, in __iter__
0 71.08 for x in it:
0 71.08 File "/usr/lib/python3/dist-packages/pip/_internal/network/utils.py", line 15, in response_chunks
0 71.08 for chunk in response.raw.stream(
0 71.08 File "/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/response.py", line 564, in stream
0 71.08 data = self.read(amt=amt, decode_content=decode_content)
0 71.08 File "/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/response.py", line 529, in read
0 71.08 raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
0 71.08 File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
0 71.08 self.gen.throw(type, value, traceback)
0 71.08 File "/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/response.py", line 430, in _error_catcher
0 71.08 raise ReadTimeoutError(self._pool, None, "Read timed out.")
0 71.08 urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='objects.githubusercontent.com', port=443): Read timed out.


failed to solve: executor failed running [/bin/sh -c pip install --no-cache-dir --upgrade -r requirements.txt]: exit code: 2
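
The traceback ends in a ReadTimeoutError while pip streams the 587.7 MB model wheel, which suggests the download stalled rather than that the model itself is at fault. One thing to try, as an untested sketch, is giving pip a longer socket timeout and more retries through its --default-timeout and --retries options. The helper name and the values 120 and 10 are assumptions, not settings from this repository.

```python
import sys


def pip_install_command(requirements="requirements.txt", timeout=120, retries=10):
    """Build a pip command line with a longer timeout and more retries."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-cache-dir", "--upgrade",
        f"--default-timeout={timeout}",  # seconds to wait on a stalled socket
        f"--retries={retries}",          # attempts per connection
        "-r", requirements,
    ]


print(" ".join(pip_install_command()))
```

The same two options can be appended to the RUN pip install line in the Dockerfile.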