kiri-art / docker-diffusers-api

Diffusers / Stable Diffusion in docker with a REST API, supporting various models, pipelines & schedulers.
https://kiri.art/
MIT License
202 stars 94 forks

Fetching files on request causes timeout #31

Closed AllanElleuch closed 1 year ago

AllanElleuch commented 1 year ago

Hi,

I started an instance locally and ran the generation request from the documentation. However, I receive a timeout, which seems to be caused by the request to the Hugging Face CDN timing out.

Pulling 15 files takes time; is it possible to warm up the server and download everything beforehand? Or keep the downloaded files instead of dropping them?

Thanks!

Docker command

docker run -it --rm \
  --gpus all  \
  -p 3000:8000 \
  -e HF_AUTH_TOKEN="XXX" \
  -e AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
  -e AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  -e AWS_DEFAULT_REGION="$AWS_DEFAULT_REGION" \
  -e AWS_S3_ENDPOINT_URL="$AWS_S3_ENDPOINT_URL" \
  -e AWS_S3_DEFAULT_BUCKET="$AWS_S3_DEFAULT_BUCKET" \
  -v ~/root-cache:/root/.cache \
  "$@" gadicc/diffusers-api:latest

Request

{
  "modelInputs": {
    "prompt": "Super dog",
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "width": 512,
    "height": 512,
    "seed": 3239022079
  },
  "callInputs": {
    "MODEL_ID": "runwayml/stable-diffusion-v1-5",
    "PIPELINE": "StableDiffusionPipeline",
    "SCHEDULER": "LMSDiscreteScheduler",
    "safety_checker": true
  }
}
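For reference, the same request body can be built and sent from Python. This is a minimal sketch, assuming the port mapping from the docker command above (`-p 3000:8000`, so the API is reachable on `localhost:3000`); the actual POST is left commented out since it needs a running container:

```python
import json

# The same request body as the JSON above, built as a Python dict.
payload = {
    "modelInputs": {
        "prompt": "Super dog",
        "num_inference_steps": 50,
        "guidance_scale": 7.5,
        "width": 512,
        "height": 512,
        "seed": 3239022079,
    },
    "callInputs": {
        "MODEL_ID": "runwayml/stable-diffusion-v1-5",
        "PIPELINE": "StableDiffusionPipeline",
        "SCHEDULER": "LMSDiscreteScheduler",
        "safety_checker": True,
    },
}
body = json.dumps(payload)

# To actually call the server (requires `requests` and a running container):
#   import requests
#   r = requests.post("http://localhost:3000/", data=body,
#                     headers={"Content-Type": "application/json"})
```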

Server log


{
  "modelInputs": {
    "prompt": "Super dog",
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "width": 512,
    "height": 512,
    "seed": 3239022079
  },
  "callInputs": {
    "MODEL_ID": "runwayml/stable-diffusion-v1-5",
    "PIPELINE": "StableDiffusionPipeline",
    "SCHEDULER": "LMSDiscreteScheduler",
    "safety_checker": true
  }
}
download_model {'model_url': None, 'model_id': 'runwayml/stable-diffusion-v1-5', 'model_revision': None, 'hf_model_id': None}
loadModel {'model_id': 'runwayml/stable-diffusion-v1-5', 'load': False, 'precision': None, 'revision': None}
Downloading model: runwayml/stable-diffusion-v1-5
Fetching 15 files:   0%|                                                                                                                                                                       | 0/15 [00:00<?, ?it/s]
Downloading (…)"model.safetensors";:   0%|                                                                                                                                                | 0.00/1.22G [00:00<?, ?B/s]
Downloading (…)_model.safetensors";:   0%|                                                                                                                                                 | 0.00/335M [00:00<?, ?B/s]
Downloading (…)_model.safetensors";:   0%|                                                                                                                                                | 0.00/3.44G [00:00<?, ?B/s]
Downloading (…)"model.safetensors";:   0%|                                                                                                                                                 | 0.00/492M [00:00<?, ?B/s]

Downloading (…)"model.safetensors";:   2%|██▎                                                                                                                                   | 21.0M/1.22G [02:04<1:56:43, 171kB/s]
Downloading (…)_model.safetensors";:   3%|████▎                                                                                                                                    | 10.5M/335M [01:49<47:32, 114kB/s]
Downloading (…)_model.safetensors";:   0%|▍                                                                                                                                     | 10.5M/3.44G [01:38<7:39:24, 124kB/s]

Fetching 15 files:  20%|███████████████████████████████▊                                                                                                                               | 3/15 [02:59<11:58, 59.84s/it]
Downloading (…)"model.safetensors";:   2%|██▎                                                                                                                                   | 21.0M/1.22G [02:58<2:49:42, 117kB/s]
Downloading (…)"model.safetensors";:   2%|██▊                                                                                                                                   | 10.5M/492M [02:57<2:16:15, 58.9kB/s]
Downloading (…)_model.safetensors";:   0%|▍                                                                                                                                   | 10.5M/3.44G [02:58<16:10:00, 58.9kB/s]
Downloading (…)_model.safetensors";:   6%|████████▌                                                                                                                                | 21.0M/335M [02:58<44:24, 118kB/s]
[2023-01-31 16:17:45 +0000] - (sanic.access)[INFO][172.17.0.1:59646]: POST http://localhost:8000/  200 5470

Postman side

{
    "$error": {
        "code": "APP_INFERENCE_ERROR",
        "name": "ConnectionError",
        "message": "HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.",
        "stack": "Traceback (most recent call last):\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/urllib3/response.py\", line 444, in _error_catcher\n    yield\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/urllib3/response.py\", line 567, in read\n    data = self._fp_read(amt) if not fp_closed else b\"\"\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/urllib3/response.py\", line 533, in _fp_read\n    return self._fp.read(amt) if amt is not None else self._fp.read()\n  File \"/opt/conda/envs/xformers/lib/python3.9/http/client.py\", line 463, in read\n    n = self.readinto(b)\n  File \"/opt/conda/envs/xformers/lib/python3.9/http/client.py\", line 507, in readinto\n    n = self.fp.readinto(b)\n  File \"/opt/conda/envs/xformers/lib/python3.9/socket.py\", line 704, in readinto\n    return self._sock.recv_into(b)\n  File \"/opt/conda/envs/xformers/lib/python3.9/ssl.py\", line 1242, in recv_into\n    return self.read(nbytes, buffer)\n  File \"/opt/conda/envs/xformers/lib/python3.9/ssl.py\", line 1100, in read\n    return self._sslobj.read(len, buffer)\nsocket.timeout: The read operation timed out\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/requests/models.py\", line 816, in generate\n    yield from self.raw.stream(chunk_size, decode_content=True)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/urllib3/response.py\", line 628, in stream\n    data = self.read(amt=amt, decode_content=decode_content)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/urllib3/response.py\", line 593, in read\n    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)\n  File \"/opt/conda/envs/xformers/lib/python3.9/contextlib.py\", line 137, in __exit__\n    self.gen.throw(typ, value, traceback)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/urllib3/response.py\", 
line 449, in _error_catcher\n    raise ReadTimeoutError(self._pool, None, \"Read timed out.\")\nurllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/api/server.py\", line 39, in inference\n    output = user_src.inference(model_inputs)\n  File \"/api/app.py\", line 178, in inference\n    download_model(\n  File \"/api/download.py\", line 148, in download_model\n    loadModel(\n  File \"/api/loadModel.py\", line 59, in loadModel\n    model = pipeline.from_pretrained(\n  File \"/api/diffusers/src/diffusers/pipelines/pipeline_utils.py\", line 524, in from_pretrained\n    cached_folder = snapshot_download(\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py\", line 124, in _inner_fn\n    return fn(*args, **kwargs)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/_snapshot_download.py\", line 215, in snapshot_download\n    thread_map(\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/tqdm/contrib/concurrent.py\", line 94, in thread_map\n    return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/tqdm/contrib/concurrent.py\", line 76, in _executor_map\n    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/tqdm/std.py\", line 1195, in __iter__\n    for obj in iterable:\n  File \"/opt/conda/envs/xformers/lib/python3.9/concurrent/futures/_base.py\", line 609, in result_iterator\n    yield fs.pop().result()\n  File \"/opt/conda/envs/xformers/lib/python3.9/concurrent/futures/_base.py\", line 446, in result\n    return self.__get_result()\n  File \"/opt/conda/envs/xformers/lib/python3.9/concurrent/futures/_base.py\", line 391, in __get_result\n    
raise self._exception\n  File \"/opt/conda/envs/xformers/lib/python3.9/concurrent/futures/thread.py\", line 58, in run\n    result = self.fn(*self.args, **self.kwargs)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/_snapshot_download.py\", line 194, in _inner_hf_hub_download\n    return hf_hub_download(\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py\", line 124, in _inner_fn\n    return fn(*args, **kwargs)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/file_download.py\", line 1282, in hf_hub_download\n    http_get(\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/file_download.py\", line 530, in http_get\n    for chunk in r.iter_content(chunk_size=10 * 1024 * 1024):\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/requests/models.py\", line 822, in generate\n    raise ConnectionError(e)\nrequests.exceptions.ConnectionError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.\n"
    }
}
gadicc commented 1 year ago

Hey @AllanElleuch, thanks for the detailed report.

Yes, the way you have it there with the volume setup (-v ~/root-cache:/root/.cache), it will only download the model once (and you'll be able to see the downloaded files in ~/root-cache on the host system). Not sure what's up with HuggingFace, maybe they were / are having some temporary issues.

It's also possible to bundle the model with the container (as in https://github.com/kiri-art/docker-diffusers-api-build-download), but that's usually only useful for serverless, since you're able to persist the volume on your local machine.

But either way, you still have to download the model once (but only once).
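On the warm-up question: one possible sketch, outside docker-diffusers-api itself, is to pre-populate the Hugging Face cache (the mounted ~/root-cache volume) before serving, using huggingface_hub's `snapshot_download`. `warm_cache` here is a hypothetical helper name, not part of this repo:

```python
def warm_cache(model_id="runwayml/stable-diffusion-v1-5", token=None):
    """Pre-download all files for a model into the HF cache so the first
    inference request doesn't block on the CDN.

    Hypothetical helper: run it once (inside the container or anywhere
    sharing the /root/.cache volume) before accepting traffic.
    """
    # Lazy import so this sketch only needs huggingface_hub when called.
    from huggingface_hub import snapshot_download

    # Downloads into the default HF cache (~/.cache/huggingface), which the
    # docker command maps to ~/root-cache on the host.
    return snapshot_download(repo_id=model_id, token=token)
```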

If HuggingFace is playing up, you could try adding the following callInput:

MODEL_URL="https://pub-bdad4fdd97ac4830945e90ed16298864.r2.dev/diffusers/models--runwayml--stable-diffusion-v1-5--fp16.tar.zst"

and it should download from kiri's public R2 storage (rate-limited).
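For example, the earlier request's callInputs with that MODEL_URL added (a sketch; the URL is the one quoted above):

```python
# callInputs for the same request, pointing MODEL_URL at kiri's public
# R2 mirror so the model archive is fetched there instead of the HF CDN.
call_inputs = {
    "MODEL_ID": "runwayml/stable-diffusion-v1-5",
    "PIPELINE": "StableDiffusionPipeline",
    "SCHEDULER": "LMSDiscreteScheduler",
    "safety_checker": True,
    "MODEL_URL": "https://pub-bdad4fdd97ac4830945e90ed16298864.r2.dev/diffusers/models--runwayml--stable-diffusion-v1-5--fp16.tar.zst",
}
```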

Let me know how that goes.

AllanElleuch commented 1 year ago

Thanks Gadicc for the detailed feedback.

It seems that the issue was an unstable connection on my side. Now it is working perfectly fine and I was able to download all the files without an issue!

gadicc commented 1 year ago

Fantastic, thanks for reporting back, @AllanElleuch. Happy Diffusing 🧨 :tada: