huggingface / dataset-viewer

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
https://huggingface.co/docs/dataset-viewer
Apache License 2.0

Cannot get images from mnist #318

Closed severo closed 2 years ago

severo commented 2 years ago

On https://huggingface.co/datasets/mnist, the images do not appear:

[Screenshot, 2022-05-30 12:45:19]

And the requests to the images return 403 or 404:

[Screenshot, 2022-05-30 12:45:25]

Their URLs are like:

https://huggingface.co/proxy-datasets-preview/assets/mnist/--/mnist/train/91/image/image.jpg (does not work)

which should proxy to the upstream URL:

https://datasets-server.huggingface.tech/assets/mnist/--/mnist/train/91/image/image.jpg (works)

See the nginx configuration: https://github.com/huggingface/conf/blob/bd698a91c615938b52477c25d72ba84d10af4c68/moonrise/nginx-moonrise.conf#L321-L328

Looking at the nginx logs on moonrise (sudo grep proxy-datasets-preview /var/log/nginx/error.log) we get a lot of Connection timed out errors:

2022/05/30 12:41:41 [error] 687523#687523: *867222115 upstream timed out (110: Connection timed out) while connecting to upstream, client: 172.30.1.33, server: huggingface.co, request: "GET /proxy-datasets-preview/assets/mnist/--/mnist/train/91/image/image.jpg HTTP/1.1", upstream: "https://35.175.164.194:443/assets/mnist/--/mnist/train/91/image/image.jpg", host: "huggingface.co"

This suggests that moonrise cannot reach the datasets-server.huggingface.tech server.

Launching curl from the moonrise server with the domain works:

curl https://datasets-server.huggingface.tech/assets/mnist/--/mnist/train/91/image/image.jpg > image.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   561  100   561    0     0  31166      0 --:--:-- --:--:-- --:--:-- 31166

But not with the IP reported in the logs (it times out):

hf@moonrise:/tmp$ curl https://35.175.164.194:443/assets/mnist/--/mnist/train/91/image/image.jpg > image.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:01:10 --:--:--     0

The IP resolved for datasets-server.huggingface.tech:

hf@moonrise:/tmp$ dig datasets-server.huggingface.tech

; <<>> DiG 9.16.1-Ubuntu <<>> datasets-server.huggingface.tech
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48677
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;datasets-server.huggingface.tech. IN   A

;; ANSWER SECTION:
datasets-server.huggingface.tech. 13 IN A       34.194.63.218
datasets-server.huggingface.tech. 13 IN A       52.204.14.32
datasets-server.huggingface.tech. 13 IN A       184.72.186.69
datasets-server.huggingface.tech. 13 IN A       50.16.88.70
datasets-server.huggingface.tech. 13 IN A       34.236.116.183
datasets-server.huggingface.tech. 13 IN A       34.239.243.182

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Mon May 30 12:52:46 CEST 2022
;; MSG SIZE  rcvd: 157

severo commented 2 years ago

OK: the issue seems to be that nginx is using 35.175.164.194 as the IP for datasets-server.huggingface.tech, while it's not one of the IPs associated with the domain, as per dig datasets-server.huggingface.tech

severo commented 2 years ago

The 4 IPs that generate errors:

hf@moonrise:/tmp$ sudo grep proxy-datasets-preview /var/log/nginx/error.log | cut -d ' ' -f26 | awk -F[/:] '{print $4}' | sort | uniq
34.198.168.221
34.226.161.86
35.175.164.194
54.87.195.162

They are used by nginx, but are not associated with the domain (anymore?). It seems like the DNS cache used by nginx is outdated.
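
The same comparison can be scripted. The following is a minimal sketch, not part of the original investigation: it extracts the upstream IPs from nginx error-log lines and reports those that are absent from the current DNS answer. The function names (`upstream_ips`, `stale_ips`, `resolve_ipv4`) and the regular expression are assumptions made for illustration, based only on the log format shown above.

```python
import re
import socket
from typing import Iterable, List, Set

# Matches the `upstream: "https://<ip>:<port>/..."` field of an nginx error-log line.
# Assumption: the field looks like the one in the log excerpt quoted in this issue.
UPSTREAM_RE = re.compile(r'upstream: "https?://(\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?/')


def upstream_ips(log_lines: Iterable[str]) -> Set[str]:
    """Collect the distinct upstream IPs mentioned in the given log lines."""
    ips = set()
    for line in log_lines:
        match = UPSTREAM_RE.search(line)
        if match:
            ips.add(match.group(1))
    return ips


def stale_ips(log_lines: Iterable[str], resolved: Iterable[str]) -> List[str]:
    """Return the upstream IPs that are not among the currently resolved ones."""
    return sorted(upstream_ips(log_lines) - set(resolved))


def resolve_ipv4(host: str) -> List[str]:
    """Resolve a hostname to its IPv4 addresses (performs a real DNS lookup)."""
    infos = socket.getaddrinfo(host, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})
```

On moonrise, something like `stale_ips(open("/var/log/nginx/error.log"), resolve_ipv4("datasets-server.huggingface.tech"))` would list the addresses nginx still uses that the domain no longer resolves to.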

severo commented 2 years ago

It cannot be due to a long TTL on the domain, since the TTL is only 60s
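
A short TTL only helps a client that looks the name up again. The toy model below is a sketch and not nginx code. It assumes, as the reload fix suggests, that the proxy resolved the upstream hostname once when its configuration was loaded and kept those addresses afterwards. `ResolveOnce` and `ResolveWithTTL` are hypothetical names used only for this illustration.

```python
from typing import Callable, List

Resolver = Callable[[], List[str]]


class ResolveOnce:
    """Resolves when created and never again; recreating it models a reload."""

    def __init__(self, resolve: Resolver) -> None:
        self._ips = list(resolve())

    def ips(self) -> List[str]:
        return self._ips


class ResolveWithTTL:
    """Resolves again once the cached answer is older than the TTL."""

    def __init__(self, resolve: Resolver, ttl_seconds: float,
                 clock: Callable[[], float]) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._ips = list(resolve())
        self._expires_at = clock() + ttl_seconds

    def ips(self) -> List[str]:
        now = self._clock()
        if now >= self._expires_at:
            # The cached answer has expired: look the name up again.
            self._ips = list(self._resolve())
            self._expires_at = now + self._ttl
        return self._ips
```

If the DNS records change after both objects are created, `ResolveWithTTL` picks up the new addresses after 60 seconds, while `ResolveOnce` keeps returning the old ones until it is recreated, whatever the TTL.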

severo commented 2 years ago

The immediate solution is to reload nginx:

[Screenshot, 2022-05-30 13:32:32]

A better mid-term solution is to "do it directly at alb level like tensorboard"

See the discussion on Slack (https://huggingface.slack.com/archives/C023JAKTR2P/p1653909274352989) with @XciD

XciD commented 2 years ago

We have multiple solutions:

severo commented 2 years ago

I created a monitor on BetterUptime: https://betteruptime.com/team/14149/monitors/691070
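
For reference, the kind of check such a monitor performs can be sketched in a few lines. This is an illustrative assumption, not how BetterUptime works internally: the hypothetical `asset_is_served` function treats an asset as available only if the request returns HTTP 200, and counts 403/404 responses and connection timeouts as failures.

```python
import urllib.request
from typing import Callable


def asset_is_served(url: str,
                    opener: Callable = urllib.request.urlopen,
                    timeout: float = 10.0) -> bool:
    """Return True if fetching `url` yields HTTP 200, False on any error.

    `opener` is injectable so that the check can be exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib's HTTPError (e.g. 403, 404) and URLError, as well as
        # socket timeouts, are all subclasses of OSError.
        return False
```

Called with the proxied image URL from the top of this issue, it would have returned False while nginx was pointing at the outdated IPs.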

severo commented 2 years ago

@huggingface/moon-landing-back : what do you think of https://github.com/huggingface/datasets-server/issues/318#issuecomment-1141049697?

julien-c commented 2 years ago

BTW we probably want to expose that service publicly in the future anyways, no?

severo commented 2 years ago

Yes, I think so. Maybe with authentication for services like random access/queries to datasets?

julien-c commented 2 years ago

IIRC the proxy was to serve the images and other assets from hf.co for an SEO experiment, but I don't think it's super crucial, and the service API is going to become public anyways, IIUC

severo commented 2 years ago

We could use datasets-server.huggingface.co (or datasets.huggingface.co ?) then, to be able to get the cookies?

julien-c commented 2 years ago

datasets-server.huggingface.co sounds good to me. @lhoestq ?

lhoestq commented 2 years ago

Sounds good to me as well

severo commented 2 years ago

OK, closing, since it's temporarily fixed, and since the proxy will soon disappear (see #319)