fdietze opened 2 months ago
Interesting, I can reproduce this. Thanks!
Same thing here today, in both Safari and wget, with https://ton.twimg.com/birdwatch-public-data/2024/09/21/notes/notes-00000.tsv:
```
% wget https://ton.twimg.com/birdwatch-public-data/2024/09/21/notes/notes-00000.tsv
--2024-09-21 19:39:33--  https://ton.twimg.com/birdwatch-public-data/2024/09/21/notes/notes-00000.tsv
Resolving ton.twimg.com (ton.twimg.com)... 152.199.24.184
Connecting to ton.twimg.com (ton.twimg.com)|152.199.24.184|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 611776258 (583M) [text/tab-separated-values]
Saving to: ‘notes-00000.tsv’

notes-00000.tsv 32%[===================================> ] 190.74M --.-KB/s eta 3m 51s
```
But at least wget does sort of work, failing gracefully, eventually:

```
2024-09-21 19:43:08 (912 KB/s) - Connection closed at byte 200002828. Retrying.

--2024-09-21 19:43:09--  (try: 2)  https://ton.twimg.com/birdwatch-public-data/2024/09/21/notes/notes-00000.tsv
Connecting to ton.twimg.com (ton.twimg.com)|152.199.24.184|:443... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable

    The file is already fully retrieved; nothing to do.
```
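The retry above follows standard HTTP range semantics. The connection dropped at byte 200002828 (about 190.74 MiB, the same figure the progress bar stalled at), wget then asked for the rest of the file, and the server answered 416, which means the requested start offset is not inside the resource. That suggests the file actually stored on the server is no larger than 200002828 bytes, far short of the advertised 611776258. The Python sketch below mirrors that resume decision. It is not wget's implementation: the helper names and outcome labels are hypothetical, and reading the true size from a `Content-Range: bytes */N` header on the 416 assumes the server follows RFC 9110's recommendation to send one, which this server may not do.

```python
import re
import urllib.request
from typing import Optional

# Matches the "unsatisfied range" form of Content-Range that RFC 9110 says a
# 416 response SHOULD carry, e.g. "bytes */200002828".
_UNSATISFIED_RANGE = re.compile(r"^bytes \*/(\d+)$")


def build_resume_request(url: str, received: int) -> urllib.request.Request:
    """Request the remainder of `url`, starting at byte offset `received`."""
    # Open-ended range: from `received` to the end of the resource.
    return urllib.request.Request(url, headers={"Range": f"bytes={received}-"})


def size_from_unsatisfied_range(content_range: Optional[str]) -> Optional[int]:
    """Return N from 'bytes */N', or None if the header is absent or another form."""
    if content_range is None:
        return None
    match = _UNSATISFIED_RANGE.match(content_range.strip())
    return int(match.group(1)) if match else None


def resume_outcome(status: int, received: int, content_range: Optional[str]) -> str:
    """Classify the response to a resume request (hypothetical labels)."""
    if status == 206:
        # Partial Content: append this body to the bytes already saved.
        return "resuming"
    if status == 200:
        # The server ignored Range; the body starts again at byte 0.
        return "restart"
    if status == 416:
        size = size_from_unsatisfied_range(content_range)
        if size is not None and received >= size:
            # Nothing left to fetch: the local copy is as long as the resource.
            return "complete"
        # No usable size from the server, so completeness cannot be confirmed.
        return "range-not-satisfiable"
    return "error"
```

For the log above, `resume_outcome(416, 200002828, "bytes */200002828")` returns `"complete"`, provided the server actually sent that header. The log only shows wget concluding the file is complete; the sketch is stricter and reports completion only when the server states a length.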
Describe the bug

When downloading the datasets from https://x.com/i/communitynotes/download-data using `wget`, the download hangs and stops receiving data, because the `Content-Length` header (566M) is larger than the file actually being served (185M).

To Reproduce

Expected behavior

The `Content-Length` header should match the size of the file being served.
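To confirm the mismatch without relying on wget, a client can compare the advertised `Content-Length` with the number of bytes it actually receives. The following is a minimal Python sketch under a few assumptions: `read_and_count` and `check_url` are hypothetical helper names, the 60-second timeout and 64 KiB chunk size are arbitrary, and a stalled read is treated as the end of the transfer so the check reports a short body instead of hanging.

```python
import io
import socket
import urllib.request
from typing import Optional, Tuple

CHUNK_SIZE = 64 * 1024  # bytes requested per read() call


def read_and_count(stream: io.BufferedIOBase, declared: Optional[int]) -> Tuple[int, bool]:
    """Drain `stream`, returning (bytes received, whether that equals `declared`).

    A read that times out or a dropped connection ends the count early instead
    of raising, so a stalled transfer is reported as a short body.
    """
    received = 0
    try:
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:  # EOF: the server ended the body
                break
            received += len(chunk)
    except (socket.timeout, ConnectionError):
        pass  # stalled or reset mid-body; keep the count so far
    return received, declared is not None and received == declared


def check_url(url: str, timeout: float = 60.0) -> Tuple[Optional[int], int, bool]:
    """Download `url` once and compare its Content-Length with the bytes received.

    `timeout` applies to each blocking socket operation, so a server that stops
    sending for that long ends the download rather than blocking forever.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        header = resp.headers.get("Content-Length")
        declared = int(header) if header is not None else None
        received, matches = read_and_count(resp, declared)
    return declared, received, matches
```

Run against the dataset URL, a result whose `declared` value is far above `received` would match the behavior described in this issue; with a correct `Content-Length`, `matches` would be `True`.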