voxel51 / eta

ETA: Extensible Toolkit for Analytics
https://voxel51.com
Apache License 2.0

`eta.core.web.download_file()` failing #620

Closed rohis06 closed 8 months ago

rohis06 commented 8 months ago

Lately, I have noticed that `eta.core.web.download_file()` fails when downloading large files. For example: `eta.core.web.download_file("http://data.csail.mit.edu/places/places365/test_256.tar", path=<my-path>)`

This especially happens when the file is being downloaded in multiple chunks.

Output using eta.core.web.download_file():

Downloading test split from http://data.csail.mit.edu/places/places365/test_256.tar to /Users/<user>/fiftyone/places/test/data
  23% |█████████████████████████/--------------------------------------------------------------------------------------|    8.0Gb/35.3Gb [6.1m elapsed, 20.3m remaining, 23.9Mb/s]

Here's the wget output for the same file:

wget http://data.csail.mit.edu/places/places365/test_256.tar
--2024-02-15 00:09:49--  http://data.csail.mit.edu/places/places365/test_256.tar
Resolving data.csail.mit.edu (data.csail.mit.edu)... 128.52.131.233
Connecting to data.csail.mit.edu (data.csail.mit.edu)|128.52.131.233|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://data.csail.mit.edu/places/places365/test_256.tar [following]
--2024-02-15 00:09:49--  https://data.csail.mit.edu/places/places365/test_256.tar
Connecting to data.csail.mit.edu (data.csail.mit.edu)|128.52.131.233|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4736829440 (4.4G) [application/x-tar]
Saving to: ‘test_256.tar’
test_256.tar                                    22%[=====================>                                                                              ]   1.01G  11.4MB/s    in 90s
2024-02-15 00:11:20 (11.5 MB/s) - Connection closed at byte 1085026092. Retrying.
--2024-02-15 00:11:21--  (try: 2)  https://data.csail.mit.edu/places/places365/test_256.tar
Connecting to data.csail.mit.edu (data.csail.mit.edu)|128.52.131.233|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4736829440 (4.4G), 3651803348 (3.4G) remaining [application/x-tar]
Saving to: ‘test_256.tar’
test_256.tar                                    45%[++++++++++++++++++++++======================>                                                       ]   2.02G  14.0MB/s    in 90s
2024-02-15 00:12:52 (11.4 MB/s) - Connection closed at byte 2170783988. Retrying.
--2024-02-15 00:12:54--  (try: 3)  https://data.csail.mit.edu/places/places365/test_256.tar
Connecting to data.csail.mit.edu (data.csail.mit.edu)|128.52.131.233|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4736829440 (4.4G), 2566045452 (2.4G) remaining [application/x-tar]
Saving to: ‘test_256.tar’
test_256.tar                                    68%[+++++++++++++++++++++++++++++++++++++++++++++======================>                                ]   3.03G  10.9MB/s    in 96s
2024-02-15 00:14:31 (10.8 MB/s) - Connection closed at byte 3256345900. Retrying.
--2024-02-15 00:14:34--  (try: 4)  https://data.csail.mit.edu/places/places365/test_256.tar
Connecting to data.csail.mit.edu (data.csail.mit.edu)|128.52.131.233|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4736829440 (4.4G), 1480483540 (1.4G) remaining [application/x-tar]
Saving to: ‘test_256.tar’
test_256.tar                                    91%[++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++======================>         ]   4.04G  14.1MB/s    in 94s
2024-02-15 00:16:08 (11.0 MB/s) - Connection closed at byte 4342093292. Retrying.
--2024-02-15 00:16:12--  (try: 5)  https://data.csail.mit.edu/places/places365/test_256.tar
Connecting to data.csail.mit.edu (data.csail.mit.edu)|128.52.131.233|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4736829440 (4.4G), 394736148 (376M) remaining [application/x-tar]
Saving to: ‘test_256.tar’
test_256.tar                                   100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++========>]   4.41G  13.4MB/s    in 31s
2024-02-15 00:16:44 (12.2 MB/s) - ‘test_256.tar’ saved [4736829440/4736829440]
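
A quick check of the byte offsets in the wget log suggests that every connection is closed after roughly the same amount of data, and that the first close lands near the point where the `download_file()` progress bar stops. The snippet below only redoes that arithmetic with the numbers copied from the log; the variable names are made up for illustration.

```python
# Numbers copied from the wget log: the total length and the byte offsets
# at which each connection was closed.
total_bytes = 4736829440
close_offsets = [1085026092, 2170783988, 3256345900, 4342093292]

# Each try starts where the previous one was cut off.
starts = [0] + close_offsets
ends = close_offsets + [total_bytes]
segment_sizes = [end - start for start, end in zip(starts, ends)]

for attempt, size in enumerate(segment_sizes, start=1):
    print(f"try {attempt}: {size} bytes ({size / 2**30:.2f} GiB)")

# Fraction of the file received before the first close.
print(f"first close at {close_offsets[0] / total_bytes:.1%} of the file")
```

The first four tries each transfer about 1.09 billion bytes before the connection closes, so this looks like a per-connection limit rather than a random network drop, though the log alone does not say what enforces it.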

Any help here would be appreciated.

swheaton commented 8 months ago

Sorry, can you explain the failure? Does it get stuck at 23%? From your screenshot it just looks like it's in progress.

rohis06 commented 8 months ago

Yes, that's correct! It essentially terminates at 23%, and control falls through to the next statement in the code.

rohis06 commented 8 months ago

@swheaton, kindly let me know if you need any other details to debug the issue.

swheaton commented 8 months ago

@rohis06 it seems that we just don't handle a 206 Partial Content response. I don't know if there is a simple resolution to it. If you'd like to look into supporting this mode of operation within eta.core.web.download_file(), that would be awesome and the fastest path to resolution! Otherwise it seems too niche a use case to be prioritized by the core team, unfortunately.
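
For anyone picking this up, here is a minimal sketch of the general technique wget uses in the log above: when the stream ends before the expected length, send a new request with a `Range` header and append the `206 Partial Content` body to the file. This is not eta's implementation and not necessarily how the issue was eventually fixed. `open_range` and `resumable_download` are hypothetical names, and the sketch uses only the standard library (`urllib.request`) instead of whatever HTTP client `download_file()` relies on.

```python
import http.client
import urllib.request


def open_range(url, start):
    """Open `url`; when start > 0, ask the server to resume from that byte."""
    request = urllib.request.Request(url)
    if start > 0:
        request.add_header("Range", f"bytes={start}-")
    return urllib.request.urlopen(request)


def resumable_download(url, path, opener=open_range, chunk_size=1 << 20, max_tries=20):
    """Hypothetical helper (not eta's API): download `url` to `path`,
    re-requesting the remaining bytes whenever the stream ends early.
    Returns the number of bytes written."""
    total = None  # full size, learned from a response that starts at byte 0
    written = 0
    with open(path, "wb") as f:
        for _ in range(max_tries):
            try:
                with opener(url, written) as resp:
                    if written > 0 and resp.status != 206:
                        # The server ignored the Range header and is sending
                        # the whole file again, so start over from byte 0.
                        f.seek(0)
                        f.truncate()
                        written = 0
                    if written == 0:
                        length = resp.headers.get("Content-Length")
                        total = int(length) if length is not None else None
                    while True:
                        chunk = resp.read(chunk_size)
                        if not chunk:
                            break  # end of stream, possibly premature
                        f.write(chunk)
                        written += len(chunk)
            except (OSError, http.client.HTTPException):
                # Dropped connection or incomplete read: retry from `written`.
                continue
            # Without a Content-Length we cannot detect truncation, so we
            # assume the stream was complete.
            if total is None or written >= total:
                return written
    raise OSError(f"gave up after {max_tries} tries at byte {written}")
```

The `opener` parameter exists only so the retry loop can be exercised without a network. A real version would probably also want a delay between retries and a progress callback.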

I believe you are trying to add the Places dataset to the fiftyone zoo? (Appreciate it!) Perhaps @jacobmarks has ideas for workarounds to this issue with downloading the data?

rohis06 commented 8 months ago

@swheaton, that makes sense. I'd be glad to explore how to support this mode of operation within eta.core.web.download_file()!

Yes, I'm attempting to add the Places dataset to the fiftyone Zoo! :) Certainly, I'll reach out to @jacobmarks. One thing I'd like to bring to your attention is that the failure to download the Places dataset doesn't occur consistently. It only happens when the internet speed is below 150 Mbps. Otherwise, the download proceeds smoothly.

brimoor commented 8 months ago

Resolved by https://github.com/voxel51/eta/pull/621