iiab / iiab

Internet-in-a-Box - Build your own LIBRARY OF ALEXANDRIA with a Raspberry Pi !
https://internet-in-a-box.org
GNU General Public License v2.0

Extremely slow downloads of IIAB Maps from archive.org (often 1-2 Mbit/s) - would torrents solve this - or is there a better way? [OSM] #2553

Closed holta closed 3 years ago

holta commented 3 years ago

The extreme slowness of downloads from archive.org (still as slow as 1-2 Mbit/s almost a year later) means that folks are unable to take advantage of IIAB Maps, when even a medium-sized Map Pack can tediously take all day and night (or longer) to download.

(Or many days to download in the case of the 80GB "World" Map Pack!)

Is there a pragmatic solution here?

Would torrents (that Internet Archive already seeds) be one of those, e.g. could this ever become a genuinely hassle-free solution that doesn't drive everyone crazy, i.e. without endless maintenance/onboarding and other hidden costs?

(Are there other options we haven't considered?)

holta commented 3 years ago

fyi the "World" Map Pack (80.48 GB) takes 3.7 days (more than half a week) to download at Archive.org's usual speed (2Mbit/s = 0.9 GByte/hour).

That's a best case scenario without network drops/interruptions (!)
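For anyone who wants to re-run that arithmetic for other Map Packs or link speeds, here is a minimal sketch. The function name `download_days` is made up for illustration, and it assumes decimal units (1 GB = 1000 MB) and a perfectly steady transfer rate:

```python
def download_days(size_gb: float, rate_mbit_s: float) -> float:
    """Days needed to transfer size_gb (decimal GB) at a steady rate_mbit_s (Mbit/s)."""
    # Mbit/s -> MByte/s (divide by 8) -> MByte/hour (x3600) -> GByte/hour (divide by 1000)
    gb_per_hour = rate_mbit_s / 8 * 3600 / 1000
    return size_gb / gb_per_hour / 24


# "World" Map Pack (80.48 GB) at 2 Mbit/s, i.e. 0.9 GByte/hour
print(round(download_days(80.48, 2.0), 1))  # 3.7
```

Any network drop or restart only adds to this estimate.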

i.e. I don't know if /usr/bin/iiab-install-map-region and /usr/bin/iiab-extend-sat.py recover gracefully after network outages, but possibly @georgejhunt might know: https://github.com/iiab/iiab/pull/2551#issuecomment-700342446

holta commented 3 years ago

Of course things could be Even Worse (:

i.e. in comparison, iiab-extend-sat.py sustains about 0.33 Mbit/s as it downloads many small files (containing satellite photos) if we measure incremental data (MB) based on the growth of its /library/www/osm-vector-maps/viewer/tiles/satellite_z0-z9_v3.mbtiles database file.

That number appears to be pretty consistently between 0.32 and 0.35Mbit/s when downloading 3 different Hi-Res Satellite Photo Regions (2 squares of 100x100 km, and 1 square of 300x300 km).

(So in effect: iiab-extend-sat.py download speeds appear to be about 6X slower — as compared to iiab-install-map-region which generally/often downloads from Archive.org at about 2Mbit/s.)
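A rough sketch of that measurement method, for anyone who wants to repeat it: take the size of the .mbtiles file at two points in time (e.g. with `os.path.getsize`) and convert the growth to Mbit/s. The function name and the sample byte counts below are hypothetical, chosen only to land on the ~0.33 Mbit/s figure above:

```python
def growth_mbit_s(bytes_before: int, bytes_after: int, seconds: float) -> float:
    """Average rate in Mbit/s implied by a file growing between two size samples."""
    return (bytes_after - bytes_before) * 8 / seconds / 1_000_000


# Hypothetical samples: the database file grew by 2,475,000 bytes over 60 seconds
rate = growth_mbit_s(100_000_000, 102_475_000, 60)
print(f"{rate:.2f} Mbit/s")         # 0.33 Mbit/s
print(f"{2.0 / rate:.1f}x slower")  # 6.1x slower than a 2 Mbit/s download
```

Note this counts only bytes that end up in the database file, so protocol overhead and any discarded data are not included.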

ADDENDUM: Download speed was 0.27 Mbit/s over 110 min, when downloading the 1st (third? half?) of a 1000x1000 km Hi-Res Satellite Photo Region.

tim-moody commented 3 years ago

out of curiosity what speed do you get from http://timmoody.com/iiab-files/maps/osm-planet_z0-z10_2019.mbtiles

holta commented 3 years ago

Bursting to ~100 Mbit/s but the avg (to download the entire 1.74 GB file in 2m54s) was: 80 Mbit/s

holta commented 3 years ago

Total time to download 1000x1000 km Hi-Res Satellite Photo Region (343MB) was 212min, for an average download speed of:

0.22 Mbit/s

"Party like it's 1999" (ISDN speeds are better than 56kbit/s analog modems after all ;)

georgejhunt commented 3 years ago

The satellite tiles are rate limited by the source (it's a free service, but much in demand). I've not seen more than 5 tiles/sec. That's why I have never wanted to download a whole region.

holta commented 3 years ago

Sufficiently solved for IIAB 7.2 thanks to http://timmoody.com/iiab-files/maps and PRs iiab/maps#38 and #2565.

If a broader community peering scheme (and its various mother hens to orchestrate truly reliable operation) can be engineered in time for IIAB 8.0, that has the potential to deliver not just maps — but also other critical educational content...

tabbyrobin commented 3 years ago

Cheers to @tim-moody for hosting.

Long-term, maybe software like IPFS or Dat (hyperdrive) could be useful? I think they might have a few advantages over/make things easier than using regular torrents. I think dat/hyperdrive is more stable but IPFS featureset might be more adapted to this scenario.

(Not sure if above mention "that Internet Archive already seeds" was saying IA already does seed those torrents, or that there's hope they might start in the future...)

Perhaps IIAB installs could have an opt-in feature to seed what they have (if they are on an appropriate connection).

Apologies if this is already being looked at. I saw on the linked commits that it mentioned @georgejhunt is already looking at torrent mechanisms, but didn't see any public details/discussion.

holta commented 3 years ago

@sptankard feel free to join our call 10AM NYC Time tmrw (Thur) if you want to talk to @georgejhunt directly about best practices and recommendations you have for content peering? To help inform his ongoing design process...coordinating with Internet Archive if possible, yes!

Call agenda & informal minutes here:

http://minutes.iiab.io

I apologize that IIAB's weekly calls are currently on Skype (!) but if you want to join, do just send me your Skype username to holt @ unleashkids org

holta commented 3 years ago

@sptankard please see @georgejhunt's BitTorrent prototype (proof-of-concept) here:

PR #2572

Please let us know what you think, even if you cannot join our voice call! (Thur 10AM EDT / NYC Time!)

tabbyrobin commented 3 years ago

Hi @holta , thank you! I wasn't able to make it to the meeting but I did have a look at @georgejhunt's PR. It looks like George knows what he is doing a lot more than i do. :)

I noticed George's PR uses torrent files which are hosted by IA and which use the IA bittorrent seeders at bt1.archive.org and bt2.archive.org. That seems really nice and so it seems (to me) like bittorrent is probably the best solution for now, and probably for a while.

Btw i did test one of the torrents (the first one in the json), manually with transmission-gtk, and downloading was pretty fast. (It was estimating 1-2 hours but probably would have been faster. I stopped it at ~10%.)

I think on the whole IPFS is not super stable at the moment, and especially, IA's IPFS setup is not stable like their torrenting setup.

I tried to look into what support IA has for IPFS. Couldn't find much written, but apparently IPFS (but not dat/hyperdrive) is one of the protocols supported on their dweb site. When i pull up one of the IIAB maps on their dweb.archive.org site, it says dweb is not supported for that filetype: Unsupported mediatype:software https://dweb.archive.org/details/osm_africa_z11-z14_2019.mbtiles

When/if IA gets more IPFS support, the main benefits i'm thinking that it might give are:

  • incremental updates when getting new versions of files (only pull changes/deltas)
  • save some local space thru auto deduplication
  • only download diffs if some files have overlapping content

How much these benefits are realized will depend on the nature of the files downloaded, and of course how stable IPFS implementation is.

holta commented 3 years ago

@mitra42 can you comment on Internet-in-a-Box's proposed content peering strategy, as it emerges alongside (and thanks to!) Internet Archive's BitTorrent seeds etc? Thanks if so! See also: PR #2572

mitra42 commented 3 years ago

Hi @sptankard. I built the dweb.archive.org site, I'm still the point of contact and try to keep it alive, though I haven't been working on it much since the end of 2018 when I switched to the offline archive project (which is where the distribution in IIAB comes from).

The "Unsupported mediatype: software" that https://dweb.archive.org/details/osm_africa_z11-z14_2019.mbtiles gives you is because the dweb site doesn't run the emulators that play old native software in a browser, and that is what the "software" mediatype is generally used for. I agree it would be better just to display the files, but that isn't going to happen.

The torrent files on the Archive can be problematic, but because there are some 50 million of them, it's non-trivial to get that changed. https://dweb.archive.org/download/osm_africa_z11-z14_2019.mbtiles/osm_africa_z11-z14_2019.mbtiles_archive.torrent should give you a better torrent (just put "dweb" before the URL you would otherwise use to download the torrent file) … this torrent file is fixed up to use wt.archive.org as the tracker; that tracker supports https and wss and so may be faster. It also works with browser-based approaches like webtorrent.
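As a sketch of the "put dweb before the URL" step: the helper below swaps the host of a regular archive.org torrent URL for dweb.archive.org. The function name is hypothetical, and the input URL shape (https://archive.org/download/...) is an assumption inferred from the dweb link above, not something confirmed in this thread:

```python
from urllib.parse import urlsplit, urlunsplit


def dweb_torrent_url(url: str) -> str:
    """Return the same URL with its host swapped to dweb.archive.org."""
    parts = urlsplit(url)
    # Assumption: the regular torrent file is served from archive.org itself
    if parts.netloc not in ("archive.org", "www.archive.org"):
        raise ValueError(f"not an archive.org URL: {url}")
    return urlunsplit(parts._replace(netloc="dweb.archive.org"))


regular = ("https://archive.org/download/osm_africa_z11-z14_2019.mbtiles/"
           "osm_africa_z11-z14_2019.mbtiles_archive.torrent")
print(dweb_torrent_url(regular))
```

For the Africa map this yields the dweb link quoted above; whether every item has a fixed-up torrent behind that URL is not something this sketch checks.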

IPFS is supported only somewhat; it's subject to a lot of "bit-rot" (changing APIs, servers that can't handle the load etc etc) and so with no-one maintaining it regularly at the Archive it tends to fail and stay down. I would NOT recommend depending on IPFS for file distribution, in particular because the IPFS installer itself breaks as APIs etc change, so we have had great difficulties with a previously working installer (for our Docker files at dweb.archive.org) failing as APIs etc change.

I would recommend sticking with BitTorrent. When used via the torrent file I recommend, you should find there is always at least one machine (the Archive) serving the content, but if you want to make it work well, make sure there are a few machines around the world that have themselves fetched copies and stay up as much as possible (this could be as simple as having a couple of machines running webtorrent or utorrent that have already fetched the distributions).

1265578519 commented 2 years ago

https://blog.archive.org/2020/05/11/thank-you-for-helping-us-increase-our-bandwidth/