thadk / osm-hashtag-extract

A Makefile flow to generate GeoJSON and CSV extracts of the OpenStreetMap planet changes & HOT OSM Tasks by a hashtag

error when downloading planet_latest #2

Open d3netxer opened 7 years ago

d3netxer commented 7 years ago

I'm getting a strange error. Do you know why it appears?

command: make data/osm/planet_latest.osm

mkdir -p data/osm/
curl http://planet.openstreetmap.org/planet_latest.osm.bz2 | pbzip2 -cd >data/osm/planet_latest.osm.download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   309  100   309    0     0    987      0 --:--:-- --:--:-- --:--:--   990
pbzip2: *ERROR: Bad magic number (file not created by bzip2)!  Skipping...
Terminator thread: premature exit requested - quitting...
Makefile:22: recipe for target 'data/osm/planet_latest.osm' failed
make: *** [data/osm/planet_latest.osm] Error 1
thadk commented 7 years ago

Here are my pbzip2 and curl versions. It looks like I am using the stock OS X El Capitan version of the latter. I'm re-running the command on my machine now to see whether I hit the same error.

pbzip2 --version

Parallel BZIP2 v1.1.12 [Dec 21, 2014]
By: Jeff Gilchrist [http://compression.ca]
Major contributions: Yavor Nikolov [http://javornikolov.wordpress.com]
Uses libbzip2 by Julian Seward
curl -V

curl 7.43.0 (x86_64-apple-darwin15.0) libcurl/7.43.0 SecureTransport zlib/1.2.5
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz UnixSockets
d3netxer commented 7 years ago

I ran the command on an Ubuntu 16.04 virtual machine. Here are the versions of pbzip2 and curl:

pbzip2 --version
Parallel BZIP2 v1.1.9     - by: Jeff Gilchrist [http://compression.ca]
[Apr. 13, 2014]               (uses libbzip2 by Julian Seward)
Major contributions: Yavor Nikolov <nikolov.javor+pbzip2@gmail.com>
curl --version
curl 7.47.0 (x86_64-pc-linux-gnu) libcurl/7.47.0 GnuTLS/3.4.10 zlib/1.2.8 libidn/1.32 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP UnixSockets 
d3netxer commented 7 years ago

When I manually run a curl command to download the planet file, it doesn't work. Only 319 bytes are transferred:

curl -O http://planet.osm.org/planet/planet-latest.osm.bz2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   319  100   319    0     0    428      0 --:--:-- --:--:-- --:--:--   429

However, using curl to download the latest changeset file does work:

curl -O http://planet.osm.org/planet/changesets-latest.osm.bz2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 11 1594M   11  177M    0     0  23.8M      0  0:01:06  0:00:07  0:00:59 28.5M
d3netxer commented 7 years ago

I did use: wget http://planet.osm.org/planet/planet-latest.osm.bz2

and it was successful.

Does the Makefile make use of the planet file yet?
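
The 309- and 319-byte transfers above are far too small to be a planet archive, which fits pbzip2's "Bad magic number" complaint: whatever arrived was probably a short redirect or error page rather than bzip2 data. curl does not follow HTTP redirects unless it is given -L (--location), while wget follows them by default, which may be why wget succeeded here. A minimal Python sketch of a pre-flight check follows. The helper names looks_like_bzip2 and check_download are made up for illustration; the only fact relied on is that a bzip2 stream starts with the bytes "BZh".

```python
import urllib.request

# Every bzip2 stream begins with these three bytes.
BZIP2_MAGIC = b"BZh"


def looks_like_bzip2(head: bytes) -> bool:
    """Return True if the given leading bytes carry the bzip2 signature."""
    return head.startswith(BZIP2_MAGIC)


def check_download(url: str, timeout: float = 30.0) -> bool:
    """Hypothetical helper: read only the first 3 bytes served at `url`.

    urllib follows HTTP redirects by default, so this checks what the
    final location serves rather than the redirect page itself.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return looks_like_bzip2(response.read(3))
```

Running check_download on the URL before starting the long download-and-decompress pipeline would flag a response that is not a bzip2 archive.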

thadk commented 7 years ago

The path has changed from http://planet.openstreetmap.org/planet_latest.osm.bz2 to http://planet.openstreetmap.org/planet/changesets-latest.osm.bz2

thadk commented 7 years ago

The two commands piped together simply decompress the planet file in parallel while it is still downloading, since both steps take a long time. This approach was recommended on the OSM wiki. You can run the two steps separately without any problem, as long as the final file ends up in the same place.

Once I updated the Makefile with the new URL of the planet file, it seems to be working better. I'll commit it once I confirm it finishes.
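
The download-and-decompress pipe described above can also be sketched in Python, for anyone who would rather not depend on pbzip2. This is only an illustration under assumptions: decompress_stream is a hypothetical helper, it is single-threaded (so slower than pbzip2), and it restarts the decompressor at each stream boundary because archives written by pbzip2 can consist of several concatenated bzip2 streams.

```python
import bz2
from typing import Iterable, Iterator


def decompress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decompressed data for compressed chunks as they arrive.

    Handles multi-stream input: when one bzip2 stream ends, a fresh
    decompressor is started for whatever bytes follow it.
    """
    decompressor = bz2.BZ2Decompressor()
    for chunk in chunks:
        data = chunk
        while data:
            if decompressor.eof:
                # The previous stream is finished; begin the next one.
                decompressor = bz2.BZ2Decompressor()
            output = decompressor.decompress(data)
            if output:
                yield output
            # Bytes left over after the end of a stream belong to the next stream.
            data = decompressor.unused_data if decompressor.eof else b""
```

The chunks could come from any source that yields bytes, for example iter(lambda: response.read(1 << 20), b"") on an open HTTP response, with each yielded block written to the output file.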

d3netxer commented 7 years ago

OK, a follow-up question: does the OSM planet file need to be decompressed, or can we work on it while keeping it in either bz2 or pbf format? ChangesetMD was able to work with the bz2 file using the bz2file library.

The reason I ask is that I tried to extract it with pbzip2 after downloading it, and the extraction could not complete: the output grew to over 650 GB and filled the entire hard drive of my new computer!
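
On the question of working with the archive in place: Python's standard bz2 module (3.3 and later) can read a compressed file line by line, including multi-stream files, without ever writing the decompressed XML to disk. To my knowledge, bz2file offers the same interface for older Python versions. The sketch below is an illustration under assumptions, not what this Makefile or ChangesetMD actually does; count_matching_lines is a hypothetical helper.

```python
import bz2


def count_matching_lines(path: str, needle: str) -> int:
    """Count lines containing `needle` in a .bz2 file, reading it as a stream.

    Only a small buffer is held in memory at a time, and nothing is
    written to disk, so the size of the decompressed data does not matter.
    """
    count = 0
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if needle in line:
                count += 1
    return count
```

A call such as count_matching_lines("changesets-latest.osm.bz2", "#some-hashtag") (both arguments are placeholders) would scan the archive in one pass. Real extraction would need an XML parser rather than substring matching, but the streaming pattern stays the same.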

thadk commented 7 years ago

Oh sorry, I just realized there was actually a planet file entry in the Makefile, but it was completely unused. Sorry, I forgot about that! I have removed that entry now, since it was dangerous.