opentraveldata / opentraveldata

Collection of open data related to (at least) travel, transport, tourism
https://opentraveldata.github.io/opentraveldata/

Missing POR in Vietnam #102

Open da115115 opened 5 years ago

da115115 commented 5 years ago

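According to the latest UN/LOCODE data (the 2018-1 code list, dated July 2018), 238 points of reference (POR) in Vietnam (VN) have a UN/LOCODE. The line count below is 239 because it also includes the country header row (".VIET NAM"):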

$ bzgrep '^,"VN"' ../unlocode/unlocode-code-list-2018-1.csv.bz2 |wc -l
239
$ bzgrep '^,"VN"' ../unlocode/unlocode-code-list-2018-1.csv.bz2 | head -3
,"VN",,".VIET NAM",,,,,,,,
,"VN","AGG","An Giang","An Giang","44","--3-----","RQ","0901",,,
,"VN","ANP","An Phú","An Phu","44","--3-----","RL","1401",,"1051N 10505E",
$ bzgrep '^,"VN"' ../unlocode/unlocode-code-list-2018-1.csv.bz2 |grep -e "UIH" -e "VKG"
,"VN","UIH","Qui Nhon","Qui Nhon","31","1-34----","AI","1301",,"1346N 10913E",""
,"VN","VKG","Rach Gia","Rach Gia",,"---4----","RQ","0901",,,

By contrast, the opentraveldata/optd_por_public_all.csv data file contains only 213 POR for Vietnam. Of those, 69 have an IATA code, and those 69 share just 38 distinct IATA codes. Counting POR that share an IATA code as a single location, at most 213 - 69 + 38 = 182 POR can each correspond to a distinct UN/LOCODE:

$ awk -F'^' '{if ($17 == "VN") {print ($0)}}' optd_por_public_all.csv |wc -l
213
$ awk -F'^' '{if ($17 == "VN" && $1 != "") {print ($0)}}' optd_por_public_all.csv |wc -l
69
$ awk -F'^' '{if ($17 == "VN" && $1 != "") {print ($1)}}' optd_por_public_all.csv | sort | uniq | wc -l
38
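
The three counts above, and the upper bound they imply, can also be produced in a single pass. This is a sketch under the same assumptions as the commands above (`^`-separated fields, IATA code in field 1, country code in field 17). `por_summary` is a hypothetical helper name, and the bound simply treats all POR sharing an IATA code as one location:

```shell
# Hypothetical helper: summarize OPTD POR for one country code.
# Reads optd_por_public_all.csv-style records on stdin.
por_summary() {
    awk -F'^' -v cc="$1" '
        $17 == cc {
            total++                      # every POR of that country
            if ($1 != "") {
                with_iata++              # POR carrying an IATA code
                if (!($1 in seen)) {     # first time this code is seen
                    seen[$1] = 1
                    distinct++
                }
            }
        }
        END {
            # POR without IATA code, plus one location per distinct code
            printf "total=%d with_iata=%d distinct_iata=%d upper_bound=%d\n",
                total, with_iata, distinct, total - with_iata + distinct
        }'
}

# Possible usage:
#   por_summary VN < optd_por_public_all.csv
```

The `upper_bound` field is the number to compare against the UN/LOCODE count when estimating how many POR are missing.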

Hence, roughly 50 POR that have a UN/LOCODE do not appear in Geonames (and hence not in OPTD either). This issue tracks the work to fill that gap.
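
One way to list the gap, rather than only estimate its size, is to compare two plain lists of codes (one code per line): the UN/LOCODE codes for the country, and the codes already known on the OPTD side. A minimal sketch follows. `missing_codes` and the two file names are hypothetical, and how the OPTD-side list is extracted is deliberately left open, since the field holding UN/LOCODE codes in optd_por_public_all.csv is not shown in this issue:

```shell
# Hypothetical helper: print the codes found in the first list
# but absent from the second one.
# Usage: missing_codes unlocode_codes.txt optd_codes.txt
missing_codes() {
    # awk reads the OPTD list first and remembers its codes, then
    # prints every UN/LOCODE code that was not remembered.
    # Comparing FILENAME with ARGV[1] (instead of NR == FNR) keeps the
    # result correct when the OPTD list is empty.
    awk 'FILENAME == ARGV[1] { known[$1] = 1; next }
         !($1 in known)' "$2" "$1"
}
```

The output keeps the order of the UN/LOCODE list, so it can be reviewed side by side with the source file.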

da115115 commented 5 years ago

Added a section in the main README explaining how to download and transform the latest UN/LOCODE data files.