Closed akosiaris closed 2 years ago
I'm so sorry for the late reply! Unfortunately, I only just noticed that this issue had been opened. The geolocation is automated: the code is fed the area, country, and address provided by EODY. I'll double-check what's going on with the code and will get back to you ASAP.
Hi,
First, let me thank you profoundly for publishing this clearly painstakingly constructed dataset in the open and keeping it updated. It's been very useful as an alternative to the limited, non-machine-readable official releases.
As for this issue, I've been playing around a bit with the `rapid_tests` part of the dataset recently. While creating a geographical visualization of the dataset, I noticed that many of the data points are well outside the geographical limits of Greece, some even in the Western hemisphere. A quick screenshot exhibits that below.
I verified this by grepping through the dataset for the points that are in the Western hemisphere (just because they are extremely easy to search for: they all have a `,-<number>` textual pattern).
So 169 datapoints are in the wrong hemisphere. Partially deduplicating based on the actual place (e.g. "ΡΟΔΟΥ, ΡΟΔΟΥ, ΠΛΑΤΕΙΑ ΣΑΝ ΦΡΑΤΖΕΣΚΟ") gives us 58 distinct locations, with the first 10 in order of datapoints being the following:
As you can see, there is still some duplication based on whether some words are accented or not (e.g. Thessaloniki port, which, interestingly, has different geographical coordinates depending on whether "ΛΙΜΑΝΙ" is accented), but that's arguably the lesser of the problems.
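For what it's worth, the accent-related duplication could probably be collapsed before geocoding. Below is a minimal sketch, assuming the place is available as a single string; `strip_accents`, `normalize_place` and `count_places` are hypothetical helper names, not anything from this repository. It removes combining marks after Unicode decomposition, so an accented and an unaccented spelling map to the same key.

```python
import unicodedata
from collections import Counter


def strip_accents(text: str) -> str:
    """Remove combining marks (e.g. the Greek tonos) from a string."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks such as U+0301.
    # Note: this also drops the dialytika, which seems acceptable for a dedup key.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", without_marks)


def normalize_place(place: str) -> str:
    """Build an accent-insensitive, case-insensitive, whitespace-tidy key."""
    collapsed = " ".join(place.split())
    return strip_accents(collapsed).upper()


def count_places(places):
    """Count datapoints per normalized place (hypothetical input: list of strings)."""
    return Counter(normalize_place(p) for p in places)


if __name__ == "__main__":
    sample = ["ΛΙΜΑΝΙ", "ΛΙΜΆΝΙ", "λιμάνι  "]
    print(count_places(sample))
```

With a key like this, both spellings of "ΛΙΜΑΝΙ" would be geocoded once and share the same coordinates.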
I have done no work to identify datapoints in the Eastern hemisphere that are wrong, as they require a slightly more involved approach: making sure that latitude and longitude fall within the administrative geographical boundaries of Greece. But as you can tell by the screenshot, there are datapoints in Cyprus, Egypt, Italy, Hungary, Turkey and Germany, all of them clearly not correct. There is also a sizable number of datapoints in the Gulf of Guinea off Africa, but that's presumably because geographical coordinate discovery failed and returned a latitude and longitude of 0.0,0.0.
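A coarse first pass at the Eastern-hemisphere cases might look like the sketch below. It is only a rough filter under stated assumptions: the bounding box values are my own approximation of Greece's extent, not official figures, and a box necessarily includes bits of neighbouring countries (e.g. western Turkey, southern Albania), so it cannot replace a proper check against the administrative boundaries. `classify_point` is a hypothetical helper name.

```python
# Approximate bounding box for Greece (assumed values, not authoritative).
GREECE_LAT_MIN, GREECE_LAT_MAX = 34.7, 41.8
GREECE_LON_MIN, GREECE_LON_MAX = 19.3, 29.7


def classify_point(lat: float, lon: float) -> str:
    """Label a coordinate pair with the most likely kind of geocoding problem."""
    # A failed lookup is presumably what yields exactly 0.0,0.0 (Gulf of Guinea).
    if lat == 0.0 and lon == 0.0:
        return "null_island"
    # Negative longitude: the easy-to-grep Western-hemisphere case.
    if lon < 0:
        return "western_hemisphere"
    inside_lat = GREECE_LAT_MIN <= lat <= GREECE_LAT_MAX
    inside_lon = GREECE_LON_MIN <= lon <= GREECE_LON_MAX
    if not (inside_lat and inside_lon):
        return "outside_bbox"
    # Inside the box is only "plausible": it may still be in a neighbouring country.
    return "plausible"


if __name__ == "__main__":
    for name, lat, lon in [
        ("Athens", 37.98, 23.73),
        ("Budapest", 47.50, 19.04),
        ("Nicosia", 35.17, 33.36),
        ("Failed lookup", 0.0, 0.0),
    ]:
        print(name, classify_point(lat, lon))
```

Anything not labelled "plausible" could then be re-geocoded or reviewed by hand, while the "plausible" ones would still need a polygon-based check to be certain.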
I have no idea how the latitude and longitude are generated and whether they are part of the original dataset or are secondary data, so I don't know if it is fixable.
In any case, I thought I should let you know.
Many thanks again!