phinik / LovelyLLamas


20.11 Goal 3 (Pre-Requisite) - Data Collection Issues #13

Closed Ace-Of-Snakes closed 3 hours ago

Ace-Of-Snakes commented 3 hours ago

After enlarging the database of cities to 113k entries, a new approach is needed for crawling them every day.

Ace-Of-Snakes commented 3 hours ago

The finalised approach was to use Python libraries such as pytz to rescrape the 113k data points for latitude and longitude and determine their timezones (this took about 4 hours). Once that was finished, the UTC offset was calculated for each timezone (for example, Berlin = UTC+1). The big database was then split on the integer UTC offset, spanning [UTC-10 : UTC+14], which created roughly 24 new CSVs. The concurrent script was rewritten to detect which UTC offset currently has the local time 00:00 (important for data reasons) and is run every hour on the server accordingly. A rough sketch of both steps is below.
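A minimal sketch of the two steps, assuming the cities CSV has `lat`/`lng` columns; `timezonefinder` and `pandas` are assumptions here (the comment only names pytz), and the file and column names are hypothetical:

```python
from datetime import datetime, timezone

import pandas as pd
import pytz
from timezonefinder import TimezoneFinder  # assumption: used for the lat/lng -> timezone lookup


def utc_offset_hours(tz_name: str, ref: datetime) -> int:
    """Integer UTC offset of an IANA timezone at a reference time (e.g. Europe/Berlin -> +1)."""
    return int(pytz.timezone(tz_name).utcoffset(ref).total_seconds() // 3600)


def split_by_offset(csv_path: str) -> None:
    """Split the big cities CSV into one CSV per integer UTC offset."""
    tf = TimezoneFinder()
    ref = datetime(2024, 1, 1)  # fixed reference date so offsets stay stable across runs
    df = pd.read_csv(csv_path)

    # Look up the IANA timezone for each city's coordinates, then its UTC offset.
    df["timezone"] = df.apply(lambda r: tf.timezone_at(lat=r["lat"], lng=r["lng"]), axis=1)
    df = df.dropna(subset=["timezone"])
    df["utc_offset"] = df["timezone"].apply(lambda tz: utc_offset_hours(tz, ref))

    # Roughly 24 files, spanning UTC-10 .. UTC+14.
    for offset, group in df.groupby("utc_offset"):
        sign = "+" if offset >= 0 else "-"
        group.to_csv(f"cities_utc{sign}{abs(offset)}.csv", index=False)


def offset_at_midnight(now_utc: datetime) -> int:
    """UTC offset whose local time is currently 00:xx (intended to be run hourly, e.g. via cron)."""
    offset = (-now_utc.hour) % 24                     # local midnight when offset == -UTC_hour (mod 24)
    return offset if offset <= 14 else offset - 24    # map 15..23 down to -9..-1


if __name__ == "__main__":
    split_by_offset("cities_113k.csv")                     # one-off preprocessing
    print(offset_at_midnight(datetime.now(timezone.utc)))  # which shard to scrape this hour
```

One subtlety: UTC+14 and UTC-10 reach local midnight at the same UTC hour, so even though the offset range spans 25 integer values, the hourly scheduler only ever needs 24 distinct slots per day.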