konradkalemba / korona.ws

🗺 Coronavirus interactive map of Poland
https://korona.ws
73 stars 29 forks

Data source #27

Closed konradkalemba closed 4 years ago

konradkalemba commented 4 years ago

Hi all!

Currently the data is updated manually from the MZ (Ministry of Health) Twitter account. In the long run, however, this approach is not effective.

I found the official website, https://www.gov.pl/web/koronawirus/wykaz-zarazen-koronawirusem-sars-cov-2, from which we could get the data automatically. Their data is a bit inconsistent, though: some rows specify cities, while others specify "powiaty" (counties).

There is one more source we could scrape the data from: https://docs.google.com/spreadsheets/d/1ierEhD6gcq51HAm433knjnVwey4ZE5DCnu1bW7PRG3E/htmlview?usp=sharing&sle=true

Any thoughts?

mhajder commented 4 years ago

Scraping official data is a good idea in principle. But does it make sense to scrape it just to recreate the same map? I would stay with the official MZ Twitter account.

I prefer the data with cities. It is more readable.

You would also need to prepare the same map as the one on gov.pl, or look up coordinates by place name.

JSON data from gov.pl:

```json
[
  { "Województwo": "Cała Polska", "Powiat/Miasto": "Łącznie", "Liczba": "156", "Liczba zgonów": "3", "Id": "t0000" },
  { "Województwo": "łódzkie", "Powiat/Miasto": "bełchatowski", "Liczba": "1", "Liczba zgonów": "", "Id": "t1001" },
  { "Województwo": "śląskie", "Powiat/Miasto": "będziński", "Liczba": "2", "Liczba zgonów": "", "Id": "t2401" },
  { "Województwo": "lubelskie", "Powiat/Miasto": "biłgorajski", "Liczba": "1", "Liczba zgonów": "", "Id": "t0602" },
  { "Województwo": "dolnośląskie", "Powiat/Miasto": "bolesławiecki", "Liczba": "1", "Liczba zgonów": "", "Id": "t0201" },
  { "Województwo": "śląskie", "Powiat/Miasto": "Chorzów", "Liczba": "1", "Liczba zgonów": "", "Id": "t2463" },
  { "Województwo": "śląskie", "Powiat/Miasto": "cieszyński", "Liczba": "4", "Liczba zgonów": "", "Id": "t2403" },
  { "Województwo": "pomorskie", "Powiat/Miasto": "Gdańsk", "Liczba": "2", "Liczba zgonów": "", "Id": "t2261" },
  { "Województwo": "lubuskie", "Powiat/Miasto": "Gorzów Wielkopolski", "Liczba": "1", "Liczba zgonów": "", "Id": "t0861" },
  { "Województwo": "lubelskie", "Powiat/Miasto": "janowski", "Liczba": "5", "Liczba zgonów": "", "Id": "t0605" },
  { "Województwo": "opolskie", "Powiat/Miasto": "kędzierzyńsko-kozielski", "Liczba": "1", "Liczba zgonów": "", "Id": "t1603" },
  { "Województwo": "świętokrzyskie", "Powiat/Miasto": "Kielce", "Liczba": "1", "Liczba zgonów": "", "Id": "t2661" },
  { "Województwo": "małopolskie", "Powiat/Miasto": "Kraków", "Liczba": "2", "Liczba zgonów": "", "Id": "t1261" },
  { "Województwo": "opolskie", "Powiat/Miasto": "krapkowicki", "Liczba": "1", "Liczba zgonów": "", "Id": "t1605" },
  { "Województwo": "dolnośląskie", "Powiat/Miasto": "Legnica", "Liczba": "1", "Liczba zgonów": "", "Id": "t0262" },
  { "Województwo": "dolnośląskie", "Powiat/Miasto": "legnicki", "Liczba": "1", "Liczba zgonów": "", "Id": "t0209" },
  { "Województwo": "podkarpackie", "Powiat/Miasto": "leżajski", "Liczba": "7", "Liczba zgonów": "", "Id": "t1808" },
  { "Województwo": "lubelskie", "Powiat/Miasto": "lubartowski", "Liczba": "1", "Liczba zgonów": "", "Id": "t0608" },
  { "Województwo": "lubelskie", "Powiat/Miasto": "Lublin", "Liczba": "9", "Liczba zgonów": "1", "Id": "t0663" },
  { "Województwo": "łódzkie", "Powiat/Miasto": "łódzki wschodni", "Liczba": "1", "Liczba zgonów": "", "Id": "t1006" },
  { "Województwo": "łódzkie", "Powiat/Miasto": "Łódź", "Liczba": "14", "Liczba zgonów": "", "Id": "t1061" },
  { "Województwo": "dolnośląskie", "Powiat/Miasto": "oleśnicki", "Liczba": "1", "Liczba zgonów": "", "Id": "t0214" },
  { "Województwo": "warmińsko-mazurskie", "Powiat/Miasto": "Olsztyn", "Liczba": "3", "Liczba zgonów": "", "Id": "t2862" },
  { "Województwo": "świętokrzyskie", "Powiat/Miasto": "ostrowiecki", "Liczba": "3", "Liczba zgonów": "", "Id": "t2607" },
  { "Województwo": "warmińsko-mazurskie", "Powiat/Miasto": "ostródzki", "Liczba": "1", "Liczba zgonów": "", "Id": "t2815" },
  { "Województwo": "łódzkie", "Powiat/Miasto": "pabianicki", "Liczba": "1", "Liczba zgonów": "", "Id": "t1008" },
  { "Województwo": "mazowieckie", "Powiat/Miasto": "piaseczyński", "Liczba": "3", "Liczba zgonów": "", "Id": "t1418" },
  { "Województwo": "łódzkie", "Powiat/Miasto": "poddębicki", "Liczba": "1", "Liczba zgonów": "", "Id": "t1011" },
  { "Województwo": "zachodniopomorskie", "Powiat/Miasto": "policki", "Liczba": "2", "Liczba zgonów": "", "Id": "t3211" },
  { "Województwo": "wielkopolskie", "Powiat/Miasto": "Poznań", "Liczba": "7", "Liczba zgonów": "1", "Id": "t3064" },
  { "Województwo": "wielkopolskie", "Powiat/Miasto": "poznański", "Liczba": "3", "Liczba zgonów": "", "Id": "t3021" },
  { "Województwo": "mazowieckie", "Powiat/Miasto": "pruszkowski", "Liczba": "3", "Liczba zgonów": "", "Id": "t1421" },
  { "Województwo": "mazowieckie", "Powiat/Miasto": "Radom", "Liczba": "1", "Liczba zgonów": "", "Id": "t1463" },
  { "Województwo": "śląskie", "Powiat/Miasto": "Rybnik", "Liczba": "5", "Liczba zgonów": "", "Id": "t2473" },
  { "Województwo": "podkarpackie", "Powiat/Miasto": "Rzeszów", "Liczba": "1", "Liczba zgonów": "", "Id": "t1863" },
  { "Województwo": "lubuskie", "Powiat/Miasto": "słubicki", "Liczba": "1", "Liczba zgonów": "", "Id": "t0805" },
  { "Województwo": "śląskie", "Powiat/Miasto": "Sosnowiec", "Liczba": "1", "Liczba zgonów": "", "Id": "t2475" },
  { "Województwo": "zachodniopomorskie", "Powiat/Miasto": "stargardzki", "Liczba": "3", "Liczba zgonów": "", "Id": "t3214" },
  { "Województwo": "opolskie", "Powiat/Miasto": "strzelecki", "Liczba": "4", "Liczba zgonów": "", "Id": "t1611" },
  { "Województwo": "zachodniopomorskie", "Powiat/Miasto": "Szczecin", "Liczba": "1", "Liczba zgonów": "", "Id": "t3262" },
  { "Województwo": "warmińsko-mazurskie", "Powiat/Miasto": "szczycieński", "Liczba": "1", "Liczba zgonów": "", "Id": "t2817" },
  { "Województwo": "śląskie", "Powiat/Miasto": "tarnogórski", "Liczba": "1", "Liczba zgonów": "", "Id": "t2413" },
  { "Województwo": "lubelskie", "Powiat/Miasto": "tomaszowski", "Liczba": "1", "Liczba zgonów": "", "Id": "t0618" },
  { "Województwo": "łódzkie", "Powiat/Miasto": "tomaszowski", "Liczba": "1", "Liczba zgonów": "", "Id": "t1016" },
  { "Województwo": "śląskie", "Powiat/Miasto": "Tychy", "Liczba": "1", "Liczba zgonów": "", "Id": "t2477" },
  { "Województwo": "dolnośląskie", "Powiat/Miasto": "wałbrzyski", "Liczba": "1", "Liczba zgonów": "", "Id": "t0221" },
  { "Województwo": "mazowieckie", "Powiat/Miasto": "Warszawa", "Liczba": "24", "Liczba zgonów": "", "Id": "t1465" },
  { "Województwo": "dolnośląskie", "Powiat/Miasto": "Wrocław", "Liczba": "18", "Liczba zgonów": "1", "Id": "t0264" },
  { "Województwo": "śląskie", "Powiat/Miasto": "zawierciański", "Liczba": "3", "Liczba zgonów": "", "Id": "t2416" },
  { "Województwo": "łódzkie", "Powiat/Miasto": "zgierski", "Liczba": "3", "Liczba zgonów": "", "Id": "t1020" }
]
```

mulawamichal commented 4 years ago

> there are cities specified in some cases, in others there are "powiaty".

Sometimes a powiat has the same name as a city. These are "miasta na prawach powiatu" (cities with powiat rights). See https://pl.wikipedia.org/wiki/Lista_powiat%C3%B3w_w_Polsce or the TERYT database (http://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default)
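If the `Id` values in the gov.pl JSON are TERYT powiat codes prefixed with `t` (an assumption, though it matches the sample in this thread), the two kinds of rows could be told apart without a lookup table. In that sample, every row naming a city has an `Id` whose last two digits are 61 or higher, while ordinary powiats stay well below that. A minimal sketch, where the function name and the threshold of 61 are assumptions drawn from that observation:

```python
def is_city_with_powiat_rights(row_id: str) -> bool:
    """Guess whether an Id such as "t2463" denotes a city with powiat rights.

    Assumption: the Id is "t" + two-digit voivodeship code + two-digit
    powiat number, and powiat numbers of 61 or more are cities.
    """
    powiat_number = int(row_id[-2:])
    return powiat_number >= 61


# Two rows taken from the sample JSON posted in this thread.
rows = [
    {"Powiat/Miasto": "Chorzów", "Id": "t2463"},
    {"Powiat/Miasto": "cieszyński", "Id": "t2403"},
]

for row in rows:
    kind = "city" if is_city_with_powiat_rights(row["Id"]) else "powiat"
    print(row["Powiat/Miasto"], kind)
```

The threshold should be checked against the TERYT database before it is relied on for the whole country.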

konradkalemba commented 4 years ago

@mhajder There is also the option of staying with manually updated data, but we would have to put together a team responsible for updating it, because right now I'm not available all the time. With more people, delays in data updates would be shorter.

@mulawamichal I see, that's good: we wouldn't have to deal with showing "powiat" on the map.

In both approaches there are some trade-offs:

mhajder commented 4 years ago

@konradkalemba I can help with adding data.

Implementing such a scraper is very simple. It's best to write a simple Python script that is run from cron. That said, page scraping is not very ethical, because it generates a lot of traffic.

konradkalemba commented 4 years ago

@mhajder I know it isn't the best approach ethically, but it wouldn't generate a lot of traffic. Running the script every 5 minutes wouldn't hurt the server very much.

Another problem with this data source is that I'm not sure whether it's updated regularly.
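To put a number on the traffic concern and on the question of how often the source changes, here is a rough sketch. It counts the requests a fixed-interval job makes per day and reports a payload only when it differs from the previous one. The function names are hypothetical, and `fetch` stands in for whatever actually downloads the page.

```python
import hashlib
import time
from typing import Callable, Iterator


def requests_per_day(interval_minutes: int) -> int:
    """Number of requests a fixed-interval job makes in 24 hours."""
    return 24 * 60 // interval_minutes


def poll_for_changes(
    fetch: Callable[[], str],
    interval_seconds: float,
    max_polls: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Call fetch() up to max_polls times; yield a payload only when it changed."""
    last_digest = None
    for attempt in range(max_polls):
        payload = fetch()
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if digest != last_digest:
            last_digest = digest
            yield payload
        # Wait between polls, but not after the final one.
        if attempt < max_polls - 1:
            sleep(interval_seconds)


print(requests_per_day(5))  # 288 requests per day at a 5-minute interval
```

Logging the time of each yielded payload would also show how often the source is really updated.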

konradkalemba commented 4 years ago

Okay, for the time being we are staying with manually updated data. The official MZ website was out of date for at least an hour after the latest confirmation.

mhajder commented 4 years ago

> Okay, for the time being we are staying with manually updated data. The official MZ website was out of date for at least an hour after the latest confirmation.

The changes are probably made only during working hours 😆

konradkalemba commented 4 years ago

There is another problem: MZ's Twitter no longer specifies the cities where new cases are, only voivodeships...

mulawamichal commented 4 years ago

geoportal now has "koronawirus" layer: https://mapy.geoportal.gov.pl/imap/Imgp_2.html?locale=en&gui=new&sessionID=4955220

mhajder commented 4 years ago

> geoportal now has "koronawirus" layer: https://mapy.geoportal.gov.pl/imap/Imgp_2.html?locale=en&gui=new&sessionID=4955220

But it is extremely difficult to scrape.

For gov.pl, all you need is:

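```python
import json

import requests
from bs4 import BeautifulSoup

URL = 'https://www.gov.pl/web/koronawirus/wykaz-zarazen-koronawirusem-sars-cov-2'

# Fetch the page and fail early on network or HTTP errors.
response = requests.get(URL, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, 'html.parser')

# The case table is embedded as JSON in the element with id="registerData".
json_data = json.loads(soup.find(id='registerData').text)
print(json_data['parsedData'])
```
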
konradkalemba commented 4 years ago

@mulawamichal @mhajder Guys, we have a big problem. As I wrote above, their official Twitter account now lists only voivodeships. I thought their website might be more precise, but they have limited both the map and the table to voivodeships.

mhajder commented 4 years ago

@konradkalemba Gov.pl now also provides only voivodeships.

konradkalemba commented 4 years ago

@mhajder Yes... I think we have to do the same, because trying to find where every new case is located would be very time-consuming.
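Moving to voivodeships would not mean discarding the per-powiat rows already collected, since they can be summed per voivodeship. A minimal sketch, assuming rows shaped like the gov.pl JSON posted earlier in this thread, with field names written in their proper Polish spelling. The helper names are hypothetical, and treating an empty string as zero is an assumption about what the blank death counts mean.

```python
from collections import defaultdict


def to_int(value: str) -> int:
    """Convert a count such as "9" to int; treat an empty string as 0 (assumption)."""
    return int(value) if value.strip() else 0


def totals_by_voivodeship(rows: list) -> dict:
    """Sum cases and deaths per voivodeship, skipping the country-wide summary row."""
    totals = defaultdict(lambda: {"cases": 0, "deaths": 0})
    for row in rows:
        if row["Id"] == "t0000":  # "Cała Polska" total, not a real region
            continue
        entry = totals[row["Województwo"]]
        entry["cases"] += to_int(row["Liczba"])
        entry["deaths"] += to_int(row["Liczba zgonów"])
    return dict(totals)


# Three rows copied from the sample JSON in this thread.
sample = [
    {"Województwo": "Cała Polska", "Powiat/Miasto": "Łącznie", "Liczba": "156", "Liczba zgonów": "3", "Id": "t0000"},
    {"Województwo": "lubelskie", "Powiat/Miasto": "janowski", "Liczba": "5", "Liczba zgonów": "", "Id": "t0605"},
    {"Województwo": "lubelskie", "Powiat/Miasto": "Lublin", "Liczba": "9", "Liczba zgonów": "1", "Id": "t0663"},
]

print(totals_by_voivodeship(sample))
```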

mhajder commented 4 years ago

Also, please correct the recovered case: according to Twitter, this is the first patient, so the case is from Zielona Góra.