openZH / covid_19

COVID19 case numbers of Cantons of Switzerland and Principality of Liechtenstein (FL). The data is updated at best once a day (times of collection and update may vary). Start with the README.
https://www.zh.ch/de/gesundheit/coronavirus/zahlen-fakten-covid-19.zhweb-noredirect.zhweb-cache.html?keywords=covid19&keyword=covid19#/
Creative Commons Attribution 4.0 International

Python scraper, if no value then please output "int" #272

Closed zdavatz closed 4 years ago

zdavatz commented 4 years ago

If the Python scraper does not find a value for deaths, please output an int, not a string. Thank you, @baryluk.

AI 2020-03-26T18:00 11 - OK

Better would be:

AI 2020-03-26T18:00 11 0 OK

baryluk commented 4 years ago

Not really. - means we don't have data. 0 means we have data and it is 0. There is a significant difference. When it is '-', the actual number of deaths could be anything; we just don't know it directly.

zdavatz commented 4 years ago

Ok, but then you are mixing string and integer in that column. So the column is not consistent. Would be great if we could stick to the CSV standard and not mix strings with integers in a column.

baryluk commented 4 years ago

Yes, it is consistent. It is either a number or -. This is intentional. You can treat both as a string if you want, or handle the two cases explicitly.
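
For illustration only (this is not code from the scraper repository), handling the two cases explicitly could look like the minimal Python sketch below. The function name `parse_count` is hypothetical. It also treats an empty field as missing, since `int("")` raises `ValueError` just as `int("-")` does.

```python
from typing import Optional


def parse_count(field: str) -> Optional[int]:
    """Convert one column value to an int, or None when no data is available.

    "-" (and, as an assumption, an empty field) means "unknown";
    "0" means the value is known and equals zero.
    """
    field = field.strip()
    if field in ("-", ""):
        return None
    return int(field)


if __name__ == "__main__":
    print(parse_count("-"))   # missing value
    print(parse_count("0"))   # known value of zero
```

Keeping `None` distinct from `0` preserves the difference between "no data" and "zero deaths" in any later aggregation.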

zdavatz commented 4 years ago

Can you just leave the space empty then? That would be better than putting a - there for parsing.

zdavatz commented 4 years ago

What conventions are you using for the output file?

baryluk commented 4 years ago

No, I can't, because then it is actually harder to parse. It will also break add_db_entry.py.

zdavatz commented 4 years ago

Can we then at least stick to a standard CSV delimiter for the values?

baryluk commented 4 years ago

@zdavatz How will CSV help? You will still need to handle an int or an empty field explicitly.

zdavatz commented 4 years ago

> How will CSV help? You will still need to handle an int or an empty field explicitly.

Yes, but we do not have to handle the string. An empty field is easier to handle than a string (-).

baryluk commented 4 years ago

Parsing an empty field as an int will still fail, so it requires the same amount of handling.

Could you show me a snippet of code showing how you do it now, and how you would do it if the field were empty?

zdavatz commented 4 years ago

https://github.com/zdavatz/covid19_ch/blob/master/python-scripts/digest_baryluk.py

baryluk commented 4 years ago

Oh, I see. You are trying to read it as CSV. Well, yes, that will not really work. The output is not CSV.

We should just make a secondary output that is more CSV-like. Or better yet, use the CSV files that are in this repo.
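
As a rough sketch of what such a secondary output might involve, the Python below turns one whitespace-separated line into one CSV line. It assumes the five-field layout seen in the example at the top of this issue (canton, timestamp, confirmed cases, deaths, status); the column meanings other than deaths, and the name `text_line_to_csv`, are assumptions, and this is not the scraper's actual code.

```python
import csv
import io


def text_line_to_csv(line: str) -> str:
    """Convert e.g. 'AI 2020-03-26T18:00 11 - OK' into a CSV line.

    Assumed layout: canton, timestamp, cases, deaths, status.
    A '-' (no data) becomes an empty CSV field.
    """
    canton, timestamp, cases, deaths, status = line.split()
    counts = ["" if value == "-" else value for value in (cases, deaths)]
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(
        [canton, timestamp, *counts, status]
    )
    return buf.getvalue()


if __name__ == "__main__":
    print(text_line_to_csv("AI 2020-03-26T18:00 11 - OK"), end="")
```

Using `csv.writer` rather than joining with commas means any field that contains a comma or a quote would be escaped correctly.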

zdavatz commented 4 years ago

Well, we just want to parse the up-to-date numbers as quickly as possible. Our map is updated every hour: http://covid19.ddrobotec.com/

baryluk commented 4 years ago

I can start publishing CSV files at https://www.functor.xyz/covid_19/scrapers/outputs/latest.csv, which would contain the same data as latest.txt, just in CSV format.

If that works, I can have it working in an hour or two.

zdavatz commented 4 years ago

That would be great, sir! Will make our data analysts very happy ;).

baryluk commented 4 years ago

@zdavatz Could you take a look at https://www.functor.xyz/covid_19/scrapers/outputs/latest.csv ? It provides the same info as latest.txt, without failures.

Once https://github.com/openZH/covid_19/pull/275 is merged, it will also provide extra information.

baryluk commented 4 years ago

@zdavatz I got the data at https://www.functor.xyz/covid_19/scrapers/outputs/latest.csv working. It also provides a few extras (hospitalized, ventilated, recovered/released) for some cantons (AG, VD, UR, ZG; soon ZH, TI, GR and JU).

$ curl 'https://www.functor.xyz/covid_19/scrapers/outputs/latest.csv'
date,time,abbreviation_canton_and_fl,ncumul_tested,ncumul_conf,ncumul_hosp,ncumul_ICU,ncumul_vent,ncumul_released,ncumul_deceased,source
2020-03-26,,SG,,306,,,,,,"Scraper for SG at 2020-03-28T00:31:38+01:00 using https://www.sg.ch/tools/informationen-coronavirus.html"
2020-03-26,,VD,,2532,,,,148,38,"Scraper for VD at 2020-03-28T00:31:59+01:00 using https://api.datawrapper.de/v3/charts/tr5bJ/data"
2020-03-27,16:00,AG,,364,,,9,,3,"Scraper for AG at 2020-03-28T00:30:26+01:00 using https://www.ag.ch/de/themen_1/coronavirus_2/alle_ereignisse/alle_ereignisse_1.jsp"
2020-03-27,18:00,AI,,12,,,,,,"Scraper for AI at 2020-03-28T00:30:29+01:00 using https://www.ai.ch/themen/gesundheit-alter-und-soziales/gesundheitsfoerderung-und-praevention/uebertragbare-krankheiten/coronavirus"
2020-03-27,13:00,AR,,44,,,,,2,"Scraper for AR at 2020-03-28T00:30:31+01:00 using https://www.ar.ch/verwaltung/departement-gesundheit-und-soziales/amt-fuer-gesundheit/informationsseite-coronavirus/"
2020-03-27,,BE,,718,,,,,8,"Scraper for BE at 2020-03-28T00:30:34+01:00 using https://www.besondere-lage.sites.be.ch/besondere-lage_sites/de/index/corona/index.html"
2020-03-27,,BL,,466,,,,,5,"Scraper for BL at 2020-03-28T00:30:35+01:00 using https://www.statistik.bl.ch/files/sites/Grafiken/COVID19/Grafik_COVID19_BL_Linie.htm"
2020-03-27,10:00,BS,,534,,,,,,"Scraper for BS at 2020-03-28T00:30:40+01:00 using https://www.gd.bs.ch/, https://www.gd.bs.ch//nm/2020-tagesbulletin-coronavirus-534-bestaetigte-faelle-im-kanton-basel-stadt-gd.html"
2020-03-27,,FR,,369,,,,,15,"Scraper for FR at 2020-03-28T00:30:44+01:00 using https://www.fr.ch/covid19/sante/covid-19/coronavirus-statistiques-evolution-de-la-situation-dans-le-canton"
2020-03-27,12:00,GE,,1924,,,,,23,"Scraper for GE at 2020-03-28T00:31:01+01:00 using https://www.ge.ch/document/point-coronavirus-maladie-covid-19/telecharger"
2020-03-27,13:30,GL,,44,,,,,,"Scraper for GL at 2020-03-28T00:31:08+01:00 using https://www.gl.ch/verwaltung/finanzen-und-gesundheit/gesundheit/coronavirus.html/4817"
2020-03-27,,GR,,409,,,,,9,"Scraper for GR at 2020-03-28T00:31:16+01:00 using https://www.gr.ch/DE/institutionen/verwaltung/djsg/ga/coronavirus/info/Seiten/Start.aspx"
2020-03-27,16:00,JU,,112,,,,,,"Scraper for JU at 2020-03-28T00:31:20+01:00 using https://www.jura.ch/fr/Autorites/Coronavirus/Accueil/Coronavirus-Informations-officielles-a-la-population-jurassienne.html"
2020-03-27,11:00,LU,,287,,,,,3,"Scraper for LU at 2020-03-28T00:31:23+01:00 using https://gesundheit.lu.ch/themen/Humanmedizin/Infektionskrankheiten/Coronavirus"
2020-03-27,14:00,NE,,287,,,,,5,"Scraper for NE at 2020-03-28T00:31:28+01:00 using https://www.ne.ch/autorites/DFS/SCSP/medecin-cantonal/maladies-vaccinations/Pages/Coronavirus.aspx"
2020-03-27,15:15,NW,,54,,,,,0,"Scraper for NW at 2020-03-28T00:31:31+01:00 using https://www.nw.ch/gesundheitsamtdienste/6044"
2020-03-27,,OW,,37,,,,,,"Scraper for OW at 2020-03-28T00:31:35+01:00 using https://www.ow.ch/de/verwaltung/dienstleistungen/?dienst_id=5962"
2020-03-27,07:30,SH,,36,,,,,,"Scraper for SH at 2020-03-28T00:31:43+01:00 using https://sh.ch/CMS/content.jsp?contentid=3209198&language=DE&_=1584807070095"
2020-03-27,00:00,SO,,157,,,,,1,"Scraper for SO at 2020-03-28T00:31:45+01:00 using https://corona.so.ch/"
2020-03-27,,SZ,,119,,,,32,1,"Scraper for SZ at 2020-03-28T00:31:48+01:00 using https://www.sz.ch/behoerden/information-medien/medienmitteilungen/coronavirus.html/72-416-412-1379-6948"
2020-03-27,,TG,,117,,,,,,"Scraper for TG at 2020-03-28T00:31:52+01:00 using https://www.tg.ch/news/fachdossier-coronavirus.html/10552"
2020-03-27,08:00,TI,,1688,,,,,76,"Scraper for TI at 2020-03-28T00:31:54+01:00 using https://www4.ti.ch/dss/dsp/covid19/home/"
2020-03-27,12:00,UR,,40,,,,3,0,"Scraper for UR at 2020-03-28T00:31:58+01:00 using https://www.ur.ch/themen/2962"
2020-03-27,,VS,,808,,,,,20,"Scraper for VS at 2020-03-28T00:32:06+01:00 using https://www.vs.ch/de/web/coronavirus"
2020-03-27,18:00,ZG,,101,,,,18,1,"Scraper for ZG at 2020-03-28T00:32:14+01:00 using https://www.zg.ch/behoerden/gesundheitsdirektion/amt-fuer-gesundheit/corona"
2020-03-27,09:30,ZH,,1578,,,,,11,"Scraper for ZH at 2020-03-28T00:32:17+01:00 using https://gd.zh.ch/internet/gesundheitsdirektion/de/themen/coronavirus.html"
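
A consumer of this file might read it roughly as in the Python sketch below. It parses two rows copied from the output above with the standard `csv.DictReader` and maps empty numeric fields to `None`, so that "no data" (AI) stays distinct from a reported 0 (NW). The helper name `load_rows` and the choice of `None` are assumptions for illustration, not part of the scraper.

```python
import csv
import io

# Header and two rows copied from the output above.
SAMPLE = '''date,time,abbreviation_canton_and_fl,ncumul_tested,ncumul_conf,ncumul_hosp,ncumul_ICU,ncumul_vent,ncumul_released,ncumul_deceased,source
2020-03-27,18:00,AI,,12,,,,,,"Scraper for AI at 2020-03-28T00:30:29+01:00 using https://www.ai.ch/themen/gesundheit-alter-und-soziales/gesundheitsfoerderung-und-praevention/uebertragbare-krankheiten/coronavirus"
2020-03-27,15:15,NW,,54,,,,,0,"Scraper for NW at 2020-03-28T00:31:31+01:00 using https://www.nw.ch/gesundheitsamtdienste/6044"
'''

NUMERIC_COLUMNS = [
    "ncumul_tested", "ncumul_conf", "ncumul_hosp", "ncumul_ICU",
    "ncumul_vent", "ncumul_released", "ncumul_deceased",
]


def load_rows(text):
    """Parse the CSV text; empty numeric fields become None, others int."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column in NUMERIC_COLUMNS:
            record[column] = int(record[column]) if record[column] else None
        rows.append(record)
    return rows


if __name__ == "__main__":
    for row in load_rows(SAMPLE):
        print(row["abbreviation_canton_and_fl"], row["ncumul_conf"], row["ncumul_deceased"])
```

To read the live file instead, the text could be fetched first (for example with `urllib.request.urlopen`) and passed to the same function.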

zdavatz commented 4 years ago

Great, thank you @baryluk 💪🏻👏🤣🥇🚨‼️

zdavatz commented 4 years ago

@baryluk What is the option to create the CSV file output, as per your link above?