lanl / WHO-FLUMART-scraper

Python code for scraping the WHO's FLUMART data.
BSD 3-Clause "New" or "Revised" License

Include latest data in repository? #1

Closed tomwenseleers closed 1 year ago

tomwenseleers commented 2 years ago

Thanks for the nice scraper. I was just wondering if you would mind putting a copy of the data as a CSV in your repository? I'm asking in the context of an analysis of worldwide excess mortality I am doing; I would like to include influenza activity if possible, and it would be handy if I could just point to a CSV on GitHub in my R script. It would not be an issue for me right now if it was not updated daily or monthly or whatever, as long as I have a somewhat recent version available.

cheers & many thanks in advance!

PS Saw another scraper at https://github.com/MagnusBook/flunet-scraper that includes a csv, but unfortunately that one hasn't been updated for 3 years either

gfairchild commented 2 years ago

Thanks for reaching out, @tomwenseleers! I'd prefer not to put any data in the repo for a few reasons. First, I don't want to mislead people into thinking that I'd be doing that regularly; I'm worried that it might encourage people to routinely ask for updates. Second, while there's nothing technically stopping anyone from storing data in a Git repo, Git really isn't designed to store large data files (well, there technically is LFS, but I'd prefer not to use it unless there's a very strong justification). A full CSV dump will be rather large (totally ball-parking here, but I'm guessing something on the order of 40-50 MB).

Now, all that said, I'd normally be more than happy to just generate a single dump and send it directly to you since you asked so nicely, but the FLUMART website has actually been down for the last few weeks, and I'm not sure when it'll be fixed. It loads initially, but if you fill out the form and hit submit, you'll be greeted with this lovely error message:

[Screenshot, 2022-08-18: "Runtime Error" page]

Until the website is fixed, unfortunately no one will be able to collect data. I hope the WHO is able to fix this soon! FWIW, our team reached out to the FLUMART team a couple weeks ago (according to https://www.who.int/tools/flumart, it looks like flumart@who.int is the right email address), but we haven't received a response. You might consider pinging the FLUMART folks as well.

tomwenseleers commented 2 years ago

Many thanks for getting back to me! I reached out to the FLUMART folks to ask for the complete data, as I also got this server error when I tried that page. Let's see what they say. When I tried their PowerBI dashboard at https://app.powerbi.com/view?r=eyJrIjoiNjViM2Y4NjktMjJmMC00Y2NjLWFmOWQtODQ0NjZkNWM1YzNmIiwidCI6ImY2MTBjMGI3LWJkMjQtNGIzOS04MTBiLTNkYzI4MGFmYjU5MCIsImMiOjh9 I got an Excel file back with only part of the data, and the export is the same no matter which time range I choose (this is what I get back, a file with just 2022 records: https://www.dropbox.com/s/oprb6fqinad5jv5/flunet_powerbi.xlsx?dl=1). So that also didn't help... Shame that the WHO could not just provide some GitHub repo or FTP address with the complete database. It would be so much handier than scientists having to go to the lengths of writing a web scraper. (There is a similar problem with GISAID for Covid, for which people are writing scrapers now: https://github.com/Wytamma/GISAIDR, https://stackoverflow.com/questions/72632118/download-covid-patient-metadata-from-gisaid-website-in-r-using-rselenium)

gfairchild commented 2 years ago

Great, thanks for reaching out to them. Let me know if you hear a response, and if I hear a response, I'll let you know here as well.

Thanks for sending the link to the PowerBI dashboard. I actually wasn't aware of that! Out of curiosity, how did you find that?

tomwenseleers commented 2 years ago

I found the PowerBI dashboard just by following the link given at https://www.who.int/tools/flunet under Download data. The FluMart link, on the other hand, I wasn't able to find from their website, https://www.who.int/tools/flumart... So I'm not sure how one is even supposed to find https://apps.who.int/flumart/Default?ReportNo=12.

gfairchild commented 2 years ago

Nice, thanks for that!

I'm wondering if they're actually working to migrate away from the FLUMART website in favor of PowerBI; that would explain why the FLUMART website is failing and apparently being neglected. FWIW, I'm seeing the same export behavior with that PowerBI interface, so that also needs to be fixed.

tomwenseleers commented 2 years ago

Might be something like that yes. Still no reply from them. PowerBI I personally really don't like - and even harder to scrape (well with RSelenium I guess it could always be done, but not at all practical, and it would require that their export at least was working as intended). I would be so happy if they could just make their latest data available through github or ftp. I don't think any bioinformatician / analyst cares about all those dashboards...

tomwenseleers commented 1 year ago

Ha, finally received a reply from the WHO: the FluNet/FluMart & FluID data can be downloaded as a single CSV file. See https://www.who.int/teams/global-influenza-programme/surveillance-and-monitoring/influenza-surveillance-outputs; the direct links are:

https://frontdoor-l4uikgap6gz3m.azurefd.net/FLUMART/VIW_FNT?$format=csv
https://frontdoor-l4uikgap6gz3m.azurefd.net/FLUMART/VIW_FID_EPI?$format=csv

Data dictionary here: https://frontdoor-l4uikgap6gz3m.azurefd.net/FLUMART/VIW_FLU_METADATA?$format=csv

gfairchild commented 1 year ago

Thank you for providing this information! I never ended up receiving a response. Did they happen to tell you if the FLUMART site is being decommissioned?

tomwenseleers commented 1 year ago

They didn't tell me that, but I presume so, yes... Anyway, if they now also provide everything in a single CSV file there should be no need anymore to scrape the info from that older Flumart site...
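
For anyone landing here later, a minimal Python sketch of pulling one of the CSV links posted above without any scraping. The base URL and view names are copied from this thread; the helper names (`csv_url`, `parse_csv`, `fetch_view`), the timeout, and the `utf-8-sig` decoding are assumptions made for illustration, not anything the WHO documents. The column names are not assumed; the data dictionary view is the place to look those up.

```python
import csv
import io
import urllib.request

# Base URL and view names are copied from the links posted in this thread.
BASE_URL = "https://frontdoor-l4uikgap6gz3m.azurefd.net/FLUMART"
VIEWS = ("VIW_FNT", "VIW_FID_EPI", "VIW_FLU_METADATA")


def csv_url(view: str) -> str:
    """Build the CSV export URL for one of the views named in the thread."""
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view!r}")
    return f"{BASE_URL}/{view}?$format=csv"


def parse_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_view(view: str, timeout: float = 120.0) -> list[dict[str, str]]:
    """Download one view and parse it (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(csv_url(view), timeout=timeout) as response:
        # Assumption: the export is UTF-8; utf-8-sig also drops a BOM if present.
        text = response.read().decode("utf-8-sig")
    return parse_csv(text)
```

Calling `fetch_view("VIW_FLU_METADATA")` should return the data dictionary as a list of dicts, provided the endpoint is still reachable. The full data views may be large, so streaming them to disk instead of holding everything in memory might be the better choice there.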