lubianat / wikidata_covid19

A repository for activities related to the Wikidata Wikiproject COVID-19
MIT License

Worldwide Data Automation #2

Open lubianat opened 4 years ago

lubianat commented 4 years ago

Hello @jvfe,

Nice work with the automation with the datahub.io data!

I believe we can use the following as the reference:

Reference URL: https://datahub.io/core/covid-19
File name in archive (P7793): r/countries-aggregated.csv
Retrieved: <today's date>

(The download URL is https://datahub.io/core/covid-19/r/countries-aggregated.csv)
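The reference proposed above could be attached to each statement as QuickStatements source columns. This is only a minimal sketch: it assumes the QuickStatements V1 tab-separated format, where source properties use an `S` prefix, and it assumes P854 (reference URL) and P813 (retrieved) alongside the P7793 mentioned above. The item ID, the statement property and the count are illustrative placeholders, not values from the script.

```python
from datetime import date

# Values taken from the reference proposed in this thread.
REFERENCE_URL = "https://datahub.io/core/covid-19"
ARCHIVE_FILE = "r/countries-aggregated.csv"


def qs_time(day):
    """Format a date as a QuickStatements time value with day precision (/11)."""
    return f"+{day.isoformat()}T00:00:00Z/11"


def reference_columns(retrieved):
    """Build the source columns; S854, S7793, S813 mirror P854, P7793, P813."""
    return [
        "S854", f'"{REFERENCE_URL}"',
        "S7793", f'"{ARCHIVE_FILE}"',
        "S813", qs_time(retrieved),
    ]


def statement_line(qid, prop, value, retrieved):
    """Join item, property, value and reference into one tab-separated line."""
    return "\t".join([qid, prop, str(value), *reference_columns(retrieved)])


# Placeholder item, property and count, for illustration only.
line = statement_line("Q916", "P1603", 27, date(2020, 4, 29))
print(line)
```

Keeping the reference in one helper means every statement gets the same three source columns, and only the retrieval date changes from run to run.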

Also, nice that you blacklisted pages that are being manually updated!

What we can do is check which of the items in the list had an update in the past 20 hours and only update those that didn't.

This in addition to the ones you detected as manually curated.
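The screening idea could look roughly like the sketch below. It assumes we already have the last-edit timestamp of each item (for example from the page history) in a dictionary; the function name, the item IDs and the timestamps are hypothetical, and only the 20-hour window comes from the proposal above.

```python
from datetime import datetime, timedelta, timezone


def items_to_update(last_edited, manually_curated, now, window_hours=20):
    """Return the items that are safe to update automatically.

    An item is skipped if it is in the manually curated list or if it
    was edited within the last `window_hours` hours.
    """
    cutoff = now - timedelta(hours=window_hours)
    return sorted(
        qid
        for qid, edited_at in last_edited.items()
        if qid not in manually_curated and edited_at < cutoff
    )


# Placeholder data, for illustration only.
now = datetime(2020, 4, 29, 12, 0, tzinfo=timezone.utc)
last_edited = {
    "Q1": now - timedelta(hours=30),  # stale, so it gets updated
    "Q2": now - timedelta(hours=2),   # edited recently, so it is skipped
    "Q3": now - timedelta(hours=48),  # stale, but manually curated
}
manually_curated = {"Q3"}

print(items_to_update(last_edited, manually_curated, now))  # ['Q1']
```

Passing `now` in explicitly keeps the check easy to test, and the window can be changed in one place if we settle on a different number of hours.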

What do you think?

I'll make those changes to the code now, but we can discuss further.

jvfe commented 4 years ago

Sure, I think the change is great! I actually totally forgot about this script until about an hour ago and I just updated the items (with slightly different references) before I saw your comment.

But the changes are welcome, I think it's not hard to implement. Thanks!

lubianat commented 4 years ago

Nice,

I have implemented the code to do the screening and get the items that were not updated.

I think your update did not work for all of the items; see the Angola page, for example.

If you could merge your changes to the files I pushed here, it would be awesome, so we do not have editing conflicts.

Cool, it is working!

jvfe commented 4 years ago

Oh yeah it didn't, don't know why. I actually didn't change anything in the code from last time. Just ran it and updated, so there's nothing to merge, thanks anyway for asking.

I believe the main thing for automation, both in this script and for the Brazil data, would be integrating with the QS (QuickStatements) API or registering a bot for it (I still don't get how that works in Wikidata). But I couldn't get the API to work in any way, shape or form: it keeps saying I don't have permission because I haven't used QS in batch mode, even though I must've used it more than ten times.

Anyway thanks for the update!

lubianat commented 4 years ago

I get exactly the same problem whenever I try to use the QS API! I think that is some bug on their side.

So, registering for a bot is basically just going to this page https://www.wikidata.org/wiki/Wikidata:Bot_requests and following the instructions there, but this does not have to limit you. I mean, for Brazil at least the edits are not that many.

I am working in parallel with the CellosaurusBot (for cell line info) : https://github.com/lubianat/cellosaurus-wikidata-bot.

I could run it from my personal, non-bot account. It uses WikidataIntegrator (https://github.com/SuLab/WikidataIntegrator) to run updates to Wikidata via Python, so I believe it may be useful here.


lubianat commented 4 years ago

Hey @jvfe,

I will make a Bot Request for the worldwide data, then!

It will run every day for country items that were not updated in the past 23 hours and that are not among the items you curated as "manually updated".

The point is: I have never done a bot request before.

I will keep you updated here on whatever happens there! There would be a bot account that I was thinking of running automatically from a server (my PC, most likely), but we could share the account too. Actually, that would be super welcome.

Best, Tiago

jvfe commented 4 years ago

Sure, I just updated the list of countries I saw being manually updated as the previous list was from more than a week ago.

I think we can share the account too. It's your choice; to me it's all good!