openaustralia / ukraine_verkhovna_rada_votes

Votes by deputies in the Ukrainian Parliament
https://morph.io/openaustralia/ukraine_verkhovna_rada_votes

Runtime Scrape #14

Open beastie87 opened 8 years ago

beastie87 commented 8 years ago

The scraper runs on cron every day, but there is a difference between the morph.io server time and the time this scraper runs, so it sometimes runs when only half a day of voting has taken place. Can we change the start time of the scraper? Thank you!

henare commented 8 years ago

morph.io doesn't allow us to change the start time of the scraper. It only runs it once a day, at some random time.

If I understand it correctly the problem is that we're sometimes scraping half a day because the scrape begins in the middle of the current day's parliamentary proceedings.

Trying to synchronise our scrapes with when the data is posted to the Rada site seems like it could have lots of problems. What we could do instead is not scrape the current day so that it never gets a half day's data. Then when the scraper runs the next day it will pick up a full day of data. What do you think?

beastie87 commented 8 years ago

Then I think I will just need to run the scraper manually, because the data is needed on the same day. @lisoffsky, what do you say?

henare commented 8 years ago

> because the data is needed on the same day

I'd definitely recommend not trying to have the site be so up to date. The reason is that once you start doing that then it's always expected of the site and it becomes hard to manage because you're reliant on the Rada site anyway.

I'd strongly suggest that you set the expectation that the site's data is delayed by a day. That's what we have in Australia - the data appears about 11:00 the following day. This gives us enough time to load the data and sort out any problems but also is quick enough for people to link to the site for any timely news stories or discussion.

It sets the expectation that the site is for analysis, not realtime data.

lisoffsky commented 8 years ago

That's a good point, Henare, thank you. The only issue left for us to resolve with your advice is the following. I saw on morph.io that the scraper run for yesterday's voting didn't get any divisions. Here is the scraper's output:

Injecting configuration and compiling...
Injecting scraper and running...
Checking for votes on: 2016-01-28
Fetching plenary day: http://w1.c1.rada.gov.ua/pls/radan_gs09/ns_el_h2?data=28012016&nom_s=3
Found 0 vote events to scrape...
All done.
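To make the log output above easier to reproduce by hand, here is a minimal Python sketch that rebuilds the plenary day URL for a given date. It is not the scraper's actual code. The function name `plenary_day_url` is hypothetical, the `DDMMYYYY` format of the `data` parameter is inferred from the single logged example (2016-01-28 became `28012016`), and `nom_s=3` is simply copied from that log line.

```python
from datetime import date
from urllib.parse import urlencode

# Base URL copied from the log output in this thread.
BASE_URL = "http://w1.c1.rada.gov.ua/pls/radan_gs09/ns_el_h2"


def plenary_day_url(day: date, nom_s: int = 3) -> str:
    """Build the plenary day URL for one date (hypothetical helper).

    Assumption: `data` is the date written as DDMMYYYY, inferred from the
    one logged example. The meaning of `nom_s` is not stated in the thread;
    the default of 3 is copied from the log.
    """
    query = urlencode({"data": day.strftime("%d%m%Y"), "nom_s": nom_s})
    return f"{BASE_URL}?{query}"


print(plenary_day_url(date(2016, 1, 28)))
# http://w1.c1.rada.gov.ua/pls/radan_gs09/ns_el_h2?data=28012016&nom_s=3
```

Opening the printed URL later in the day is a quick way to check whether the votes had simply not been posted yet when the scraper ran.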

Do you think it started too early in Ukrainian time, so that nothing was on the website at that moment? In your opinion, what should we do to improve this?

Thank you for your insight and your time, Henare.

henare commented 8 years ago

> Do you think it started too early in Ukrainian time, so that nothing was on the website at that moment?

Yes, I think you're probably right.
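A small Python sketch of why this explanation is plausible. The run time below is hypothetical: morph.io picks the time, and the thread does not say when this run actually started. The fixed UTC+2 offset is also a simplification that only holds for Ukrainian winter time (EET), which applies on 2016-01-28.

```python
from datetime import datetime, timedelta, timezone

# Ukraine is on EET (UTC+2) in late January, so a fixed offset is enough
# for this example. A real scraper should use a time zone database instead.
KYIV_WINTER = timezone(timedelta(hours=2), "EET")

# Hypothetical run time in UTC, chosen only for illustration.
run_utc = datetime(2016, 1, 28, 5, 30, tzinfo=timezone.utc)
run_kyiv = run_utc.astimezone(KYIV_WINTER)

print(run_kyiv.isoformat())
# 2016-01-28T07:30:00+02:00
```

A run at that hour would already be asking for votes on 2016-01-28 in local terms, but early enough in the morning that the Rada site may not have any votes for that day yet.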

> In your opinion, what should we do to improve this?

At the moment the scraper gets all votes up to and including the current day. I think this might be fixed if we only scrape up to the day before, so there's enough time for the votes to be posted on the Rada site.

We might need to change it again but that's probably the best first step.
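The change proposed here can be sketched in a few lines of Python. This is an illustration of the idea, not the scraper's actual code: `days_to_scrape` and its parameters are hypothetical names, and "today" is whatever date the scraper considers current when it runs.

```python
from datetime import date, timedelta
from typing import Iterator


def days_to_scrape(first_day: date, today: date) -> Iterator[date]:
    """Yield each day from first_day up to and including yesterday.

    The current day is deliberately left out, so a run that starts in the
    middle of a sitting never stores a partial day of votes.
    """
    last_day = today - timedelta(days=1)
    day = first_day
    while day <= last_day:
        yield day
        day += timedelta(days=1)


days = days_to_scrape(date(2016, 1, 26), today=date(2016, 1, 29))
print([d.isoformat() for d in days])
# ['2016-01-26', '2016-01-27', '2016-01-28']
```

With this cutoff, a run on 2016-01-29 picks up the full set of votes for 2016-01-28, at the cost of the data always being one day behind.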

> Thank you for your insight and your time, Henare.

It's a pleasure!