os-data / registry

A tracker for data to load into OpenSpending
http://openspending.org/

[Czech republic] Government spending #9

Open zufanka opened 11 years ago

zufanka commented 11 years ago

The Czech government spending data on this website requires advanced scraping: http://wwwinfo.mfcr.cz/cgi-bin/ufis/iufisorg/index.pl

I can understand the language and clean the data, but I would need help writing the scraper. Also, not everything on the website is relevant to government spending.

zufanka commented 11 years ago

About a week ago I sent an email to Transparency International Czech to ask whether they could help, but I got no answer. I am going to try approaching the owners of the data directly.

zufanka commented 11 years ago

Answer from Transparency International Czech:

data set of governmental budget expenses in the years 2004 - 2012: http://budovanistatu.cz/bubble#/v1/Rozpo%C4%8Det | https://www.google.com/fusiontables/DataSource?docid=10XncRz-v2IKdgy6gwJ8AmlQJ4T7dAzyaubhDWxY#rows:id=1

more budget data to download: http://wwwinfo.mfcr.cz/psp/ (to be checked)

If needed, template for information request: http://www.transparency.cz/pristup-k-informacim-vzory/

I don't know whether this data set is the same as the data behind the original link. However, I am going to clean it and feed it to OpenSpending, as it is not there yet.

anderspeders commented 11 years ago

Hi @zufanka, great work digging out this data! If you need help getting the data cleaned or uploaded to OpenSpending, let us know.

zufanka commented 11 years ago

Hi @anderspeders! I have tried to upload the data several times, but I never succeeded: http://openspending.org/czech-budget-2004-2012

zufanka commented 11 years ago

Hi @anderspeders ! I have tried to upload the dataset a few times, but I think I did not succeed. Do any of these work for you? http://openspending.org/czech-government-budget-expenses-2004-2012 , http://openspending.org/czech-budget-2004-2012

anderspeders commented 11 years ago

Hi @zufanka, great work. I added some dimensions and tried loading. However, the dataset seems to be missing a unique ID for each row.

See the error report here: http://openspending.org/czech-budget-2004-2012/sources/3488/runs/7418

Would it be possible for you to go back to the dataset and add a unique ID (e.g. a number) to each row?
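
In case it helps, here is a minimal sketch of one way to add such an ID, assuming the cleaned data is a CSV file with a header row. The function name `add_row_ids` and the column name `id` are only illustrative choices, not something OpenSpending prescribes.

```python
import csv
import io


def add_row_ids(src, dst, id_column="id", start=1):
    """Copy CSV rows from src to dst, prepending a sequential ID column.

    src and dst are open text file objects. Returns the number of data
    rows written. The column name "id" is an assumption for illustration.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")

    header = next(reader, None)
    if header is None:
        return 0  # empty input: nothing to write

    if id_column in header:
        raise ValueError(f"column {id_column!r} already exists in the header")

    writer.writerow([id_column] + header)

    count = 0
    for row_id, row in enumerate(reader, start=start):
        # Each row gets the next integer, so IDs are unique within the file.
        writer.writerow([row_id] + row)
        count += 1
    return count
```

To run it on real files, open the input and output with `open(path, newline="", encoding="utf-8")` and pass both file objects to `add_row_ids`. The new column can then be mapped as the unique key when the source is loaded again.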