okfn-brasil / serenata-de-amor

🕵 Artificial Intelligence for social control of public administration | **This repository does not receive frequent updates. Check out the README**
https://serenata.ai/en
MIT License
4.52k stars 662 forks

Compare expenses made with lodging against official prices of rooms #26

Open Irio opened 8 years ago

Irio commented 8 years ago

Filtering the quota dataset for records with the value 'Lodging, except for congressperson from Distrito Federal' in the column subquota_description returns many hotel expenses. We could match the value on the receipt against a publicly available range of prices (through Booking.com, for instance).
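A minimal sketch of that idea, using plain Python dictionaries in place of the real dataset. Only `subquota_description` and its lodging value come from the text above; the `supplier` and `net_value` column names, the `PriceRange` type, the `flag_outliers` helper, and the 20% tolerance are assumptions made for illustration.

```python
from dataclasses import dataclass

LODGING = "Lodging, except for congressperson from Distrito Federal"


@dataclass
class PriceRange:
    """Hypothetical published price range for one hotel (same currency as receipts)."""
    low: float
    high: float


def lodging_expenses(rows):
    """Keep only the records whose subquota is lodging."""
    return [r for r in rows if r["subquota_description"] == LODGING]


def flag_outliers(rows, price_ranges, tolerance=0.2):
    """Return lodging receipts above the known upper price plus a tolerance.

    `price_ranges` maps a supplier name to a PriceRange. Receipts from
    suppliers with no known range are skipped rather than flagged.
    """
    flagged = []
    for row in lodging_expenses(rows):
        known = price_ranges.get(row["supplier"])  # assumed column name
        if known is None:
            continue
        if float(row["net_value"]) > known.high * (1 + tolerance):  # assumed column name
            flagged.append(row)
    return flagged
```

A receipt may cover several nights, so a real comparison would probably need to normalize the receipt value per night before checking it against a nightly price range.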

JVUnderground commented 8 years ago

Holidays and events should be considered, as they usually significantly affect the price of rooms.

samuelgrigolato commented 7 years ago

Does anybody have an idea how to proceed with this scraping? I mean, in addition to what is already being done by @Lrcezimbra at #100.

I had a look into booking.com but couldn't find any suitable API. I also tried decolar.com (they do have a public and free API [1]), but their terms of usage don't seem to allow the kind of data scraping we need (I don't even know why I thought they would :smile:).

[1] http://dev.despegar.com/howto/hotels

ebonet-zz commented 7 years ago

I don't believe there are historical databases for pricing. What could be done is to identify hotels in the database, start watching booking/expedia/..., and scrape data, building Serenata's own dataset for that. Keep in mind that hotel pricing is somewhat complex, and the database can become large.
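One possible shape for such a dataset, sketched with Python's built-in `sqlite3`. The table name, column names, and helper functions are all hypothetical, and the scraping step itself is left out; the sketch only shows how observed prices could be stored per hotel and per stay date, so that holidays and events end up with their own price ranges.

```python
import sqlite3
from datetime import date


def open_store(path=":memory:"):
    """Open (or create) the hypothetical price observation store."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS price_observation (
            hotel       TEXT NOT NULL,
            stay_date   TEXT NOT NULL,  -- night the price applies to
            observed_on TEXT NOT NULL,  -- day the price was collected
            source      TEXT NOT NULL,  -- site the price was seen on
            price       REAL NOT NULL,
            PRIMARY KEY (hotel, stay_date, observed_on, source)
        )
        """
    )
    return conn


def record(conn, hotel, stay_date, observed_on, source, price):
    """Store one observed price; re-recording the same key overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO price_observation VALUES (?, ?, ?, ?, ?)",
        (hotel, stay_date.isoformat(), observed_on.isoformat(), source, price),
    )
    conn.commit()


def price_range(conn, hotel, stay_date):
    """Return (min, max) of observed prices for a hotel night, or None."""
    row = conn.execute(
        "SELECT MIN(price), MAX(price) FROM price_observation "
        "WHERE hotel = ? AND stay_date = ?",
        (hotel, stay_date.isoformat()),
    ).fetchone()
    return None if row[0] is None else (row[0], row[1])
```

Keeping one row per hotel, stay date, observation date, and source is what makes the table grow quickly, which matches the concern about size above.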

evilasiov commented 7 years ago

From 2012 until now, housing prices have barely changed.



cuducos commented 6 years ago

Closed accidentally by an unrelated commit from the Rosie/Jarbas repos.