mikelmaron / planningalerts

Automatically exported from code.google.com/p/planningalerts

reigate scraper #15

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Please find attached a Ruby script which will take in a Reigate and
Banstead Borough Council weekly PDF

http://www.reigate-banstead.gov.uk/public/Business_Planning/Planning/Apps/planning_applications.asp

and spit out the applications (_not_ the decisions and appeals) in that
XML format you sort of support. The url/comment_urls are kind of
pointless (on this particular council's website) as you need to agree to
their T&Cs before you can access them, but if you've done that in the
current cookie session then they should work.

It needs a script to find the PDF in the first place, run daily/weekly,
but I'm hoping one of you has the 1337 p3r1 5ki115 to magic that. I have
no fricking clue what they add on to the end of the filename on the PDF
(looks like a random number; it's not the date or file size, maybe the
bastards do it on purpose?)
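
For what it's worth, the PDF-finding step could probably be done in Ruby
as well. Rather than guessing the random-looking suffix, the idea is to
read the PDF link straight off the listing page. This is only a rough
sketch under assumptions: `pdf_links` is a hypothetical helper, and it
assumes the page links to the weekly PDF with a plain quoted `href`
ending in `.pdf`, which has not been checked against the council's
actual markup.

```ruby
require 'uri'

# Hypothetical helper: returns the absolute URL of every link in `html`
# whose href ends in .pdf, resolved against `base_url`.
# Assumes hrefs are quoted and contain no characters URI.join rejects.
def pdf_links(html, base_url)
  html.scan(/href\s*=\s*["']([^"']+\.pdf)["']/i).flatten.uniq.map do |href|
    URI.join(base_url, href).to_s
  end
end
```

Fetching the listing page with `Net::HTTP.get(URI(page_url))` from the
standard library and passing the body to `pdf_links` would give the
candidate URLs. Picking the right one when the page links to several
PDFs is left open.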

Original issue reported on code.google.com by mikel.ma...@gmail.com on 29 Mar 2007 at 7:42

GoogleCodeExporter commented 9 years ago
Oops, I forgot to mention: you run the PDF through pdftohtml and then
pipe the larger of the resulting HTML files through the script.
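
Roughly, those two steps could be wired together like this. It is a
sketch, not the attached script: `largest_html` and `convert_and_scrape`
are hypothetical names, `reigate.rb` is a placeholder for wherever the
script is saved, and it assumes `pdftohtml` is on the PATH and that the
script reads the HTML on standard input.

```ruby
# Hypothetical helper: path of the largest .html file in `dir`,
# or nil if the directory contains no .html files.
def largest_html(dir)
  Dir.glob(File.join(dir, '*.html')).max_by { |path| File.size(path) }
end

# Hypothetical wrapper: convert the PDF, then feed the largest HTML file
# to the scraper on standard input. 'reigate.rb' is a placeholder name.
def convert_and_scrape(pdf_path, out_dir, scraper = 'reigate.rb')
  # pdftohtml <PDF-file> <HTML-file> writes several .html files
  system('pdftohtml', pdf_path, File.join(out_dir, 'weekly')) or raise 'pdftohtml failed'
  html = largest_html(out_dir) or raise 'no HTML output found'
  system('ruby', scraper, in: html)
end
```

Choosing by file size just mirrors "the larger html" above. It avoids
hard-coding whichever output filename pdftohtml happens to use for the
page contents.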

Original comment by mikel.ma...@gmail.com on 29 Mar 2007 at 7:48