Open mlandauer opened 3 years ago
I'm happy to write a scraper.py for option 2. I'd prefer this as then I don't need dozens of different repositories to maintain.
Please let me know what else I can do to help.
Awesome @43South. For the authority_label just use the name of the authority that you use in the filename for the scraper. For example brighton.
Let me know (on this issue) when you've written and tested out your scraper.py. It would be great if you could load it onto morph.io and test it out there as well. Also let me know if you run into any problems.
Thanks!
I've written a scraper.py that calls all the others and it (mostly) works. There's at least one bug, in glamorganspringbay.py. I'm putting onto morph.io now to test it in the wild.
So ... I've run into a problem. I'm running Python 3.8, but morph.io is insisting on running 2.7.6. I tried adding #!/usr/bin/env python3 to the start of my scraper.py to see if that would cheer it up, but no :(
Ah runtime.txt. Now I need to downgrade from Python 3.8.11 to 3.6.2 (and associated code changes ...)
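For anyone following along: morph.io picks the interpreter version from a runtime.txt file at the root of the scraper repo (it follows the Heroku buildpack convention; exactly which patch versions are available depends on what morph.io supports). Pinning Python 3.6.2 would look like:

```
python-3.6.2
```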
Okay, it's running to some extent in morph.io. That'll do me for tonight.
It's running in morph.io and apparently writing a database.
@mlandauer, as I'm adding authority_label, should I make both authority_label and council_reference unique_keys when adding records?
@43South authority_label and council_reference should be used as a composite key when adding records. I think that's what you mean, right? See https://github.com/planningalerts-scrapers/multiple_horizon/blob/master/scraper.rb#L17 for an example of how it's done in one of the other "multiple" scrapers. Sorry that it's in Ruby. That way the same council_reference can be used by different authorities without crashing into each other.
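In Python this would be a scraperwiki.sqlite.save call with both fields in unique_keys. A minimal sketch of why the composite key matters, using only the standard library's sqlite3 to stand in for what the scraperwiki library does (the record fields are illustrative, not from any real council feed):

```python
import sqlite3

# With the scraperwiki library this would be roughly:
#   scraperwiki.sqlite.save(
#       unique_keys=["authority_label", "council_reference"], data=record)
# Below is the same idea with plain sqlite3, to show the two-column key.

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE data (
        authority_label TEXT,
        council_reference TEXT,
        address TEXT,
        UNIQUE (authority_label, council_reference)
    )
""")

def save(record):
    # INSERT OR REPLACE upserts on the composite key, so re-scraping the
    # same application updates the row instead of duplicating it.
    conn.execute(
        "INSERT OR REPLACE INTO data "
        "VALUES (:authority_label, :council_reference, :address)",
        record,
    )

# The same council_reference under two different authorities: both survive.
save({"authority_label": "brighton", "council_reference": "DA-1",
      "address": "1 Example St"})
save({"authority_label": "glamorganspringbay", "council_reference": "DA-1",
      "address": "2 Sample Rd"})
rows = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
print(rows)  # 2
```

With only council_reference as the unique key, the second save would have overwritten the first.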
Hi is there any update on the Tasmanian Council Feeds, looks like some of them have stopped working. I would be happy to help where I can to get these all updated.
G'day Benjamin,
I was working on them ages ago then got distracted by other things. Happy for you to take it over, though I might find time to get back into it.
Cheers Richard
Ok, thanks Richard. I'll have to do some reading up on how it works and get them working locally first. Obviously you can't just run the scripts natively in Python...
https://drive.google.com/file/d/1RbfYb6ldbaUdcEQNtqDET3gKBkXcxxRM/view This shows my previous work. As you can see, I have rich experience in data scraping using PHP, Node.js, and Python libraries, so scraping the data itself is no problem. While developing web sites I have scraped data from a wide range of sites using Beautiful Soup, Selenium, Scrapy, and so on. I'd like to help as soon as possible.
@43South has written a whole lot of new scrapers for Tasmania at https://github.com/43South/tasmaniada. It would be great to get these added into PlanningAlerts as soon as possible.
There are two options for how these scrapers could be added to PlanningAlerts:
1. Copy each scraper into its own repository, each with its own scraper.py. This is the way, until relatively recently, all scrapers for PlanningAlerts were done. Advantage is that it's simple - a big copy and paste job. Disadvantage is that it's harder to maintain, as we suddenly have all these extra repos and scrapers to manage.
2. Combine them into a single "multiple" scraper with one main scraper.py which calls each scraper in turn. In addition, each row in the database needs an extra field authority_label with the name of the council/authority. See for example https://morph.io/planningalerts-scrapers/multiple_technology_one. Advantage is that we keep everything together in one place and it's easier to maintain. Disadvantage is it does require a bit more work to munge everything together with a main script.