planningalerts-scrapers / issues

Only for keeping track of all issues related to scraping

Integrate new Tasmanian scrapers from 43South #444

Open mlandauer opened 3 years ago

mlandauer commented 3 years ago

@43South has written a whole lot of new scrapers for Tasmania at https://github.com/43South/tasmaniada. It would be great to get these added into PlanningAlerts as soon as possible.

There are two options for how these scrapers could be added to PlanningAlerts:

  1. Each scraper is run in its own repo with its own morph.io scraper. See for example https://github.com/planningalerts-scrapers/city_of_sydney. Each scraper is renamed to scraper.py. Until relatively recently, this is how all scrapers for PlanningAlerts were done. The advantage is that it's simple: it's a big copy and paste job. The disadvantage is that it is harder to maintain, as we suddenly have all these extra repos and scrapers to manage.
  2. We keep a single repository for all the scrapers but combine them with a main script, scraper.py, which calls each scraper in turn. In addition, each row in the database needs an extra field, authority_label, with the name of the council/authority. See for example https://morph.io/planningalerts-scrapers/multiple_technology_one. The advantage is that we keep everything together in one place and it's easier to maintain. The disadvantage is that it requires a bit more work to munge everything together with a main script.
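
As a rough illustration of option 2, a main script could loop over the per-council scrapers and tag each record with its authority_label. This is only a sketch under assumptions: the functions scrape_brighton and scrape_glamorganspringbay, the SCRAPERS mapping, and the sample record are hypothetical stand-ins, not code from the tasmaniada repository.

```python
# Hypothetical stand-ins for the per-council scraper modules. In a real
# repository each would live in its own file (for example brighton.py).
def scrape_brighton():
    return [{"council_reference": "DA-2021-1", "address": "1 Example St, Brighton TAS"}]


def scrape_glamorganspringbay():
    # Simulates a scraper that currently has a bug.
    raise RuntimeError("simulated scraper bug")


# Maps each authority_label to the function that scrapes that authority.
SCRAPERS = {
    "brighton": scrape_brighton,
    "glamorganspringbay": scrape_glamorganspringbay,
}


def run_all(scrapers):
    """Call each scraper in turn and tag every record with its authority_label.

    A failure in one scraper is recorded and skipped so the others still run.
    """
    rows, failed = [], []
    for label, scrape in scrapers.items():
        try:
            records = scrape()
        except Exception as exc:
            print(f"{label}: failed ({exc})")
            failed.append(label)
            continue
        rows.extend(dict(record, authority_label=label) for record in records)
    return rows, failed


if __name__ == "__main__":
    rows, failed = run_all(SCRAPERS)
    print(f"{len(rows)} records collected, {len(failed)} scrapers failed")
```

Catching exceptions per scraper is a design choice in this sketch: one broken council site then does not stop the remaining authorities from being scraped.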

43South commented 3 years ago

I'm happy to write a scraper.py for option 2. I'd prefer this as then I don't need dozens of different repositories to maintain.

Please let me know what else I can do to help.

mlandauer commented 3 years ago

Awesome @43South. For the authority_label, just use the name of the authority that you use in the filename for the scraper, for example brighton.

Let me know (on this issue) when you've written and tested out your scraper.py. It would be great if you could load it onto morph.io and test it out there as well. Also let me know if you run into any problems.

Thanks!

43South commented 3 years ago

I've written a scraper.py that calls all the others and it (mostly) works. There's at least one bug, in glamorganspringbay.py. I'm putting it onto morph.io now to test it in the wild.

43South commented 3 years ago

So ... I've run into a problem. I'm running Python 3.8, but morph.io is insisting on running 2.7.6. I tried adding #!/usr/bin/env python3 to the start of my scraper.py to see if that would cheer it up, but no :(

43South commented 3 years ago

Ah runtime.txt. Now I need to downgrade from Python 3.8.11 to 3.6.2 (and associated code changes ...)
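
For reference, a runtime.txt for this case would be a single line naming the interpreter version. This is a sketch that assumes morph.io follows the Heroku-style `python-X.Y.Z` format; the version shown is the one mentioned above.

```
python-3.6.2
```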

43South commented 3 years ago

Okay, it's running to some extent in morph.io. That'll do me for tonight.

43South commented 3 years ago

It's running in morph.io and apparently writing a database.

43South commented 3 years ago

@mlandauer, as I'm adding authority_label, should I make both authority_label and council_reference unique_keys when adding records?

mlandauer commented 3 years ago

@43South authority_label and council_reference should be used as a composite key when adding records. I think that's what you mean, right? See https://github.com/planningalerts-scrapers/multiple_horizon/blob/master/scraper.rb#L17 for an example of how it's done in one of the other "multiple" scrapers. Sorry that it's in Ruby. That way the same council_reference can be used by different authorities without the records crashing into each other.
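
To show the composite key idea in Python rather than Ruby, here is a minimal sketch using the standard library sqlite3 module. The table layout, the save_record helper, the "hobart" label, and the sample records are assumptions made for illustration. If the scraper saves through a library with a unique_keys option, the likely equivalent is passing both column names there.

```python
import sqlite3


def save_record(conn, record):
    # INSERT OR REPLACE overwrites the existing row only when BOTH key
    # columns match, because together they form the primary key.
    conn.execute(
        "INSERT OR REPLACE INTO data (authority_label, council_reference, description) "
        "VALUES (:authority_label, :council_reference, :description)",
        record,
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data ("
    "authority_label TEXT NOT NULL, "
    "council_reference TEXT NOT NULL, "
    "description TEXT, "
    "PRIMARY KEY (authority_label, council_reference))"
)

# Two authorities reuse the same council_reference without colliding.
save_record(conn, {"authority_label": "brighton", "council_reference": "DA-1", "description": "Shed"})
save_record(conn, {"authority_label": "hobart", "council_reference": "DA-1", "description": "Deck"})
# Re-scraping the same application updates the row instead of duplicating it.
save_record(conn, {"authority_label": "brighton", "council_reference": "DA-1", "description": "Shed (amended)"})

rows = conn.execute(
    "SELECT authority_label, council_reference, description "
    "FROM data ORDER BY authority_label"
).fetchall()
print(rows)
```

With only council_reference as the key, the second save would have overwritten the first; with the composite key both rows are kept.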

Bigben83 commented 6 months ago

Hi, is there any update on the Tasmanian council feeds? It looks like some of them have stopped working. I would be happy to help where I can to get these all updated.

43South commented 6 months ago

G'day Benjamin,

I was working on them ages ago, then got distracted by other things. Happy for you to take it over. I might have time to get back into it.

Cheers Richard


Bigben83 commented 6 months ago

OK, thanks Richard. I'll have to do some reading up on how it works and get them working locally first. Obviously you can't just run the scripts natively in Python...

junglerot commented 6 months ago

https://drive.google.com/file/d/1RbfYb6ldbaUdcEQNtqDET3gKBkXcxxRM/view This shows my previous work. As you can see, I have extensive experience in data scraping using PHP, Node.js, and Python libraries, so there's no problem in scraping data. While developing websites, I have scraped data from a wide range of sites using Beautiful Soup, Selenium, Scrapy and so on. I would like to help as soon as possible.