elynch303 closed this 1 year ago
I would be happy to merge a pull request. The relevant file is in another repository: https://github.com/opencivicdata/scrapers-ca/blob/master/ca_ns/people.py
I tried to add this on a new branch so I could open a pull request, but it will not let me push. I'm getting a permissions error:
```
ERROR: Permission to opencivicdata/scrapers-ca.git denied to elynch303.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
```
That was for the new branch. In any case, this is the update for people.py in the ca_ns directory:
```python
import re

from utils import CanadianPerson as Person
from utils import CanadianScraper

COUNCIL_PAGE = "https://nslegislature.ca/members/profiles"


class NovaScotiaPersonScraper(CanadianScraper):
    # Party label shown on the profile tile -> full party name
    PARTIES = {
        "Liberal": "Nova Scotia Liberal Party",
        "PC": "Progressive Conservative Association of Nova Scotia",
        "NDP": "Nova Scotia New Democratic Party",
        "Independent": "Independent",
    }

    def scrape(self):
        page = self.lxmlize(COUNCIL_PAGE)
        members = page.xpath(
            '//div[contains(@class, "view-display-id-page_mlas_current_tiles")]//div[contains(@class, "views-row-")]'
        )  # noqa
        assert len(members), "No members found"
        for member in members:
            district = member.xpath('.//div[contains(@class, "views-field-field-constituency")]/div/text()')[0]
            party = member.xpath('.//span[contains(@class, "party-name")]/text()')[0]
            if party == "Vacant":
                continue
            # Each tile links to the member's profile page
            detail_url = member.xpath(".//@href")[0]
            detail = self.lxmlize(detail_url)
            name = detail.xpath('//div[contains(@class, "views-field-field-last-name")]/div/h1/text()')[0]
            # Strip the honorific and "MLA Elect" markers from the displayed name
            name = re.sub(r"(Honourable |\(MLA Elect\)|\(New MLA Elect\))", "", name)
            # Normalize the "LIberal" spelling before looking up the full party name
            party = self.PARTIES[party.replace("LIberal", "Liberal")]
            p = Person(primary_org="legislature", name=name, district=district, role="MLA", party=party)
            p.image = detail.xpath('//div[contains(@class, "field-content")]//img[@typeof="foaf:Image"]/@src')[0]
            contact = detail.xpath('//div[contains(@class, "mla-current-profile-contact")]')[0]
            # Assumes the second <p> of the contact block holds the constituency office address
            address = contact.xpath("./p[2]")[0]
            address = address.text_content().strip().splitlines()
            address = list(map(str.strip, address))
            p.add_contact("address", "\n".join(address), "constituency")
            email = self.get_email(contact, error=False)
            if email:
                p.add_contact("email", email)
            p.add_contact("voice", self.get_phone(contact, area_codes=[902]), "constituency")
            p.add_source(COUNCIL_PAGE)
            p.add_source(detail_url)
            yield p
```
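Since pupa would not run, the pure string handling in the scraper can at least be checked without any network access. This is a minimal sketch: `clean_name` and `normalize_party` are hypothetical helpers that copy the regex and the PARTIES map from the code above, and the sample names are made up. Unlike the scraper, `clean_name` also calls `.strip()`, because removing "(New MLA Elect)" would otherwise leave a trailing space.

```python
import re

# Copied from the scraper above.
PARTIES = {
    "Liberal": "Nova Scotia Liberal Party",
    "PC": "Progressive Conservative Association of Nova Scotia",
    "NDP": "Nova Scotia New Democratic Party",
    "Independent": "Independent",
}
NAME_NOISE = re.compile(r"(Honourable |\(MLA Elect\)|\(New MLA Elect\))")


def clean_name(raw):
    """Hypothetical helper: drop the honorific and "MLA Elect" markers, then
    trim leftover spaces (the .strip() is an addition, not in the scraper)."""
    return NAME_NOISE.sub("", raw).strip()


def normalize_party(raw):
    """Hypothetical helper: map the tile's party label to the full party name,
    applying the same "LIberal" -> "Liberal" replacement as the scraper."""
    return PARTIES[raw.replace("LIberal", "Liberal")]


print(clean_name("Honourable Jane Doe"))       # Jane Doe
print(clean_name("Jane Doe (New MLA Elect)"))  # Jane Doe
print(normalize_party("LIberal"))              # Nova Scotia Liberal Party
```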
P.S. I also wasn't able to test this. When I run pupa update ca_ns,
I keep getting:
```
exception "cannot import name 'Mapping' from 'collections' (/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/collections/__init__.py)" prevented loading of pupa.cli.commands.update module
usage: pupa [-h] [--debug] [--loglevel LOGLEVEL] {init,dbinit} ...
pupa: error: argument subcommand: invalid choice: 'update' (choose from 'init', 'dbinit')
```
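The first line of that output is the real problem. Python 3.10 removed the deprecated aliases such as `collections.Mapping` (they have lived in `collections.abc` since Python 3.3). As a result, some module loaded by pupa.cli.commands.update fails to import, and the update subcommand never gets registered. The traceback does not say which module does the old import. A likely workaround (untested here) is to run pupa in a Python 3.9 virtual environment, or to upgrade whichever dependency still uses the old import. The sketch below only illustrates the difference; `is_mapping` is a hypothetical helper.

```python
import collections
import sys
from collections.abc import Mapping  # supported location since Python 3.3

# Python 3.10 removed the old aliases, so `from collections import Mapping`
# raises ImportError there; on 3.3-3.9 the alias still resolves.
alias_present = hasattr(collections, "Mapping")
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"collections.Mapping present = {alias_present}")


def is_mapping(obj):
    """Hypothetical helper: True if obj implements the read-only Mapping interface."""
    return isinstance(obj, Mapping)


print(is_mapping({"district": "Example District"}))     # True
print(is_mapping([("district", "Example District")]))   # False
```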
P.S. I would also like to update the addresses of the other provincial members (MPPs/MLAs), so if the repository permissions are fixed I could submit all of this as one larger pull request.
The way it works on GitHub: You need to make a fork, push to your fork, and then make the pull request.
It looks like the scraper is grabbing the address for some representatives and not others. For example, at https://represent.opennorth.ca/postcodes/B0A1G0/?format=apibrowser you will see that John White's office listing did pull the phone number from his profile page, https://nslegislature.ca/members/profiles/john-white, but it failed to get the office mailing address, which is available from the same page.
Others, like Amanda McDougall, do have their office mailing address.
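One plausible cause, not checked against the live profile pages, is that the address is picked by position (`./p[2]` in the version posted earlier). Under that approach, any profile whose contact block has an extra or missing paragraph yields the wrong text, or nothing at all. Below is a minimal sketch of a more tolerant rule: take the first paragraph that contains a Canadian postal code. `find_address` is a hypothetical helper. The HTML snippet is made up, and it is parsed with the standard library's ElementTree so the example runs on its own. lxml elements, which the scraper uses, also provide `iter()` and `itertext()`.

```python
import re
import xml.etree.ElementTree as ET

# Canadian postal code, e.g. "B3H 0A1": letter-digit-letter, optional space, digit-letter-digit.
POSTAL_CODE = re.compile(r"\b[A-Z]\d[A-Z] ?\d[A-Z]\d\b")


def find_address(contact):
    """Hypothetical helper: return the first <p> under `contact` whose text
    contains a postal code, as a list of stripped, non-empty lines.
    Returns None if no paragraph looks like an address."""
    for paragraph in contact.iter("p"):
        text = "".join(paragraph.itertext())
        if POSTAL_CODE.search(text):
            lines = [line.strip() for line in text.splitlines()]
            return [line for line in lines if line]
    return None


# Made-up markup, loosely modelled on the class name used by the scraper.
SAMPLE = """
<div class="mla-current-profile-contact">
  <p>Constituency Office</p>
  <p>123 Example Street
     Halifax, NS B3H 0A1</p>
  <p>Phone: (902) 555-0100</p>
</div>
""".strip()

contact = ET.fromstring(SAMPLE)
print(find_address(contact))  # ['123 Example Street', 'Halifax, NS B3H 0A1']
```

Because the search no longer depends on paragraph order, the address is found whether it is the first, second, or third paragraph. The trade-off is that an address without a postal code is skipped.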