robbrad / UKBinCollectionData

UK Council Bin Collection Data Parser Outputting Bin Data as a JSON
MIT License

Gedling Borough Council #448

Closed roberthunt closed 10 months ago

roberthunt commented 11 months ago

Name of Council

Gedling Borough Council

Example Address/Postcode

Valeside Gardens

Additional Information

This one may be quite challenging. Some facts:

  1. Only data available is static calendar PDFs covering a 12-month period.
  2. Scraping text from the PDFs is not helpful as it relies on cell background colours to identify days.
  3. Lookup only accepts a street name or partial street name (no postcodes).
  4. There is a reminder service but this is by email only and around 12 hours before collection.
  5. There seems to be a fixed number of calendars: for normal collections, 4 for every weekday; for garden waste, 2 for every weekday.
  6. When a collection falls on a bank holiday it seems to shift to the previous Saturday.

Ideas

  1. We can resolve a street name to a specific calendar using the search.
  2. We should be able to predict the days based on the calendar.
  3. We need to know UK bank holidays to figure out shifted collections.
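
As a rough illustration of idea 3, the bank holiday shift described in the facts above could be sketched like this. The `BANK_HOLIDAYS` set and the function name are assumptions made for the example; a real implementation would need a complete holiday list for the year, and the "previous Saturday" rule is only what the calendars appear to do.

```python
from datetime import date, timedelta

# Hypothetical, hand-maintained subset of bank holidays (illustration only).
BANK_HOLIDAYS = {date(2023, 12, 25), date(2023, 12, 26), date(2024, 1, 1)}


def shift_for_bank_holiday(collection: date, holidays: set[date]) -> date:
    """Move a collection that lands on a bank holiday to the previous Saturday."""
    if collection not in holidays:
        return collection
    # weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    # A collection already on a Saturday moves back a full week.
    days_back = (collection.weekday() - 5) % 7 or 7
    return collection - timedelta(days=days_back)


# Monday 25 December 2023 is a bank holiday, so it moves to Saturday 23 December.
print(shift_for_bank_holiday(date(2023, 12, 25), BANK_HOLIDAYS))
```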

2022 / 2023

https://apps.gedling.gov.uk/refuse/search.aspx

Household (Black Bin) / Glass (Green Box) / Recycling (Green Bin)

At a glance, the G number seems to correlate to the week in which the glass collection occurs (glass + recycling), counting from the first month (December). So WednesdayG3 would have its glass collection in the 3rd week of December 2022.

Collection pattern is [Household -> Recycling -> Household -> Recycling/Glass]
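
A minimal sketch of how that four-week pattern could be turned into predicted dates, assuming the G number really does mark the glass week and that the week-1 collection date is read off the calendar by hand. The function name and the start date used below are assumptions for illustration, and bank holiday shifts are not handled here.

```python
from datetime import date, timedelta

# Index 0 is the glass week; the weeks after it follow the observed pattern
# Household -> Recycling -> Household -> Recycling/Glass.
CYCLE = [["Recycling", "Glass"], ["Household"], ["Recycling"], ["Household"]]


def predict_collections(first_collection: date, glass_week: int, weeks: int):
    """Yield (date, bins) for `weeks` consecutive weekly collections.

    first_collection: the week-1 collection date (assumed, taken from the PDF).
    glass_week: the G number from the calendar name (1 to 4).
    """
    for week in range(1, weeks + 1):
        bins = CYCLE[(week - glass_week) % 4]
        yield first_collection + timedelta(weeks=week - 1), bins


# WednesdayG3, assuming week 1 is Wednesday 7 December 2022.
for day, bins in predict_collections(date(2022, 12, 7), glass_week=3, weeks=4):
    print(day.isoformat(), "/".join(bins))
```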

MondayG1.pdf MondayG2.pdf MondayG3.pdf MondayG4.pdf TuesdayG1.pdf TuesdayG2.pdf TuesdayG3.pdf TuesdayG4.pdf WednesdayG1.pdf WednesdayG2.pdf WednesdayG3.pdf WednesdayG4.pdf ThursdayG1.pdf ThursdayG2.pdf ThursdayG3.pdf ThursdayG4.pdf FridayG1.pdf FridayG2.pdf FridayG3.pdf FridayG4.pdf

Garden Waste (Brown Bin)

Garden Waste A.pdf Garden Waste B.pdf Garden Waste C.pdf Garden Waste D.pdf Garden Waste E.pdf Garden Waste F.pdf Garden Waste G.pdf Garden Waste H.pdf Garden Waste I.pdf Garden Waste J.pdf

Verification

roberthunt commented 11 months ago

2023 / 2024

New calendars.

Household (Black Bin) / Glass (Green Box) / Recycling (Green Bin)

MondayG1.pdf MondayG2.pdf MondayG3.pdf MondayG4.pdf TuesdayG1.pdf TuesdayG2.pdf TuesdayG3.pdf TuesdayG4.pdf WednesdayG1.pdf WednesdayG2.pdf WednesdayG3.pdf WednesdayG4.pdf ThursdayG1.pdf ThursdayG2.pdf ThursdayG3.pdf ThursdayG4.pdf FridayG1.pdf FridayG2.pdf FridayG3.pdf FridayG4.pdf

Garden Waste (Brown Bin)

Garden Waste A.pdf Garden Waste B.pdf Garden Waste C.pdf Garden Waste D.pdf Garden Waste E.pdf Garden Waste F.pdf Garden Waste G.pdf Garden Waste H.pdf Garden Waste I.pdf Garden Waste J.pdf

sym0nd0 commented 11 months ago

If this one can be scraped and included in this project, I'll be over the moon.

I've been battling with this nonsensical way of this data being provided and have asked Gedling on multiple occasions to either provide a simple list of dates or, better yet, an API for it but to no avail.

dp247 commented 11 months ago

No promises but... have you got the URLs for those calendars 😉

roberthunt commented 11 months ago

Yes. Keep in mind they seem to re-use the URLs from year to year, so they have recently swapped over to serving the 2023/2024 calendars. The files for last year are above though, by way of reference for how they may change.

2023/2024

Household (Black Bin) / Glass (Green Box) / Recycling (Green Bin)

https://apps.gedling.gov.uk/refuse/data/MondayG1.pdf https://apps.gedling.gov.uk/refuse/data/MondayG2.pdf https://apps.gedling.gov.uk/refuse/data/MondayG3.pdf https://apps.gedling.gov.uk/refuse/data/MondayG4.pdf https://apps.gedling.gov.uk/refuse/data/TuesdayG1.pdf https://apps.gedling.gov.uk/refuse/data/TuesdayG2.pdf https://apps.gedling.gov.uk/refuse/data/TuesdayG3.pdf https://apps.gedling.gov.uk/refuse/data/TuesdayG4.pdf https://apps.gedling.gov.uk/refuse/data/WednesdayG1.pdf https://apps.gedling.gov.uk/refuse/data/WednesdayG2.pdf https://apps.gedling.gov.uk/refuse/data/WednesdayG3.pdf https://apps.gedling.gov.uk/refuse/data/WednesdayG4.pdf https://apps.gedling.gov.uk/refuse/data/ThursdayG1.pdf https://apps.gedling.gov.uk/refuse/data/ThursdayG2.pdf https://apps.gedling.gov.uk/refuse/data/ThursdayG3.pdf https://apps.gedling.gov.uk/refuse/data/ThursdayG4.pdf https://apps.gedling.gov.uk/refuse/data/FridayG1.pdf https://apps.gedling.gov.uk/refuse/data/FridayG2.pdf https://apps.gedling.gov.uk/refuse/data/FridayG3.pdf https://apps.gedling.gov.uk/refuse/data/FridayG4.pdf

Garden Waste (Brown Bin)

https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20A.pdf https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20B.pdf https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20C.pdf https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20D.pdf https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20E.pdf https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20F.pdf https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20G.pdf https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20H.pdf https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20I.pdf https://apps.gedling.gov.uk/GDW/Rounds/data/Garden%20Waste%20J.pdf
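
Since the URLs follow a regular naming scheme, they can be generated rather than listed by hand. This is only a sketch based on the links above; the function names are made up for the example, and it assumes the council keeps the same paths and file names.

```python
from urllib.parse import quote

REFUSE_BASE = "https://apps.gedling.gov.uk/refuse/data/"
GARDEN_BASE = "https://apps.gedling.gov.uk/GDW/Rounds/data/"


def refuse_calendar_urls() -> list[str]:
    """Build the 20 refuse calendar URLs (MondayG1.pdf to FridayG4.pdf)."""
    days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
    return [f"{REFUSE_BASE}{day}G{n}.pdf" for day in days for n in range(1, 5)]


def garden_calendar_urls() -> list[str]:
    """Build the 10 garden waste calendar URLs (rounds A to J)."""
    # quote() percent-encodes the spaces in the file name as %20.
    return [GARDEN_BASE + quote(f"Garden Waste {letter}.pdf") for letter in "ABCDEFGHIJ"]


print(len(refuse_calendar_urls()), len(garden_calendar_urls()))
```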

dp247 commented 11 months ago

Cheers. I've also sent an FOI request to the council for their data, so we may have two ways to go about it.

sym0nd0 commented 11 months ago

That is genius! 😂

dp247 commented 11 months ago

I got a response... they sent me PDF files

sym0nd0 commented 11 months ago

The joy of dealing with Gedling. 😂

When asked about API access, after their email alerts recently fell over (either sending people notifications for the wrong bin, even different bins to different individuals subscribed from the same house 🤦🏼‍♂️, or sending no email at all), they said:

"we are looking at options including an easier interface but, for now, we will continue with the email alerts, we're just having a few issues since we moved to a new system."

robbrad commented 11 months ago

Tell them they are welcome to open a pull request on this GitHub repository as an option.

robbrad commented 11 months ago

If we do decide to do something funky with the PDFs - please keep in mind

https://github.com/robbrad/UKBinCollectionData/issues/493#issuecomment-1859126983

skelt0 commented 11 months ago

I know it's not very 'smart', but would it be a halfway house if we were just to hard-code the data? You could still use the address lookup to pick the right calendar data. It would mean that once a year someone would have to grab the data and put it in a sensible format so the changes can be submitted. I'm happy to write the first version up - @roberthunt would you be happy checking in on this yearly to update the data or create a request for someone else to do it?

I know it's not ideal, but anything else seems like it'll either take much longer or not happen at all. And at least it gives the poor folk of Gedling HA integration?

@robbrad ?

robbrad commented 11 months ago

I'm okay with that. I know it's less than ideal, but can it be a JSON dictionary in the Python council file? The only reason I say this is if we start having extra files in the repository, it dilutes the structure we currently have. What do you think, @skelt0 ?
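
To make the suggestion concrete, here is a minimal sketch of a JSON dictionary embedded in a Python module. The key names, the `bins_for_calendar` helper and the date format are assumptions for illustration, not the project's confirmed structure, and the dates are placeholders rather than real Gedling collection dates.

```python
import json

# Calendar names follow the PDF file names. The dates are PLACEHOLDERS invented
# for this example; real entries would be typed in from the council calendars.
CALENDARS_JSON = """
{
  "WednesdayG2": [
    {"type": "Household", "collectionDate": "03/01/2024"},
    {"type": "Recycling", "collectionDate": "10/01/2024"}
  ],
  "Garden Waste A": []
}
"""

CALENDARS = json.loads(CALENDARS_JSON)


def bins_for_calendar(name: str) -> dict:
    """Return the hard-coded entries for one calendar (hypothetical helper)."""
    if name not in CALENDARS:
        raise KeyError(f"No hard-coded data for calendar {name!r}")
    return {"bins": CALENDARS[name]}


print(json.dumps(bins_for_calendar("WednesdayG2"), indent=2))
```

Keeping the data in the council file means the yearly update touches a single file, at the cost of a long literal in the module.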

skelt0 commented 11 months ago

Yeah ok! I'll see what I can pull together!

robbrad commented 11 months ago

This may or may not help you get the data out: https://github.com/pymupdf/PyMuPDF

Another option: if there is some way to convert the PDFs to HTML, we could extract the data from that rather than typing it in by hand.

skelt0 commented 11 months ago

@robbrad - Check out an initial stab at this: https://github.com/skelt0/UKBinCollectionData/blob/feat-gedling-borough-council/uk_bin_collection/uk_bin_collection/councils/GedlingBoroughCouncil.py

The calendar data is pretty predictable, as mentioned above, so I've made a helper script to generate the dates based on three values. It makes it a million times quicker. I wouldn't like to bet that this predictive modelling will keep working in future years though (even/odd weeks, and 1 in 4 for glass), so it's 50/50 on whether I save the helper script somewhere.

Let me know what you think and I can continue putting the data in. Currently the link above works for the supplied street's refuse data (Black, Blue and Glass bins).

Note: This isn't tidied up yet and the address is currently hardcoded.

robbrad commented 11 months ago

Looking good!

skelt0 commented 10 months ago

@roberthunt - can you let me know how you get on with this? It's hand-entered - I've tried to match all the changes due to bank holidays.

Also - I was wondering if the FOI process could be repeated, but asking for an accessible (screen reader friendly) version of the data? Surely they must need to supply this data in an accessible format when requested?

Anyway - hope this works out!

jamesmacwhite commented 5 months ago

If it helps, I've converted the horrible PDFs into iCal format and hosted them for use, as I had already done this for my own schedule, Wednesday G2. The schedules generally follow a consistent pattern, with the exception of bank holidays, which are marked as changed collection days.

https://github.com/jamesmacwhite/gedling-borough-council-bin-calendars

If you want to argue the case on legal grounds, all councils fall under the Public Sector Bodies (Websites and Mobile Applications) (No. 2) Accessibility Regulations 2018, so they are legally required to make content accessible. Because the calendars provided were created after 2018, they would be required to provide an alternative format. If you want to push the issue, they are technically not meeting accessibility regulations with the formats provided.

sym0nd0 commented 5 months ago

James, you're a superstar! Thanks for doing that and for sharing.

jamesmacwhite commented 5 months ago

No worries! It was great to come across this project and see that it exists to create an API layer where there is none. Unfortunately for Gedling Borough Council, the PDFs are the only data source outside of the email reminder service. While the email service is better accessibility-wise given it's HTML, it does not provide full schedule data, so it's either PDF or nothing, which is horrible and borderline in terms of their accessibility statement, as referenced.

You could in theory trigger an automation when the email reminder is received and parse the data out of that. Consistent properties such as the sender (From) and subject are available to match on:

From: GBC Bin Reminder Alert <news@comms.gedling.gov.uk>
Subject: We're collecting your bin tomorrow, please it out by 6am

The heading which contains the bin type is in an <h2> element but does not have a specific ID. There are also two <h2> elements, so you'd have to take the first occurrence and then parse out the all-caps text, as that's what they use for the bin type.
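
A rough sketch of that approach using only the standard library. The sample HTML below is invented for the example (the real email markup may differ), and the class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class FirstH2Parser(HTMLParser):
    """Collect the text of the first <h2> element only."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self._done = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and not self._done:
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self._in_h2 = False
            self._done = True  # ignore any later <h2> elements

    def handle_data(self, data):
        if self._in_h2:
            self.text += data


def bin_type_from_email(html: str):
    """Return the first run of all-caps words in the first <h2>, or None."""
    parser = FirstH2Parser()
    parser.feed(html)
    match = re.search(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b", parser.text)
    return match.group(0) if match else None


# Invented sample markup, not copied from a real reminder email.
sample = "<h2>Your BLACK BIN is due tomorrow</h2><p>Details</p><h2>OTHER NEWS</h2>"
print(bin_type_from_email(sample))
```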


I looked at this originally, but the email automation/HTML scraping route relies on fragile DOM/HTML parsing, and the Garden Waste Collection service is completely outside of it. Just converting everything to iCal seems easier and at least reliable, provided the occurrence scheduling aligns with the original PDF, so that's what I ended up doing after seeing a few others in home automation circles having the same issues with Gedling. Who knew Gedling had 20 different bin collections!

We should still push Gedling Borough Council to look at this long term though. The PDFs themselves are, and always will be, print documents, which Gedling won't actually print anymore anyway due to cost/sustainability, so the format in my view is outdated. Clearly, if the email reminder service exists, they have some form of scheduling system behind the scenes, so it doesn't seem too far a stretch to publish official iCal calendars.

jamesmacwhite commented 5 months ago

I've also mocked up a web page with all the iCal links for easy reference: https://jamesmacwhite.github.io/gedling-borough-council-bin-calendars/. I'm not going to go as far as buying a domain name for the site, but a static Jekyll site should make it easier than messing around with the Raw button on GitHub.

sym0nd0 commented 5 months ago

Love that! Thanks again for your work on this, made my life a lot easier.

jamesmacwhite commented 5 months ago

You're welcome. HTML and JSON formats are also provided, making the data more accessible and open!

jamesmacwhite commented 4 months ago

Since #763 was merged, this project now uses API data from gbcbincalendars.co.uk, removing the static data issue. There is still a requirement to create iCal data for each calendar each year, but this should carry a lower maintenance burden, since calendar recurrence rules avoid listing every single date manually. The JSON data is expanded to provide the collection dates in full, generated from the iCal RRULE data.
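
For anyone curious what expanding an RRULE into full dates involves, here is a deliberately tiny sketch that handles only `FREQ=WEEKLY` with `INTERVAL` and `COUNT`. The rule string and start date are made up for the example rather than taken from the real calendars, and this is not how gbcbincalendars.co.uk generates its JSON as far as I know; a library such as python-dateutil covers the full RRULE syntax.

```python
from datetime import date, timedelta


def expand_weekly_rrule(start: date, rrule: str) -> list[date]:
    """Expand a simple weekly RRULE into explicit dates.

    Only FREQ=WEEKLY with INTERVAL and COUNT is supported in this sketch.
    """
    parts = dict(item.split("=", 1) for item in rrule.split(";"))
    if parts.get("FREQ") != "WEEKLY":
        raise ValueError("only FREQ=WEEKLY is handled in this sketch")
    if "COUNT" not in parts:
        raise ValueError("this sketch requires COUNT")
    step = timedelta(weeks=int(parts.get("INTERVAL", "1")))
    return [start + i * step for i in range(int(parts["COUNT"]))]


# Hypothetical fortnightly rule starting on Wednesday 3 January 2024.
for day in expand_weekly_rrule(date(2024, 1, 3), "FREQ=WEEKLY;INTERVAL=2;COUNT=3"):
    print(day.isoformat())
```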

robbrad commented 4 months ago

Do we need to capture this process in the wiki at all?

And may I say, fabulous work @jamesmacwhite

jamesmacwhite commented 4 months ago

Thanks. Glad it can be of use to other projects!

jamesmacwhite commented 4 months ago

One thing you might want to highlight in your wiki: there's at least one case where a valid street name only returns data for one type of collection and not both. Odd, right? Not sure how that's valid, to be honest. I double-checked this at the source and confirmed it's an oddity with Gedling's data.

Using Beswick Close as the example.

No refuse data is returned, yet it does have garden collection data.

I've confirmed Beswick Close is within the Gedling boundary, but that's not really a surprise when clearly you can have a garden collection calendar!

I happened to come across this because there's some Google Analytics tracking on searches, and I cross-check some searches locally to make sure they return data correctly; this is the one that revealed this kind of scenario is possible. More Gedling fun. I updated my own search tool to handle the scenario. An API response with an empty array for a collection with no data is valid, but I guess I never expected this to occur for just one type.


My suspicion is that, because it's a relatively newly built area (within the past two years), it could be a data lag.
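
For consumers of the data, a defensive sketch of handling that case might look like the following. The response shape (a dict of collection type to a list of entries) and the key names are assumptions for illustration, not the documented API format.

```python
def split_by_availability(response: dict) -> tuple[dict, list]:
    """Separate collection types that returned entries from those that did not."""
    found = {kind: entries for kind, entries in response.items() if entries}
    missing = [kind for kind, entries in response.items() if not entries]
    return found, missing


# Hypothetical response for a street with garden data but no refuse data.
response = {
    "refuse": [],
    "garden": [{"type": "Garden Waste", "collectionDate": "placeholder"}],
}

found, missing = split_by_availability(response)
print("Has data:", list(found))
print("No data:", missing)
```

Treating an empty list as "no data for this type" rather than as an error keeps the lookup usable for streets like Beswick Close.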