mampfes / hacs_waste_collection_schedule

Home Assistant integration framework for (garbage collection) schedules

[Source Request]: Minneapolis, MN #2467

Open panoptican opened 2 months ago

panoptican commented 2 months ago

Municipality / Region

Minneapolis, MN

Collection Calendar Webpage

https://apps.ci.minneapolis.mn.us/AddressPortalApp/Search?AppID=RecycleFinderApp

Example Address

641 University Ave NE, Minneapolis, MN 55413

Collection Data Format

Something different (add to additional information)

Additional Information

The City of Minneapolis publishes its schedule as an RSS feed. The tricky part is that there are two schedules, and which one applies depends on the resident's location in the city. Rather than trying to determine which schedule to use, I propose adding two sources, one for each Minneapolis schedule variant. Both schedules are published as RSS and don't require any parameters to access. Here they are:

Schedule ABE https://apps.ci.minneapolis.mn.us/CalendarApp/Ex_CalendarRSS.aspx?linkurl=http://www.ci.minneapolis.mn.us/government/calendars.asp&datebook=Garbage%20and%20Recycling%20Monday%20Route%20ABE&type=rss

Schedule CD https://apps.ci.minneapolis.mn.us/CalendarApp/Ex_CalendarRSS.aspx?linkurl=http://www.ci.minneapolis.mn.us/government/calendars.asp&datebook=Garbage%20and%20Recycling%20Monday%20Route%20CD&type=rss
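
The two URLs differ only in the route suffix of the `datebook` parameter, so a source could build them from a single argument. A minimal sketch, assuming the suffix is the only part that changes; `feed_url` and the constant names are made up for illustration, and only the ABE and CD routes listed above are known to exist:

```python
from urllib.parse import quote

# Both constants are copied from the two feed URLs listed above.
BASE_URL = "https://apps.ci.minneapolis.mn.us/CalendarApp/Ex_CalendarRSS.aspx"
LINK_URL = "http://www.ci.minneapolis.mn.us/government/calendars.asp"


def feed_url(route: str) -> str:
    """Build the RSS URL for a route suffix such as "ABE" or "CD"."""
    # quote() percent-encodes the spaces as %20, matching the published URLs.
    datebook = quote(f"Garbage and Recycling Monday Route {route}")
    return f"{BASE_URL}?linkurl={LINK_URL}&datebook={datebook}&type=rss"


if __name__ == "__main__":
    for route in ("ABE", "CD"):
        print(route, feed_url(route))
```

This only reproduces the URLs; it does not fetch anything.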

5ila5 commented 2 months ago

I doubt there is anything I can do here, as they use Cloudflare CAPTCHAs to prevent scraping (even on the calendar RSS feed, whose only purpose is to be easily readable by computers).

panoptican commented 2 months ago

Huh!? Bizarre. I didn't even know you could lock RSS behind CAPTCHAs; it definitely defeats the entire purpose of an RSS feed. Are there any existing sources that use RSS that I could use as an example? Maybe I'll just implement it for myself, but a little guidance would help.

5ila5 commented 2 months ago

You're probably best off using the static source.
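
For reference, the dates in the sample ABE feed quoted in the code further down follow a simple recurrence: garbage every week and recycling every two weeks, with a pickup that lands on a holiday moving one day later. The sketch below only reproduces those dates; it is not the static source's configuration syntax, and `recurring_dates` is a made-up helper. The one-day holiday shift is inferred from the Labor Day note in the feed and may not hold for every holiday.

```python
from datetime import date, timedelta


def recurring_dates(start, count, interval_weeks=1, holidays=frozenset()):
    """Return `count` pickup dates starting at `start`, one every `interval_weeks` weeks."""
    dates = []
    for i in range(count):
        scheduled = start + timedelta(weeks=i * interval_weeks)
        # Assumption: a pickup that falls on a holiday is delayed by one day.
        if scheduled in holidays:
            scheduled += timedelta(days=1)
        dates.append(scheduled)
    return dates


# Labor Day 2024; the sample feed moves that week's pickup to 9/3/24.
HOLIDAYS = {date(2024, 9, 2)}

garbage = recurring_dates(date(2024, 8, 19), 5, holidays=HOLIDAYS)
recycling = recurring_dates(date(2024, 8, 26), 2, interval_weeks=2, holidays=HOLIDAYS)

if __name__ == "__main__":
    print("Garbage:  ", [d.isoformat() for d in garbage])
    print("Recycling:", [d.isoformat() for d in recycling])
```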

You could try emailing them; maybe they could remove the CAPTCHA in front of the RSS feed.

A source using the RSS data would probably look like this (without the hard-coded RSS data, of course):

import re

from bs4 import BeautifulSoup
from dateutil.parser import parse
from waste_collection_schedule import Collection

TEST_CASES = {
    "Minneapolis": {},
}

TITLE = "Minneapolis"
DESCRIPTION = "Source for Minneapolis."
URL = "minneapolismn.gov"

DATE_REGEX = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{2,4})")
# Keys are the upper-cased item titles from the feed with the date and time removed.
ICON_MAP = {
    "GARBAGE DAY": "mdi:trash-can",
    "RECYCLING DAY": "mdi:recycle",
    "ORGANICS CALENDAR (MONDAY)": "mdi:leaf",
}

class Source:
    def __init__(self) -> None:
        pass

    def fetch(self):
        xml = """<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Garbage and Recycling Monday Route ABE</title><link>http://apps.ci.minneapolis.mn.us/CalendarApp/Ex_Monthly.aspx?linkname=Calendar+Home&amp;linkurl=&amp;datebook=12&amp;view=monthly</link><description>Recycling Monday ABE</description><language>en-us</language><pubDate>Sun, 18 Aug 2024 06:18:21 GMT</pubDate><image><title>Garbage and Recycling Monday Route ABE</title><url>http://wwwdocs.minneapolismn.gov/images/rss14x14.png</url><link>http://apps.ci.minneapolis.mn.us/CalendarApp/Ex_Monthly.aspx?linkname=Calendar+Home&amp;linkurl=&amp;datebook=12&amp;view=monthly</link><description>Minneapolis - City of Lakes</description></image><webMaster>mailto:egovernment@ci.minneapolis.mn.us</webMaster><lastBuildDate>Sun, 18 Aug 2024 06:18:21 GMT</lastBuildDate><item><title>Garbage Day 8/19/24 6:00 AM</title><description>Place securely bagged garbage inside your garbage cart. </description><link>https://www.minneapolismn.gov/resident-services/garbage-recycling-cleanup/</link><pubDate>Mon, 19 Aug 2024 06:00:00 GMT</pubDate><comments>Place your garbage cart at the alley or curb line by 6 a.m.</comments><guid>https://www.minneapolismn.gov/resident-services/garbage-recycling-cleanup/</guid></item><item><title>Organics Calendar (Monday) 8/19/24 6:00 AM</title><description>Organics must be bagged in certified compostable plastic bags, paper bags, or securely wrapped with newspaper. Place bagged or wrapped organics inside your organics recycling cart.  Paper egg cartons and pizza boxes from delivery do not need to be bagged.
</description><link>http://www.minneapolismn.gov/organics</link><pubDate>Mon, 19 Aug 2024 06:00:00 GMT</pubDate><comments>Place your organics recycling cart at the alley or curb line by 6 a.m.</comments><guid>http://www.minneapolismn.gov/organics</guid></item><item><title>Garbage Day 8/26/24 6:00 AM</title><description>Place securely bagged garbage inside your garbage cart. </description><link>https://www.minneapolismn.gov/resident-services/garbage-recycling-cleanup/</link><pubDate>Mon, 26 Aug 2024 06:00:00 GMT</pubDate><comments>Place your garbage cart at the alley or curb line by 6 a.m.</comments><guid>https://www.minneapolismn.gov/resident-services/garbage-recycling-cleanup/</guid></item><item><title>Organics Calendar (Monday) 8/26/24 6:00 AM</title><description>Organics must be bagged in certified compostable plastic bags, paper bags, or securely wrapped with newspaper. Place bagged or wrapped organics inside your organics recycling cart.  Paper egg cartons and pizza boxes from delivery do not need to be bagged.
</description><link>http://www.minneapolismn.gov/organics</link><pubDate>Mon, 26 Aug 2024 06:00:00 GMT</pubDate><comments>Place your organics recycling cart at the alley or curb line by 6 a.m.</comments><guid>http://www.minneapolismn.gov/organics</guid></item><item><title>Recycling Day 8/26/24 6:00 AM</title><description>Recycling should be placed loose inside your recycling cart. Please empty recycling from bags, and flatten all boxes.  </description><link>www.minneapolismn.gov/recycling</link><pubDate>Mon, 26 Aug 2024 06:00:00 GMT</pubDate><comments>Place your blue one-sort recycling cart at the alley or curb line by 6 a.m. </comments><guid>www.minneapolismn.gov/recycling</guid></item><item><title>Garbage Day 9/3/24 6:00 AM</title><description>Collection is delayed one day due to the Labor Day holiday. Place securely bagged garbage inside your garbage cart. </description><link>https://www.minneapolismn.gov/resident-services/garbage-recycling-cleanup/</link><pubDate>Tue, 03 Sep 2024 06:00:00 GMT</pubDate><comments>Place your garbage cart at the alley or curb line by 6 a.m.</comments><guid>https://www.minneapolismn.gov/resident-services/garbage-recycling-cleanup/</guid></item><item><title>Organics Calendar (Monday) 9/3/24 6:00 AM</title><description>Collection is delayed one day due to the Labor day holiday. 
Organics must be bagged in certified compostable plastic bags, paper bags, or securely wrapped with newspaper. Place bagged or wrapped organics inside your organics recycling cart.  Paper egg cartons and pizza boxes from delivery do not need to be bagged.
</description><link>http://www.minneapolismn.gov/organics</link><pubDate>Tue, 03 Sep 2024 06:00:00 GMT</pubDate><comments>Place your organics recycling cart at the alley or curb line by 6 a.m.</comments><guid>http://www.minneapolismn.gov/organics</guid></item><item><title>Garbage Day 9/9/24 6:00 AM</title><description>Place securely bagged garbage inside your garbage cart. </description><link>https://www.minneapolismn.gov/resident-services/garbage-recycling-cleanup/</link><pubDate>Mon, 09 Sep 2024 06:00:00 GMT</pubDate><comments>Place your garbage cart at the alley or curb line by 6 a.m.</comments><guid>https://www.minneapolismn.gov/resident-services/garbage-recycling-cleanup/</guid></item><item><title>Organics Calendar (Monday) 9/9/24 6:00 AM</title><description>Organics must be bagged in certified compostable plastic bags, paper bags, or securely wrapped with newspaper. Place bagged or wrapped organics inside your organics recycling cart.  Paper egg cartons and pizza boxes from delivery do not need to be bagged.
</description><link>http://www.minneapolismn.gov/organics</link><pubDate>Mon, 09 Sep 2024 06:00:00 GMT</pubDate><comments>Place your organics recycling cart at the alley or curb line by 6 a.m.</comments><guid>http://www.minneapolismn.gov/organics</guid></item><item><title>Recycling Day 9/9/24 6:00 AM</title><description>Recycling should be placed loose inside your recycling cart. Please empty recycling from bags, and flatten all boxes.  </description><link>www.minneapolismn.gov/recycling</link><pubDate>Mon, 09 Sep 2024 06:00:00 GMT</pubDate><comments>Place your blue one-sort recycling cart at the alley or curb line by 6 a.m. </comments><guid>www.minneapolismn.gov/recycling</guid></item><item><title>Garbage Day 9/16/24 6:00 AM</title><description>Place securely bagged garbage inside your garbage cart. </description><link>https://www.minneapolismn.gov/resident-services/garbage-recycling-cleanup/</link><pubDate>Mon, 16 Sep 2024 06:00:00 GMT</pubDate><comments>Place your garbage cart at the alley or curb line by 6 a.m.</comments><guid>https://www.minneapolismn.gov/resident-services/garbage-recycling-cleanup/</guid></item><item><title>Organics Calendar (Monday) 9/16/24 6:00 AM</title><description>Organics must be bagged in certified compostable plastic bags, paper bags, or securely wrapped with newspaper. Place bagged or wrapped organics inside your organics recycling cart.  Paper egg cartons and pizza boxes from delivery do not need to be bagged.
</description><link>http://www.minneapolismn.gov/organics</link><pubDate>Mon, 16 Sep 2024 06:00:00 GMT</pubDate><comments>Place your organics recycling cart at the alley or curb line by 6 a.m.</comments><guid>http://www.minneapolismn.gov/organics</guid></item></channel></rss>"""

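        # "xml" selects BeautifulSoup's XML parser, which needs lxml installed.
        soup = BeautifulSoup(xml, "xml")
        entries = []
        # Only <item> titles describe a pickup; the channel and image titles are skipped.
        for title in soup.select("item title"):
            text = title.get_text()
            date_result = DATE_REGEX.search(text)
            if not date_result:
                print("No date found in", text)
                continue
            date_str = date_result.group(0)
            # dateutil reads ambiguous dates month-first by default, so "8/19/24" is 2024-08-19.
            date = parse(date_str).date()

            # Everything before the date is the collection type, e.g. "Garbage Day".
            bin_type = text.split(date_str)[0].strip()
            icon = ICON_MAP.get(bin_type.upper())  # None if the type has no icon

            entries.append(
                Collection(
                    date=date,
                    t=bin_type,
                    icon=icon,
                )
            )
        return entries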
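
If you want to try the title parsing without installing the integration, bs4, or dateutil, here is a rough standard-library-only sketch of the same idea. The sample XML is an abbreviated copy of three items from the feed above, `parse_title` and `parse_feed` are names made up for illustration, and treating two-digit years as 20xx is my assumption.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date
from typing import List, Optional, Tuple

# Abbreviated copy of three items from the sample feed (all other tags omitted).
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Garbage and Recycling Monday Route ABE</title>
<item><title>Garbage Day 8/19/24 6:00 AM</title></item>
<item><title>Organics Calendar (Monday) 8/19/24 6:00 AM</title></item>
<item><title>Recycling Day 8/26/24 6:00 AM</title></item>
</channel></rss>"""

DATE_REGEX = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")


def parse_title(title: str) -> Optional[Tuple[date, str]]:
    """Split a title such as "Garbage Day 8/19/24 6:00 AM" into (date, type)."""
    match = DATE_REGEX.search(title)
    if match is None:
        return None
    month, day, year = (int(part) for part in match.group(0).split("/"))
    if year < 100:
        year += 2000  # assumption: two-digit years belong to the 2000s
    # The text before the date is the collection type.
    return date(year, month, day), title[: match.start()].strip()


def parse_feed(xml_text: str) -> List[Tuple[date, str]]:
    """Return (date, type) pairs for every <item> title that contains a date."""
    root = ET.fromstring(xml_text)
    results = []
    # Only item titles are visited, so the channel title is ignored.
    for title_el in root.iterfind("./channel/item/title"):
        parsed = parse_title(title_el.text or "")
        if parsed is not None:
            results.append(parsed)
    return results


if __name__ == "__main__":
    for pickup_date, bin_type in parse_feed(SAMPLE_RSS):
        print(pickup_date.isoformat(), bin_type)
```

Each pair could then be wrapped in a `Collection` as in the source above.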

panoptican commented 2 months ago

I wrote them an email and will update if I get a response. In the meantime, the static source looks like a good fit. I hadn't noticed it before.