noyainrain / flatdir

Web aggregator of flat ads from different real estate companies.
MIT License

Respect base tags #17

Open noyainrain opened 4 months ago

noyainrain commented 4 months ago

If a document defines a `<base>` tag, take it into account when resolving extracted ad URLs.

Draft

```python
class Company:
    url_path: str
    """
    URL field of an ad. The extracted URL is resolved against the base URL of the document (for HTML,
    see https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base).
    """
```
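The resolution rule in the docstring can be illustrated with the standard library's `urllib.parse.urljoin`. This is a minimal sketch, assuming the document URL, the `<base>` `href` and the extracted ad URL are all available as strings; `resolve_ad_url` is a hypothetical helper, not part of flatdir. Because the `href` of `<base>` may itself be relative (see the MDN page), it is first resolved against the document URL, and the ad URL is then resolved against the result.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_ad_url(document_url: str, base_href: Optional[str], ad_url: str) -> str:
    """Resolve an extracted ad URL (hypothetical helper for illustration).

    document_url: URL the page was fetched from.
    base_href: href of the document's <base> element, or None if there is none.
    ad_url: URL extracted from the ad, possibly relative.
    """
    # A relative <base href> is itself resolved against the document URL.
    base_url = urljoin(document_url, base_href) if base_href else document_url
    # The ad URL is then resolved against the effective base URL.
    return urljoin(base_url, ad_url)


# Without a <base> tag, "123.html" is relative to the listing page's directory.
print(resolve_ad_url("https://example.com/flats/list.html", None, "123.html"))
# → https://example.com/flats/123.html

# With <base href="/angebote/">, it is relative to /angebote/ instead.
print(resolve_ad_url("https://example.com/flats/list.html", "/angebote/", "123.html"))
# → https://example.com/angebote/123.html
```

Absolute ad URLs pass through unchanged, since `urljoin` ignores the base when the second argument already has a scheme.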

Implementation hints: `Company._parse_html()` could search for a `<base>` tag and resolve each ad URL against its `href`. A minimal HTML document containing a `<base>` tag could be used for a unit test.
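The hint could look roughly like the following sketch. It uses the standard library's `html.parser`, which may differ from whatever parser `Company._parse_html()` actually uses; `BaseHrefParser`, `find_base_url` and `BaseTagTest` are hypothetical names chosen for illustration. Following the HTML standard, only the first `<base>` element that has an `href` attribute is taken into account. The unit test uses a minimal HTML document containing a `<base>` tag, as suggested.

```python
import unittest
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class BaseHrefParser(HTMLParser):
    """Record the href of the first <base> element that has one (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag: str, attrs: list) -> None:
        # HTMLParser lowercases tag and attribute names; a self-closing <base ... />
        # is routed here too by the default handle_startendtag().
        if tag == "base" and self.href is None:
            href = dict(attrs).get("href")
            if href is not None:
                self.href = href


def find_base_url(html: str, document_url: str) -> str:
    """Return the URL that relative ad URLs in html should be resolved against."""
    parser = BaseHrefParser()
    parser.feed(html)
    parser.close()
    if parser.href is None:
        return document_url
    # A relative <base href> is itself resolved against the document URL.
    return urljoin(document_url, parser.href)


class BaseTagTest(unittest.TestCase):
    def test_ad_url_respects_base_tag(self) -> None:
        html = (
            '<html><head><base href="https://example.org/ads/"></head>'
            '<body><a href="1.html">Flat</a></body></html>'
        )
        base_url = find_base_url(html, "https://example.com/list.html")
        self.assertEqual(urljoin(base_url, "1.html"), "https://example.org/ads/1.html")

    def test_no_base_tag_falls_back_to_document_url(self) -> None:
        html = '<html><body><a href="1.html">Flat</a></body></html>'
        base_url = find_base_url(html, "https://example.com/flats/list.html")
        self.assertEqual(urljoin(base_url, "1.html"), "https://example.com/flats/1.html")
```

If saved as `test_base.py`, the tests run with `python -m unittest test_base`.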

noyainrain commented 2 months ago

I'm going to implement this live on https://www.twitch.tv/noyainrain next Thursday at 18:00 CET (12 pm EST) :blush:

@l3d00m