john-kurkowski / tldextract

Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).
BSD 3-Clause "New" or "Revised" License

Request: Could scheme be made available? #205

Closed: JeffreyShran closed this issue 1 year ago

JeffreyShran commented 4 years ago

Hi,

I'm aware of https://github.com/john-kurkowski/tldextract/issues/6 which asked for some urlparse functionality. This isn't that. It's simply a request to include access to the scheme as part of the standard output.

So instead of:

import tldextract

tldextract.extract('https://blogs.google.com')
# ExtractResult(subdomain='blogs', domain='google', tld='com')

Do this instead:

tldextract.extract('https://blogs.google.com')
# ExtractResult(scheme='https', subdomain='blogs', domain='google', tld='com')
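
In the meantime, the scheme can be read separately with the standard library. This is only a minimal sketch of a workaround, not part of tldextract: `scheme_of` is a hypothetical helper name, and it relies solely on `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit


def scheme_of(url: str) -> str:
    """Return the URL's scheme, or an empty string if it has none.

    Hypothetical helper for illustration; not a tldextract function.
    """
    return urlsplit(url).scheme


print(scheme_of("https://blogs.google.com"))  # https
```

The returned value could then be kept alongside the result of `tldextract.extract` for the same URL.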

john-kurkowski commented 4 years ago

Possible. We're already parsing the scheme. We're considering moving away from namedtuple in order to track additional metadata, as in #138. If we make that breaking change, then adding the scheme would no longer break this library's expected, ordered triple of (subdomain, domain, tld).
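
The compatibility concern can be sketched as follows. This is an illustration under stated assumptions, not tldextract's actual code: `TripleResult`, `WithScheme`, and `RichResult` are hypothetical stand-ins. The first two show how adding a field to a namedtuple breaks positional unpacking, and the third shows one possible non-tuple shape that can grow new fields.

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical stand-ins for illustration; not tldextract's actual classes.
TripleResult = namedtuple("TripleResult", ["subdomain", "domain", "tld"])
WithScheme = namedtuple("WithScheme", ["scheme", "subdomain", "domain", "tld"])

# Existing callers can unpack the three-field tuple positionally.
subdomain, domain, tld = TripleResult("blogs", "google", "com")

# The same unpacking fails once a fourth field is added to the tuple.
try:
    subdomain, domain, tld = WithScheme("https", "blogs", "google", "com")
    unpack_ok = True
except ValueError:
    unpack_ok = False


@dataclass(frozen=True)
class RichResult:
    """A non-tuple result can gain fields without changing any ordering."""

    subdomain: str
    domain: str
    tld: str
    scheme: str = ""  # assumed optional field, added for this sketch


rich = RichResult("blogs", "google", "com", scheme="https")
```

With a result type like `RichResult`, callers read fields by name, so a new `scheme` attribute would not disturb code that only uses `subdomain`, `domain`, and `tld`.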

JeffreyShran commented 4 years ago

Thanks @john-kurkowski.

Yes, if you could please keep this in the back of your mind, that would be great.

john-kurkowski commented 1 year ago

https://github.com/john-kurkowski/tldextract/issues/272#issuecomment-1267491864