lipoja / URLExtract

URLExtract is a Python class for collecting (extracting) URLs from a given text based on locating TLDs.
MIT License
241 stars, 61 forks

Support for private/reserved/custom TLDs #154

Open carton-of-mice opened 9 months ago

carton-of-mice commented 9 months ago

URLExtract pulls in the list of TLDs in the root zone, but a number of reserved, test, or otherwise special-use TLDs are still valid in URLs in at least some contexts. I'm looking at RFC 6761, RFC 6762 and RFC 7686, for which I think there should be an extract_private property, but a more generic solution, such as allowing callers to define their own supplementary (or replacement) set of custom TLDs, would be sufficient as well.
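To make the request concrete, here is a rough, standalone sketch of the behaviour I have in mind. It does not use URLExtract's API at all: `find_hosts`, `SPECIAL_USE_TLDS` and `ROOT_ZONE_SAMPLE` are hypothetical names made up for illustration, and the three-entry "root zone" is a stand-in, not the real IANA list. The special-use names are the ones reserved by the RFCs mentioned above.

```python
import re

# Special-use TLDs reserved by the RFCs cited in this issue.
SPECIAL_USE_TLDS = {
    "test", "localhost", "invalid", "example",  # RFC 6761
    "local",                                    # RFC 6762
    "onion",                                    # RFC 7686
}

# Hypothetical stand-in for the root-zone TLD list (NOT the real IANA data).
ROOT_ZONE_SAMPLE = {"com", "org", "net"}

# One or more dot-terminated labels followed by a final label (the TLD candidate).
_HOST_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+([a-z0-9-]+)\b",
    re.IGNORECASE,
)


def find_hosts(text, tlds):
    """Return host names found in `text` whose final label is in `tlds`."""
    allowed = {t.lower() for t in tlds}
    return [
        m.group(0)
        for m in _HOST_RE.finditer(text)
        if m.group(1).lower() in allowed
    ]


text = "See example.com, printer.local and abc123.onion for details."

# Root zone only: special-use names are skipped.
root_only = find_hosts(text, ROOT_ZONE_SAMPLE)

# Supplementary mode: caller extends the root-zone set.
supplemented = find_hosts(text, ROOT_ZONE_SAMPLE | SPECIAL_USE_TLDS)

# Replacement mode: caller supplies its own set entirely.
replaced = find_hosts(text, {"onion"})
```

The point is only that the set of accepted TLDs becomes a caller-supplied input, either merged with the root-zone list or used instead of it. How that would map onto URLExtract's internals is left open.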