guilryder / chrome-extensions

Chrome extensions
MIT License

Extract subdomains #8

Open max-len opened 2 years ago

max-len commented 2 years ago

Would it be possible to show subdomains, i.e. extract them from the hostname or just extract arbitrary regex from the variables?

The tab title format could look like {subdomain[x]} which could then be parsed and used with document.location.hostname.match(/[^\.]+/g).reverse()[x], where {subdomain[0]} would represent the TLD.
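A minimal sketch of how such a {subdomain[x]} tag could resolve. The helper name subdomainLabel is hypothetical (it is not part of the extension), and it takes the hostname as an argument instead of reading document.location so it can run outside a page:

```javascript
// Hypothetical helper: returns the x-th hostname label counted from the right,
// so index 0 is the TLD. Returns "" when the index is out of range.
function subdomainLabel(hostname, x) {
  const labels = hostname.match(/[^.]+/g) || [];
  return labels.reverse()[x] ?? "";
}

console.log(subdomainLabel("www.example.com", 0)); // "com"
console.log(subdomainLabel("www.example.com", 2)); // "www"
```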

Alternatively a tab title format could look like {hostname:<regex>} which would return the matches / matching groups.

Using multiple tags is already possible.

guilryder commented 2 years ago

Several users have requested a feature like this in the past, e.g. a tag to remove the domain suffix and/or sub-domains.

Extracting subdomains is difficult in the general case because of second-level domains like .co.uk. {subdomain[0]} would be the TLD (but not necessarily the full suffix that people often have in mind when they say "TLD"), and {subdomain[1]} would not necessarily be the "core domain name" (it could be just "co" and {subdomain[2]} would be the core domain name).
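To illustrate the mismatch, here is a small sketch that lists hostname labels from right to left. The helper name reversedLabels and the hostnames are illustrative only:

```javascript
// Illustrative helper: hostname labels ordered from the TLD leftwards.
function reversedLabels(hostname) {
  return hostname.split(".").reverse();
}

// Index 1 is the core domain name for a .com host...
console.log(reversedLabels("www.example.com").join(", "));   // "com, example, www"
// ...but only part of the suffix for a .co.uk host.
console.log(reversedLabels("www.example.co.uk").join(", ")); // "uk, co, example, www"
```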

In theory it's possible to use the public suffix list to extract the full suffix, then determine the main domain and any sub-domain(s). Unfortunately, Chrome does not expose the list through a standard API. The list is long (> 11,000 entries) and changes often, so the extension would have to re-download it regularly instead of simply packaging a static file. That's a significant amount of complexity that I'm still reluctant to add. The somewhat standard JS implementation (an npm package) lags behind the official list by 2 years and has open bugs that worry me a bit.
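A rough sketch of the suffix-list approach, assuming a tiny hard-coded set of suffixes in place of the real list. The names KNOWN_SUFFIXES and splitHostname are hypothetical, and the real list also contains wildcard and exception rules that this sketch ignores:

```javascript
// Illustrative subset only; the real public suffix list is far longer.
const KNOWN_SUFFIXES = new Set(["com", "uk", "co.uk"]);

// Splits a hostname into { subdomains, domain, suffix } using the longest
// matching suffix from KNOWN_SUFFIXES. Returns null if no suffix matches
// or if the hostname is itself a suffix.
function splitHostname(hostname) {
  const labels = hostname.toLowerCase().split(".");
  // Try candidates from longest to shortest so "co.uk" wins over "uk".
  for (let i = 0; i < labels.length; i++) {
    const suffix = labels.slice(i).join(".");
    if (KNOWN_SUFFIXES.has(suffix)) {
      if (i === 0) return null; // nothing in front of the suffix
      return {
        subdomains: labels.slice(0, i - 1),
        domain: labels[i - 1],
        suffix,
      };
    }
  }
  return null;
}

// subdomains: ["www", "shop"], domain: "company", suffix: "co.uk"
console.log(splitHostname("www.shop.company.co.uk"));
```

Trying the longest candidate first is what keeps "co" from being reported as the main domain.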

Extracting regex groups from variables is less complex, and a long regex that hard-codes popular suffixes can mitigate the general problem without overpromising. That said, it's not trivial to implement: tag parsing will no longer be a simple, static search-replace but will require some error-prone parsing logic to handle all the edge cases (avoid matching tags inside the regex, allow a second level of escaping to support regexes that contain }, deal with regex errors). I will get to it if / when there's more demand.
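To make those edge cases concrete, here is one possible sketch of a {name:<regex>} expander. It is not the extension's parser: the names TAG and expandTags are hypothetical, and the escaping convention (a literal } inside the regex is written as \}) is an assumption chosen for this example.

```javascript
// Hypothetical expander for formats such as "{hostname:^([^.]+)} - {title}".
// Inside a tag's regex, a literal "}" must be written as "\}"; quantifiers
// like a{2} are not supported by this sketch.
const TAG = /\{(\w+)(?::((?:\\.|[^}\\])*))?\}/g;

function expandTags(format, variables) {
  return format.replace(TAG, (tag, name, pattern) => {
    // Unknown tag: leave the text untouched.
    if (!Object.prototype.hasOwnProperty.call(variables, name)) return tag;
    const value = String(variables[name]);
    if (pattern === undefined) return value; // plain {name} tag
    let regex;
    try {
      regex = new RegExp(pattern);
    } catch (e) {
      return tag; // invalid regex: leave the text untouched
    }
    const match = regex.exec(value);
    if (!match) return "";
    // Prefer capture groups when present, otherwise use the whole match.
    const groups = match.slice(1).filter((g) => g !== undefined);
    return groups.length > 0 ? groups.join("") : match[0];
  });
}

const vars = { hostname: "www.example.com", title: "Home" };
console.log(expandTags("{hostname:^([^.]+)} - {title}", vars)); // "www - Home"
```

Because the whole tag, regex included, is consumed in a single pass, text inside the regex is never re-scanned for tags.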

Crissov commented 1 year ago

Those special second-level domains come in a very limited set. They either use the classic IANA ones with three letters each or, like .uk, an extended set with two or three letters each:

For an example domain like www.shop.company.co.uk, you could offer:

guilryder commented 1 year ago

> Those special second-level domains come in a very limited set.

https://publicsuffix.org/list/public_suffix_list.dat contains 140 entries shaped like com.suffix, so the set doesn't seem that small. I would rather not have to personally judge which entries are well-known enough to be hard-coded in a regexp.
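For anyone who wants to check that count, a sketch along these lines could be run against the downloaded file. The function name countComEntries is hypothetical, and the sample text below is a made-up excerpt, not the real list:

```javascript
// Counts entries of the form "com.<label>" in public-suffix-list text.
// Lines starting with "//" are comments in that file format.
function countComEntries(listText) {
  return listText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("//"))
    .filter((line) => /^com\.[^.]+$/.test(line)).length;
}

// Made-up excerpt for illustration only.
const sample = ["// comment", "ac", "com.ac", "com.af", "co.uk", ""].join("\n");
console.log(countComEntries(sample)); // 2
```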

I'm curious which needs require parsing domain names in a more granular and advanced way than the {hostname} tag allows. It's clear that {hostname} is simplistic and doesn't produce aesthetically optimal results in various situations. But it's straightforward and seems good enough to my naïve understanding. What functional, non-cosmetic benefits would justify adding complexity and maintenance cost to the extension to better parse hostnames, extract ISO language/country codes, etc.?

(Background: I personally use the extension for compatibility with KeePass, where matching the full domain name is important for security. KeePass also supports * to ignore domain prefixes/suffixes.)

max-len commented 1 year ago

Our use case:

We operate a cloud infrastructure with ~20 regions, and many instances of similar tools run in every one of them. By default they just display the tool's name in the title (Prometheus, Grafana, Kibana etc.), which makes browser tabs of different instances indistinguishable.

Having this extension display the hostname is helpful, but its usefulness is limited by the URL length and the number of open tabs. Currently the best approach is to hover over each tab to see its hostname. The ability to extract certain parts of the URL (not the path, though) and display them as the tab title would be a huge boost in productivity and comfort.

This use case doesn't require any TLD semantics, as these URLs are internal with consistent subdomain format.

Example: For a URL <tool_name_can_be_long>.<shard>.constant.<region>.cloud.corp/..., the tab title should display <region>:<shard> (The tool itself can be identified by the favicon).

Ideally it would be possible to extract substrings of the tool name too.
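As a sketch of the extraction described in the example above, assuming the hostname layout is exactly <tool>.<shard>.constant.<region>.cloud.corp. The names INTERNAL_HOST and regionShardTitle are hypothetical, and the sample hostname is made up:

```javascript
// Assumes the internal layout <tool>.<shard>.constant.<region>.cloud.corp.
// Returns "<region>:<shard>", or null if the hostname does not follow it.
const INTERNAL_HOST =
  /^(?<tool>[^.]+)\.(?<shard>[^.]+)\.constant\.(?<region>[^.]+)\.cloud\.corp$/;

function regionShardTitle(hostname) {
  const match = INTERNAL_HOST.exec(hostname);
  if (!match) return null;
  const { region, shard } = match.groups;
  return `${region}:${shard}`;
}

// Made-up hostname for illustration.
console.log(regionShardTitle("grafana.s3.constant.eu-west.cloud.corp")); // "eu-west:s3"
```

The tool group is captured but unused here; it is where a substring of the tool name could be taken from.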

AITEK-DEV commented 1 year ago

I think that's a great idea! It would definitely be a productivity boost to be able to easily distinguish between different instances of the same tool in different regions. The example you provided is clear and concise, and I think it would be a great way to implement this feature.

I think the most important thing would be to make sure that the extracted parts of the URL are meaningful and easy to understand. For example, the shard number might not be as important as the region, so you might want to prioritize that in the tab title.

Another thing to consider is the length of the tab title. If it's too long, it might start to get truncated or cut off, which would defeat the purpose of the feature. So you might want to put a limit on the number of characters that can be included in the title.
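A character limit of that kind could be sketched as follows. The helper name limitTitle and the choice of "…" as the truncation marker are assumptions made for this example:

```javascript
// Hypothetical helper: caps a title at maxLength characters, ending with "…"
// when something was cut off.
function limitTitle(title, maxLength) {
  if (title.length <= maxLength) return title;
  if (maxLength <= 0) return "";
  return title.slice(0, maxLength - 1) + "…";
}

console.log(limitTitle("eu-west:shard-17", 10)); // "eu-west:s…"
```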

Overall, I think this is a great idea, and I would love to see it implemented in a browser extension. It would be a huge help for anyone who is managing a cloud infrastructure with multiple regions.

Example of how the browser extension could be implemented:

```javascript
// The extension is configured to extract the tool name, region, and shard from the URL.
// The extension also allows users to customize the format of the tab title.
// Note: BrowserExtension is a placeholder, not a real Chrome API.
const extension = new BrowserExtension();

extension.onTabTitleChanged = (tab) => {
  // Assumes a hostname shaped like <tool>.<shard>.constant.<region>.cloud.corp.
  const { hostname } = new URL(tab.url);
  const [toolName, shard, , region] = hostname.split(".");

  const tabTitle = `${toolName}.${region}.${shard}`;
  tab.title = tabTitle;
};

extension.start();
```

This code would extract the tool name, region, and shard from the URL and set the tab title to the corresponding value. The user could also customize the format of the tab title by changing the tabTitle variable.

For example, if the user wanted to include the hostname in the tab title, they could change the code to:

```javascript
const tabTitle = `${toolName}.${region}.${shard} (${hostname})`;
```

This would include the hostname in the tab title, which would make it even easier to distinguish between different instances of the same tool in different regions.