joachimesque / humanstxt-webextension

Chrome extension to show humans.txt for current website
https://addons.mozilla.org/fr/firefox/addon/humans-txt-webextension/
MIT License

Major refactor #4

Open mcorossigo opened 4 years ago

mcorossigo commented 4 years ago

Hello @joachimesque Here is the promised PR. Unfortunately I got a little deeper into the rabbit hole than planned, and ended up with quite a few changes. I hope to convince you of their effectiveness.

The PR features a variety of technical improvements, like explicit HTTP cache management (for reference: https://developer.mozilla.org/en-US/docs/Web/API/Request/cache), slightly improved ES6 syntax, handling of content that is not strictly plain text, and lazy loading of the humans.txt content.

The whole dictionary-based cache has been shifted from per-tab to per-domain management. This is because, quoting the official website:

Where is it located? In the site root. Just next to the robots.txt file.

Now the extension works like this: for every new domain visited, an ultra-light (and cached) HEAD request (for reference: https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/HEAD) is sent to check for the existence of humans.txt. If found, the icon is shown.

If and only if the icon is clicked, a GET request is made to retrieve the file to be shown.
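To make the flow concrete, here is a minimal sketch of that logic (the helper names are illustrative, not necessarily the exact ones in the diff):

```js
// Illustrative sketch of the described flow; names are hypothetical.
const humansByDomain = {}; // per-domain cache of positive checks

// Cheap existence check: one cached HEAD request per newly visited domain.
async function hasHumansTxt(domain) {
  if (humansByDomain[domain]) return true; // negative results are not cached
  const response = await fetch(`https://${domain}/humans.txt`, {
    method: "HEAD",
    cache: "force-cache", // reuse a previously cached response if one exists
  });
  if (response.ok) humansByDomain[domain] = true;
  return response.ok;
}

// Lazy loading: the file body is only fetched when the icon is clicked.
async function fetchHumansTxt(domain) {
  const response = await fetch(`https://${domain}/humans.txt`);
  return response.ok ? await response.text() : null;
}
```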

[placeholder for the continuation..]

There are still some quirks and problems, but this is the road I suggest. Until I fix all the remaining issues, please take a look and share your opinion.

All the code has been written to keep compatibility with both Firefox and Chrome.

mcorossigo commented 4 years ago

Hello @joachimesque Sorry for the hiatus, studies kept me busy.

There are still some quirks and problems,

This has been finally fixed, and the main behavior is now consistent and effective.

There are a couple of considerations you should keep in mind (and maybe want to address):

  1. I added quite a bit of logging, which can pollute the extension developer console. Nice for debugging, but you may want to remove all of it.

  2. I don't cache negative responses in the humansByDomain object, so a request always starts on domains without humans.txt. The network console shows that such requests are forcefully cached by the browser, so minimal overhead is added. Still, you may want to change this.

  3. humansByDomain has no limit on number or size, so if the user navigates to an insane number of different domains and all of them have a humans.txt, it can become quite a memory hog.

  4. Compared to the original extension, I removed the caching of the rendered file, since it is now only computed on user click.

At this point the PR is mergeable, as I tested it on both Chrome and Firefox; still, I would suggest you do the same, for good measure ;)

mcorossigo commented 4 years ago

Gentle ping for @joachimesque Have you been able to check out the new changes?

joachimesque commented 4 years ago

Hi @mcorossigo, thanks a lot! I'm sorry I haven't had time to review your code earlier.

  1. I added quite a bit of logging, which can pollute the extension developer console. Nice for debugging, but you may want to remove all of it.

We could remove them altogether, or we could add an option to display them in the logs (via a sort of "dev mode" in options.html and devMode && console.warn("Error loading " + url + ": invalid content type")), what do you think?
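For example, roughly like this (just a sketch; the storage key and helper name are made up):

```js
// Hypothetical "dev mode" toggle persisted from options.html.
let devMode = false;

// browser.storage returns a Promise in Firefox; Chrome would need
// chrome.storage callbacks or the webextension-polyfill.
browser.storage.local.get({ devMode: false }).then((items) => {
  devMode = items.devMode;
});

function debugLog(message) {
  devMode && console.warn(message);
}

// e.g. debugLog("Error loading " + url + ": invalid content type");
```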

  2. I don't cache negative responses in the humansByDomain object, so a request always starts on domains without humans.txt. The network console shows that such requests are forcefully cached by the browser, so minimal overhead is added. Still, you may want to change this.

Caching negative responses wouldn't let us notice when a domain adds a humans.txt file later; I mean, there should be a time limit for that cache. Perhaps a variable that only lasts while the browser runs? What do you think?
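As a rough sketch of that idea (illustrative only, not part of the PR):

```js
// Negative cache that only lives for the browser session, with an expiry
// so a site that later adds humans.txt gets re-checked eventually.
const negativeCache = new Map(); // domain -> timestamp of the failed check
const NEGATIVE_TTL = 60 * 60 * 1000; // one hour, arbitrary

function isKnownMissing(domain) {
  const failedAt = negativeCache.get(domain);
  if (failedAt === undefined) return false;
  if (Date.now() - failedAt > NEGATIVE_TTL) {
    negativeCache.delete(domain); // stale entry, allow a new check
    return false;
  }
  return true;
}

function rememberMissing(domain) {
  negativeCache.set(domain, Date.now());
}
```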

  3. humansByDomain has no limit on number or size, so if the user navigates to an insane number of different domains and all of them have a humans.txt, it can become quite a memory hog.

This is the same (but opposite) problem as 2.: the positive domain cache could have a limit. Would it be better to limit the number of domains, or to set an expiry date? I have to admit I don't know which solution is the most practical.


We can merge and publish the extension as-is, but I'd like your opinion on the caching of domains, positive or negative.

joachimesque commented 4 years ago

I've started working on a dev_mode option: https://github.com/mcorossigo/humanstxt-webextension/pull/1

Tell me what you think!

joachimesque commented 4 years ago

On Chrome there's also an error for chrome:// URIs. I think a check for http at the start of the request protocol would help.
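Something along these lines could work (just a sketch, assuming we have the tab URL at hand):

```js
// Only act on http(s) pages; privileged URIs such as chrome:// or about:
// cannot be fetched from anyway.
function isCheckableUrl(url) {
  try {
    const { protocol } = new URL(url);
    return protocol === "http:" || protocol === "https:";
  } catch (e) {
    return false; // not a parsable URL at all
  }
}
```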

mcorossigo commented 4 years ago

We could remove them altogether, or we could add an option to display them in the logs (via a sort of "dev mode" in options.html and devMode && console.warn("Error loading " + url + ": invalid content type")), what do you think?

I've started working on a dev_mode option: mcorossigo#1

In the KISS spirit, I was of the opinion that adding an option for this seemed a little overkill, but better safe than sorry. Also, after a quick review of your PR I say LGTM 👍

Caching negative responses wouldn't allow domains to add a humans.txt file

Right now the network cache is handled by the browser policy, which can be cleared both manually by the user and by the browser when hitting space limits. I could change force-cache to the default behaviour, which follows this algorithm:

If there is a match and it is fresh, it will be returned from the cache. If there is a match but it is stale, the browser will make a conditional request to the remote server. If the server indicates that the resource has not changed, it will be returned from the cache. Otherwise the resource will be downloaded from the server and the cache will be updated. If there is no match, the browser will make a normal request, and will update the cache with the downloaded resource.

Considering that the cache (unlike cookies) is not meant to be permanent, the current solution already seems like a good compromise.
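For reference, the difference would only be the cache option passed to fetch (just a sketch, with url as a placeholder):

```js
// Current approach: reuse any cached response, fresh or stale.
fetch(url, { method: "HEAD", cache: "force-cache" });

// Alternative: the "default" policy quoted above, which revalidates
// stale entries with a conditional request before reusing them.
fetch(url, { method: "HEAD", cache: "default" });
```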

This is the same (but opposite) problem as 2.: the positive domain cache could have a limit. Would it be better to limit the number of domains, or to set an expiry date? I have to admit I don't know which solution is the most practical.

The cache here is not the native browser cache but a simple JS object whose objective is to avoid all the requests and disk reads/writes that the network cache would cause. It is also unavoidably cleared by closing and reopening the browser.

A limit on the number of entries can easily be implemented, but since this is such a volatile cache I think the gain would be minimal (a browser can easily handle a JavaScript object with thousands of simple string elements).
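That said, if a cap ever turns out to be useful, something as simple as this would do (illustrative only, not part of the PR; it uses a Map variant of the per-domain cache because a Map preserves insertion order):

```js
const MAX_DOMAINS = 1000; // arbitrary cap
const humansByDomainCapped = new Map();

function rememberDomain(domain, hasHumans) {
  if (humansByDomainCapped.size >= MAX_DOMAINS) {
    // Evict the oldest entry (first inserted) before adding a new one.
    const oldest = humansByDomainCapped.keys().next().value;
    humansByDomainCapped.delete(oldest);
  }
  humansByDomainCapped.set(domain, hasHumans);
}
```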

On Chrome there's also an error for chrome:// URIs. I think a check for http at the start of the request protocol would help.

In Firefox it also errors for about: URIs and all other blacklisted websites (both Chrome and Firefox have their own list). Even this is easily fixable.