intelligently avoid modifying rich pages served on markdown urls

Quelklef commented 3 years ago

As per our discussion in https://github.com/simov/markdown-viewer/issues/119.

Implementaiton

I ended up going with the implementation I had suggested in the thread. Namely, wait for the dom to load (via document_end) and then check if the page has more than 5 Element nodes on it. While there are cleaner possible ways to check if the page is rich, I chose this one because it not too precise, meaning that it should better work across browsers and browser versions.

Performance

This implementation does incur a delay, because we have to wait for the dom to load in order to check how many element it has in order to decide whether or not to do the actual markdown rendering. However, this delay should be entirely non-problematic:

On pages that are serving raw markdown, the DOM will load near-instantly, so the delay will be near-instant.
On rich pages, the DOM may take a while to load, but then we will decide not to render markdown, anyway, so this delay is fine.

If, for some reason, these performance characteristics are unacceptable, they could be improved by using a MutationObserver to watch the DOM as it is being built instead of waiting for the DOM to finish before checking for the element count. However, browser compatibility may suffer.

Browser compatibility

I tried to write portable code. No let or const, etc. I checked the most possibly-problematic features that I am using on caniuse; they all came out at 96.34%.

Testing

I've tested this on Chrome 88.0.4324.150 and Firefox 86.0, comparing this rich .md url to this raw .md url
- On Chrome, everything worked as expected
- On Firefox, everything worked as expected, except that the raw markdown was not rendered, both with and without my changes
I have not tested this on any other browser, since I don't have any others installed
I have not verified that my changes do not break any existing tests, since a large portion of the test suite fails on my machine even without my changes
I have not added any test cases

Let me know if it needs any changes!

simov commented 3 years ago

Thanks, I like your idea. I'll have a look.

Quelklef commented 3 years ago

Any thoughts on this?

simov commented 3 years ago

Generally I like the idea and your implementation seems to be in the correct place. However, as I mentioned earlier I want to test this for potential regressions in loading/rendering time.

I'm testing those types of things manually by loading different markdown files in size and set of features. The goal is to minimize the time the raw markdown flickers on screen before the html is rendered.

I haven't had the time to test this.

Quelklef commented 3 years ago

Gotcha. I can look into doing that testing myself, if you like. It doesn't sound too difficult (famous last words). My times will differ from yours, of course, but perhaps the before/after will still be meaningful.

If you have a particular set of URLs you typically use, send them over and I will see about doing the testing. (Or, if you have no such list, I can just look for a handful of them myself)

simov commented 3 years ago

The markdown-syntax repo is a good start, but then you also need large documents with lots of images in them, as the images affect certain aspects of the loading as well.

Quelklef commented 3 years ago

Alright! The benchmark results are in! Here's what I did.

I added some code to measure how long the extension waits to run. This was done by setting up a MutationObserver and considering the extension execution "complete" when the mutation observer had not received an event for 1000ms. (Note that this is not necessarily a measurement of the "flash" experienced when using this extension)

For a single markdown document, I measured the performance effects of my PR by comparing the execution time of the extension with my PR's code disabled to execution time with my PR's code enabled.

I did this 10 times per document, on 3 documents. The first document is the README.md for this repo. The second document is a large real-world README.md. The third document is an especially large markdown document which I created specifically for this benchmark.

Results are as follows:	Document	avg exec time (no PR)	avg exec time (yes PR)	difference
this repo	`0.086s`	`0.091s`	`0.005s`	`1.062`
large realworld	`0.142s`	`0.179s`	`0.037s`	`1.260`
large synthetic	`0.424s`	`0.528s`	`0.104s`	`1.245`

Thoughts: on an absolute scale, the delays caused by my code changes are tiny, all less than 1/8th of a second. On a relative scale, they are quite large; over 20% in the second two cases. It's up to you to decide whether these performance characteristics are acceptable. As I mentioned before, I have an idea for a solution using MutationObserver which would probably be more efficient, but possibly have (slightly) worse browser compatability.

Looking forward to your thoughts!

simov commented 3 years ago

That's interesting, however I already have a content-type check here so probably it would be enough to just check for text/html instead of counting dom nodes.

Quelklef commented 3 years ago

... why did you not mention this when I originally proposed this node-counting idea?

Regardless, I guess rejecting text/html would work (on websites that play nice). Do you want me to look into that?

simov commented 1 year ago

Content detection in Markdown Viewer v5.1 was greatly improved. The main issue was caused by incorrectly applying the Header Detection with the Path Matching RegExp using logical OR instead of AND.

That basically resulted in the extension ignoring the content-type for the most part. In v5.1 each allowed origin has its own Header Detection and Path Matching RegExp setup. They are both enabled by default and work together if enabled, meaning using logical AND. Also the Header Detection now includes the text/plain content type.

This should eliminate 99% of the cases of incorrect content detection for rendering.

Thanks again for your work here and have a look at v5.1 and maybe try to break the content detection too :)

simov / markdown-viewer