openzim / sotoki

StackExchange websites to ZIM scraper
https://library.kiwix.org/?category=stack_exchange
GNU General Public License v3.0

Conditionally include highlight JS #227

Open rgaudin opened 3 years ago

rgaudin commented 3 years ago

Currently, we include highlight JS (and the stack.js script required to configure it) for all domains. On SE, however, it is only included on some domains.

We should make a request to the homepage of the online domain to find out if it should be enabled or not.
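A minimal sketch of that request, using only the Python standard library. It assumes the homepage exposes a CSS class as the marker for highlighting (a later comment mentions a CSS class but not its name), so `MARKER_CLASS` is a hypothetical placeholder, not the class sotoki actually checks:

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Hypothetical marker: the real CSS class is not named in this thread.
MARKER_CLASS = "highlight-enabled"


class _ClassFinder(HTMLParser):
    """Records whether any element on the page carries the given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # class="a b c" is a space-separated list; match whole tokens only.
            if name == "class" and value and self.css_class in value.split():
                self.found = True


def page_uses_highlight(html: str, css_class: str = MARKER_CLASS) -> bool:
    """Return True if the HTML contains an element with the marker class."""
    finder = _ClassFinder(css_class)
    finder.feed(html)
    finder.close()
    return finder.found


def domain_uses_highlight(domain: str, css_class: str = MARKER_CLASS) -> bool:
    """Fetch the online homepage of `domain` and look for the marker class."""
    with urlopen(f"https://{domain}/", timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return page_uses_highlight(html, css_class)
```

Keeping the parsing (`page_uses_highlight`) separate from the network call makes the detection logic testable against saved HTML without hitting the live site.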

yekanchi commented 1 year ago

Seems like this is not working anymore: no code is highlighted.

kelson42 commented 1 year ago

@yekanchi Do you have a precise example in mind from a recent ZIM file?

yekanchi commented 1 year ago

I tested locally, and it's the same on library.kiwix.org: the code fragments in questions and answers are not highlighted.

kelson42 commented 1 year ago

@yekanchi Which URL, for example?

yekanchi commented 1 year ago

https://library.kiwix.org/content/stackoverflow.com_en_all_2022-11/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array

The code is not highlighted.

kelson42 commented 1 year ago

Contrary to the upstream article, which looks good: https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array

rgaudin commented 1 year ago

Indeed, the method used to detect whether a StackExchange domain uses highlighting no longer works. I'll look for an alternative, but it may be easier and more future-proof to simply enable it unconditionally.
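One way to keep both options open is a mode switch where "always" skips detection entirely. The sketch below is a hypothetical design, not an existing sotoki option: `HighlightMode` and `should_include_highlight` are illustrative names, and `detect` stands for whatever detection routine is in use.

```python
from enum import Enum
from typing import Callable


class HighlightMode(Enum):
    """Hypothetical setting controlling whether highlight JS is bundled."""

    ALWAYS = "always"  # include unconditionally (the future-proof option)
    AUTO = "auto"      # rely on online detection
    NEVER = "never"    # never include


def should_include_highlight(mode: HighlightMode, detect: Callable[[], bool]) -> bool:
    """Decide whether to bundle highlight JS for one domain."""
    if mode is HighlightMode.ALWAYS:
        return True
    if mode is HighlightMode.NEVER:
        return False
    try:
        return detect()
    except Exception:
        # If detection raises (e.g. network error), prefer shipping the
        # highlighter over silently dropping it. Note that a changed marker
        # makes detect() return False rather than raise, so it is NOT caught here.
        return True
```

The fallback only covers detection that fails loudly; the silent failure reported in this issue (the marker disappearing) still returns `False` in `AUTO` mode, which is the argument for `ALWAYS`.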

The other conditional feature (MathJax) still works, though.

kelson42 commented 1 year ago

@rgaudin Or create an automated test to ensure it still works.

rgaudin commented 1 year ago

I don't think that's well suited to this case.

We look in the online source for a clue indicating whether highlight was loaded (a CSS class). Testing this would mean recording which domains are expected to have it, which defeats the point of dynamic detection. Alternatively, we'd test only some domains rather than all, which would be hazardous given the large number of SE domains.
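To make the trade-off concrete, here is a hypothetical sketch of such a test: it needs a hand-maintained table of expected results per domain, and the live detector is then compared against it. The table entries are illustrative only (`example.stackexchange.com` is a placeholder, not a real expectation), and `detect` is any function mapping a domain to a boolean.

```python
from typing import Callable, Mapping

# Hypothetical recorded expectations. Keeping a table like this accurate for
# every SE domain is exactly the maintenance burden that dynamic detection avoids.
EXPECTED_HIGHLIGHT: dict[str, bool] = {
    "stackoverflow.com": True,
    "example.stackexchange.com": False,  # placeholder domain
}


def mismatched_domains(
    expected: Mapping[str, bool], detect: Callable[[str], bool]
) -> list[str]:
    """Return, sorted, the domains whose live detection disagrees with the table."""
    return sorted(
        domain for domain, want in expected.items() if detect(domain) != want
    )
```

In a real test suite, one would plug in the actual network-backed detector and assert that the result is an empty list; any domain missing from the table is simply never checked.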

Also, what happened here was a change on the online side, while our code base hadn't been updated in a long time. A CI action would only break on a code change; otherwise it would have to be a scheduled periodic check, which we don't really need for anything else.

I think strengthening the detection by finding a better source of information is more important, but the two aren't mutually exclusive, and a test would help.
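One possible way to strengthen detection is to look for several independent signals instead of a single CSS class, so that one markup change upstream does not silently disable highlighting. The sketch below assumes two hypothetical signals: a marker class and a `<script>` whose `src` mentions "highlight". Neither name is confirmed by this thread.

```python
from html.parser import HTMLParser

# Hypothetical signals; the thread does not say which markers SE pages carry.
MARKER_CLASSES = {"highlight-enabled"}
SCRIPT_HINTS = ("highlight",)


class _HighlightSignals(HTMLParser):
    """Collects the names of every highlight signal found on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.signals: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if MARKER_CLASSES.intersection(classes):
            self.signals.add("css-class")
        src = attr_map.get("src") or ""
        if tag == "script" and any(hint in src for hint in SCRIPT_HINTS):
            self.signals.add("script-src")


def highlight_signals(html: str) -> set[str]:
    """Return which signals fired, e.g. {"css-class", "script-src"}."""
    parser = _HighlightSignals()
    parser.feed(html)
    parser.close()
    return parser.signals


def uses_highlight(html: str) -> bool:
    """Treat the domain as highlighted if any signal fires."""
    return bool(highlight_signals(html))
```

Returning the set of signals rather than a bare boolean lets the scraper log which one fired, so a signal that quietly stops appearing across all domains can be noticed before every signal breaks.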

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.