Moving forward with custom CSS/JS in instances

As we know, PHZH applies custom CSS to its courses, and we need to find a way to deal with it. Several bugs in the PHZH ZIM are due to this. After going through the courses on a few instances, including PHZH, I think there are a few paths we can take from here. Before discussing them, it may help to go through the following points, as they make the options easier to understand -

The online instance builds its HTML at the level of the sequential block, whereas in the offline version we use a separate HTML page per vertical. A sequential block is a collection of vertical blocks plus the navbar (at the top) and the next and previous buttons (at the bottom). In the online version, navigating to any xblock directly takes us to the nearest sequential block with the respective vertical activated, whereas in the offline version we try to go to the nearest vertical block (there are some bugs right now, but they're solved locally).
There are two ways to view an xblock: using the xblock URL (student_view_url in the JSON) or the LMS URL (lms_web_url in the JSON). The learner sees the LMS version, while the scraper currently uses the xblock version. As far as I have observed, xblock URLs are mostly free of the extra CSS and JS. On the other hand, some instances like PHZH have put extra custom CSS and JS in the LMS version (which is the cause of most bugs pointed out by @Popolechien in other issues at the moment). As far as I have observed, this CSS and JS is present in the following places -
1. Extra CSS in the headers (in the LMS version and in the xblock version) - #71 and #79 are due to these missing in the ZIM
2. Extra JS at the end of the body (in the LMS version only) - #82 is due to this missing in the ZIM
3. Extra classes on the divs with the default class vert (in the LMS version only) - The CSS for these classes is defined in the CSS in the headers. These classes are not applied in the xblock version, as individual low-level xblocks are not wrapped in the vert div in the xblock view, and the vert div in the xblock view for higher-level xblocks like sequential or vertical does not have those extra classes. #74, #80 and #84 are due to the missing extra classes.
For low-level xblocks, the xblock view and the LMS view have the same HTML from the div with the class xblock inwards. However, the header CSS alone has no visible effect there; it seems that the extra classes on the vert div are what do the trick (at least for the problem xblock). Here's an example of extra classes for CSS on a vert div -
```
<div class="vert vert-2 frigg-problem-1 frigg-dark" data-id="block-v1:PHZH+WI-B+2019_E+type@problem+block@550d15ad0ec74d778d5e0e369397b897"></div>
```
For higher-level xblocks, the xblock version has the same div without the extra frigg* classes. For lower-level xblocks, this div is not present at all. Note that the xblock div is contained within this div.
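
To make the third point concrete, here is a minimal sketch of how the extra classes could be read from the LMS HTML. It is only an illustration, not the scraper's current code: `VertClassCollector` and `collect_vert_classes` are hypothetical names, and it assumes that `vert` and `vert-<n>` are the only default classes, so anything else on the div is treated as instance-specific.

```python
from html.parser import HTMLParser


class VertClassCollector(HTMLParser):
    """Collect the non-default classes found on <div class="vert ..."> wrappers."""

    def __init__(self):
        super().__init__()
        # Maps a block's data-id to its list of extra (instance-specific) classes.
        self.extra_classes = {}

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "vert" not in classes:
            return
        # Assumption: "vert" and "vert-<n>" are defaults, everything else is custom.
        extras = [
            name
            for name in classes
            if name != "vert"
            and not (name.startswith("vert-") and name[5:].isdigit())
        ]
        data_id = attributes.get("data-id")
        if data_id and extras:
            self.extra_classes[data_id] = extras


def collect_vert_classes(lms_html):
    """Return {data-id: [extra classes]} for every vert div in the LMS HTML."""
    collector = VertClassCollector()
    collector.feed(lms_html)
    return collector.extra_classes
```

Run on the example div above, this would map the problem block's data-id to `frigg-problem-1` and `frigg-dark`, which could then be re-applied to the wrapper generated in the ZIM.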

So, coming to the ways we can solve the problem, I think we can proceed in the following ways -

Option 1: Scraping from the LMS web URLs

The scraper currently scrapes the data from the individual xblock URLs, which at times do not contain the custom CSS and JS from the instance. So, the first method that comes to mind is to scrape the LMS URLs instead of the xblock URLs. The pros and cons of this method would be as follows -

Pros -

Faster scraping, as we would not scrape xblocks individually; this would hugely cut down the requests that we make to the Open edX instance.
We can get a near-perfect look if we follow this, at least for the page

Cons -

This requires changing the whole xblock extraction system plus the creation of the xblock extractor objects in the scraper, and that's a lot of changes
We would need to implement custom HTML parsing, as we would be replacing the divs with xblock classes with our custom HTML, which of course would be based on the original content
In spite of these efforts, we would still need to fall back to generic output for xblocks like the video xblock

Option 2: Keep the current system but make it add extra CSS and JS

In this method, we keep extracting the xblock HTML from the xblock URL itself, but detect certain parts of the extra CSS and JS and add them to the templates automatically. This would basically involve extra calls to the LMS web URL at the vertical xblock level, scraping the headers for CSS and the end of the body for JS. However, it seems that the divs with class vert in the LMS HTML also carry extra CSS classes that apply to their inner HTML, so we need to copy those as well. Another thing that's required here is a blacklist of CSS files not to download because we may already have them, for instance MathJax. The pros and cons of going with this approach would be as follows -

Pros -

Comparatively easier to implement. We basically need to add more Jinja stuff to the templates and more code at the vertical extractor level to get things done.
Would possibly solve the layout issues for specific instances (at the vertical level) and would also keep the scraper generic enough to handle other instances

Cons -

Increased requests to the server
Would adapt the layout from the source but the other things like the nav etc. would still be generic.
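
As a rough illustration of the detection step in Option 2, the sketch below collects stylesheet links from the head and external scripts from the body of an LMS page, skipping anything matched by a blacklist. The names `ExtraAssetCollector`, `collect_extra_assets` and `ASSET_BLACKLIST` are hypothetical, the blacklist is matched as a simple case-insensitive substring (an assumption), and inline `<style>`/`<script>` content is ignored. It also does not tell instance-specific assets apart from the platform defaults; a real implementation would have to filter those out as well.

```python
from html.parser import HTMLParser

# Hypothetical blacklist: substrings of asset URLs the ZIM already ships.
ASSET_BLACKLIST = ("mathjax",)


class ExtraAssetCollector(HTMLParser):
    """Collect stylesheet hrefs from <head> and script srcs from <body>."""

    def __init__(self, blacklist=ASSET_BLACKLIST):
        super().__init__()
        self.blacklist = tuple(item.lower() for item in blacklist)
        self.in_head = False
        self.in_body = False
        self.stylesheets = []  # hrefs of <link rel="stylesheet"> found in <head>
        self.scripts = []  # srcs of <script src="..."> found in <body>

    def _allowed(self, url):
        # Reject missing URLs and anything matching a blacklisted substring.
        return bool(url) and not any(item in url.lower() for item in self.blacklist)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_body = True
        elif tag == "link" and self.in_head:
            rel = (attributes.get("rel") or "").lower().split()
            href = attributes.get("href")
            if "stylesheet" in rel and self._allowed(href):
                self.stylesheets.append(href)
        elif tag == "script" and self.in_body:
            src = attributes.get("src")
            if self._allowed(src):
                self.scripts.append(src)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "body":
            self.in_body = False


def collect_extra_assets(lms_html, blacklist=ASSET_BLACKLIST):
    """Return (stylesheet hrefs, script srcs) found in the LMS page."""
    collector = ExtraAssetCollector(blacklist)
    collector.feed(lms_html)
    return collector.stylesheets, collector.scripts
```

The two lists could then be passed to the Jinja templates at the vertical level, so that each generated page links the same extra assets as its LMS counterpart.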

Option 3: Have templates for specific instances

The last option that I can think of is having multiple templates for different instances, say a template for PHZH, another for edX, etc. This would also require a new parameter to select which template set to use. However, for the classes on the div with the vert class, we would still need those calls to the LMS URL at the vertical level. This also comes with its own pros and cons -

Pros -

Fewer changes to the Python code than the other two methods
The most accurate aesthetic match would be possible, as we would have control over the CSS and JS

Cons -

More requests than the current version, but fewer than the second option
Having multiple templates means maintaining multiple templates, and that would also increase the repository size
Features like the sidenav would have to be made optional to achieve the best aesthetic match
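
For Option 3, the new parameter could look roughly like the sketch below. Everything here is an assumption for illustration: the `--template-set` flag, the `generic` default and the one-directory-per-instance layout do not exist in the scraper today.

```python
import argparse
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical flag: name of the template set to use (e.g. "phzh").
    parser.add_argument("--template-set", default="generic")
    return parser


def resolve_template_dir(templates_root, template_set, fallback="generic"):
    """Return the directory of the requested template set, or the fallback set."""
    candidate = Path(templates_root) / template_set
    if candidate.is_dir():
        return candidate
    # Unknown template set: keep the scraper usable with the generic templates.
    return Path(templates_root) / fallback
```

Falling back to a generic set keeps the scraper usable on instances that have no dedicated templates.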

I suggest we go with the second option, as it gives us the proper layout in the course content, even though it does not match the source exactly (in terms of other components like the nav and sidenav). We would also be able to continue with the current codebase with no drastic changes. To me it seems to be a balance between aesthetics and the usability of the scraper on different instances (though there might be an instance that implements things differently from what I have noticed so far). The third option is also doable in my opinion.

@rgaudin @kelson42 @dattaz @Popolechien what are your views?
