openzim / openedx

Open edX (to zim) scraper

Try to handle iframes recursively #128

Closed satyamtg closed 4 years ago

satyamtg commented 4 years ago

#106 was due to iframes, and scraping iframes can be very tricky because they can contain arbitrary types of content.

However, since the iframes seem to be part of the course, this PR lets the scraper try to fetch iframes recursively (though it cannot always get all of the content).
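The recursive idea can be sketched roughly as follows. This is not the code from this PR; it is a minimal, standard-library-only illustration. `IframeSrcParser`, `collect_iframes`, the `fetch` callable and the `max_depth` limit are all hypothetical names chosen for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collects the src attribute of every <iframe> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def collect_iframes(url, fetch, max_depth=3, seen=None):
    """Return the URLs of iframes reachable from `url`, depth-first.

    `fetch` is a hypothetical callable mapping a URL to its HTML text,
    or to None when the content cannot be retrieved. `max_depth` limits
    how many levels of nested iframes are followed, and `seen` guards
    against iframes that embed each other in a cycle.
    """
    seen = set() if seen is None else seen
    if max_depth < 0 or url in seen:
        return []
    seen.add(url)

    html = fetch(url)
    if html is None:
        # Best effort only: unreachable content is skipped, not fatal.
        return []

    parser = IframeSrcParser()
    parser.feed(html)

    found = []
    for src in parser.sources:
        child = urljoin(url, src)  # resolve relative iframe URLs
        if child in seen:
            continue
        found.append(child)
        found.extend(collect_iframes(child, fetch, max_depth - 1, seen))
    return found
```

In a real scraper, `fetch` would wrap an HTTP request and each discovered iframe would also have its own assets downloaded and its links rewritten; the sketch only covers discovery.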

For example, in one of the iframes, everything is inside a script tag, so it is difficult to parse it and download every asset.
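To illustrate why script-generated content is hard for a static scraper, here is a small sketch using only Python's standard `html.parser`. The parser class, the helper function and the two sample pages are made up for this example and do not come from the scraper itself.

```python
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag seen as real markup."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def static_image_sources(html):
    """Return image URLs that are visible without executing any JavaScript."""
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical page where the image is ordinary markup.
STATIC_PAGE = '<body><img src="a.png"></body>'

# Hypothetical page where the same image is only written by a script.
SCRIPTED_PAGE = (
    "<body><script>document.write('<img src=\"a.png\">');</script></body>"
)
```

Calling `static_image_sources` on `STATIC_PAGE` finds `a.png`, while on `SCRIPTED_PAGE` it finds nothing, because the parser treats the body of a script tag as opaque text rather than markup. Recovering such assets would need the script to be executed or analysed.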

This makes the following changes -

Also, unrelated -

Note that although this makes the scraper try to fetch iframes, it does not guarantee that they will always be scraped completely. That would require a generic scraper.