openzim / openedx

Open edX (to zim) scraper
GNU General Public License v3.0

Invalid external links #106

Closed: rgaudin closed this issue 4 years ago

rgaudin commented 4 years ago

zimcheck reports a few invalid external links with the latest phzh ZIM:

[ERROR] Invalid external links found:
  https://phzh.h5p.com/content/1290479281859517267/embed is an external dependence in article A/course/core-english-01/topic-5-my-feelings-and-myself/unit-1-dealing-with-different-emotions/step-1/index.html
  https://phzh.h5p.com/content/1290541649250884247/embed is an external dependence in article A/course/core-english-01/topic-6-i-live-in-a-healthy-way/unit-1-delicious-food-healthy-food/step-2/index.html
  https://phzh.h5p.com/content/1290573667101944937/embed is an external dependence in article A/course/core-english-01/topic-9-my-interests-and-where-i-want-to-go/unit-2-the-professional-fields/step-2/index.html
  https://phzh.h5p.com/content/1290517274635840267/embed is an external dependence in article A/course/core-english-01/topic-3-this-is-what-i-can-do/unit-2-the-stairway-to-success/step-1/index.html

satyamtg commented 4 years ago

These links come from iframes. At the moment we only handle iframes containing videos and PDFs, so handling these would mean handling iframes recursively in the HTML. However, they are part of the course, so I think we should handle iframes recursively: fetch the whole HTML of the page inside the iframe and call the HTML processor on it.
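
A minimal sketch of the recursive idea, not the scraper's actual code. It assumes a hypothetical `fetch` callable (URL in, HTML string out) supplied by the caller, and the names `IframeSrcCollector` and `collect_iframe_pages` are made up for illustration. It only gathers the HTML of nested iframes; running the real HTML processor on each page would happen where `fetch` is called. A depth limit and a `seen` mapping guard against iframes that embed each other.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def collect_iframe_pages(html, base_url, fetch, max_depth=3, seen=None):
    """Return {absolute_url: html} for iframes reachable from `html`.

    `fetch` is a hypothetical callable (url -> HTML string); it is an
    assumption of this sketch, not part of any existing API.
    """
    seen = {} if seen is None else seen
    if max_depth <= 0:
        return seen
    parser = IframeSrcCollector()
    parser.feed(html)
    for src in parser.sources:
        url = urljoin(base_url, src)  # resolve relative iframe URLs
        if url in seen:
            continue  # already fetched; avoids loops between iframes
        seen[url] = fetch(url)
        # Recurse into the iframe's own HTML, one level deeper.
        collect_iframe_pages(seen[url], url, fetch, max_depth - 1, seen)
    return seen
```

In this sketch the caller would pass the step page's HTML and its URL as `html` and `base_url`, then process every entry of the returned mapping the same way as the top-level page.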