openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0

Links without target not written properly #53

Closed satyamtg closed 3 years ago

satyamtg commented 3 years ago

Links with empty targets are not rewritten properly: only the namespace is removed, so they end up pointing to an invalid page. Here is an example of zimcheck output for a sotoki ZIM:

  A/question/question/535.html (../../A/question/question/535.html) was not found in article A/question/535.html
  A/question/question/631.html (../../A/question/question/631.html) was not found in article A/question/631.html
  A/question/question/739.html (../../A/question/question/739.html) was not found in article A/question/739.html
  A/question/question/810.html (../../A/question/question/810.html) was not found in article A/question/810.html
  A/question/question/973.html (../../A/question/question/973.html) was not found in article A/question/973.html

These links come from --nopic mode, where the src attribute of img tags is set to "".
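One possible guard, as a minimal sketch: the issue does not show the rewriting code, so the function name `rewrite_target` and its signature are hypothetical and are not scraperlib's actual API. The idea is to return empty targets untouched before any path arithmetic takes place, so an empty `src` can never be turned into a path to a missing article.

```python
import posixpath
from urllib.parse import urlsplit


def rewrite_target(target: str, article_path: str) -> str:
    """Hypothetical rewriter: make `target` relative to the article's folder.

    Assumption: `article_path` is the in-ZIM path of the article that
    contains the link, e.g. "A/question/535.html".
    """
    # Empty or whitespace-only targets (e.g. src="" in --nopic mode)
    # have nothing to rewrite, so return them unchanged.
    if not target.strip():
        return target

    parts = urlsplit(target)
    # Leave external URLs and pure fragments/queries (no path) unchanged.
    if parts.scheme or parts.netloc or not parts.path:
        return target

    # Resolve the target against the article's folder, then express the
    # result relative to that folder again. Query strings and fragments
    # are dropped in this simplified sketch.
    base_dir = posixpath.dirname(article_path)
    resolved = posixpath.normpath(posixpath.join(base_dir, parts.path))
    return posixpath.relpath(resolved, start=base_dir or ".")
```

With this check in place, `rewrite_target("", "A/question/535.html")` returns `""` instead of a path derived from the article's own location.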