rajatomar788 / pywebcopy

Locally saves webpages to your hard disk with images, css, js & links as is.
https://rajatomar788.github.io/pywebcopy/

Inconsistency between image filename saved to disk and link in source file #117

Open coljac opened 1 year ago

coljac commented 1 year ago

When spidering a website I have found that pywebcopy saves the images like this:

domain.com/dir/image_1.jpg.jpeg

But the source contains

<img src="./image_1.jpg">

In other words, it's appending a .jpeg extension where it oughtn't.
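As a stopgap while the cause is investigated, the saved filenames can be compared against the names the HTML expects. The following is a minimal sketch, not part of pywebcopy: the helper `strip_redundant_extension` and the `EQUIVALENT` table are hypothetical names introduced here, and the sketch assumes the only problem is a trailing extension that repeats the format of the one before it (as in `image_1.jpg.jpeg`).

```python
from pathlib import PurePosixPath

# Hypothetical table of extensions that denote the same format.
# Only the JPEG family is listed because that is the case reported above.
EQUIVALENT = {".jpg": ".jpeg", ".jpeg": ".jpeg", ".jpe": ".jpeg"}


def strip_redundant_extension(filename: str) -> str:
    """Drop a trailing extension that repeats the format of the previous one.

    "dir/image_1.jpg.jpeg" becomes "dir/image_1.jpg"; names with a single
    extension, or with two unrelated extensions, are returned unchanged.
    """
    path = PurePosixPath(filename)
    suffixes = path.suffixes
    if len(suffixes) < 2:
        return filename
    last, previous = suffixes[-1].lower(), suffixes[-2].lower()
    # Strip only when both extensions map to the same canonical format.
    if last in EQUIVALENT and EQUIVALENT.get(previous) == EQUIVALENT[last]:
        return str(path.with_suffix(""))
    return filename


print(strip_redundant_extension("domain.com/dir/image_1.jpg.jpeg"))
```

The returned name could then be used as the target when renaming the file on disk, so that it matches the `src` attribute in the saved page.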

rajatomar788 commented 1 year ago

Hey, this shouldn't be happening. If you want to figure out how the names are generated, take a look at urls.py, especially the url2path function.

This could also be caused by threading, so try running the job without threading. Hope this helps.
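A rough sketch of a non-threaded run follows. It assumes the installed pywebcopy version exposes `save_website` with a `threaded` keyword argument, as shown in the pywebcopy 7 README; older releases may configure this differently. The wrapper name `save_without_threads` and its arguments are hypothetical.

```python
def save_without_threads(url, folder, name):
    """Crawl `url` into `folder` with threading turned off.

    Assumption: pywebcopy's save_website accepts threaded=False
    (check the signature of the version you have installed).
    """
    # Imported lazily so that defining this helper does not require pywebcopy.
    from pywebcopy import save_website

    save_website(
        url=url,
        project_folder=folder,
        project_name=name,
        bypass_robots=False,
        debug=True,            # verbose logging helps when tracing filenames
        open_in_browser=False,
        threaded=False,        # run the job without worker threads
    )
```

If the `.jpg.jpeg` names still appear with threading disabled, the cause is more likely in the name generation than in a race between workers.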