rajatomar788 / pywebcopy

Locally saves webpages to your hard disk with images, css, js & links as is.
https://rajatomar788.github.io/pywebcopy/

URL changed when I set the url property of the WebPage's get method. #89

Closed zengyinggang closed 1 year ago

zengyinggang commented 2 years ago

Hi, my URL contains two dots. The program drops those dots and produces a wrong asset URL. For example, my URL is https://example.com/rims/pro/0aff25a9b7d8705d99d558e82a19f8f8/sec/HQv2cFXZHwS6kcNVuquD6etOFRPDO7kvo_XJ6lzbxFMYxHUNy3xND7XT9Hlpqvl04hcf77j9NqhV7bF5cF129THtfGkM4rvQBOUKqT027uIuN4A7M8rvNHupBhay1QNyenlkLVk3kipkNnS1urCAHg../sed/tipps/html/0d-1568269-master.html?docuNo=7cdeaaa7e7d69c174aca6a55b1221310

You can see that there are two dots just before "/sed". After I crawl the website, some of the asset URLs are changed, for example: https://example.com/rims/pro/0aff25a9b7d8705d99d558e82a19f8f8/sec/HQv2cFXZHwS6kcNVuquD6etOFRPDO7kvo_XJ6lzbxFMYxHUNy3xND7XT9Hlpqvl04hcf77j9NqhV7bF5cF129THtfGkM4rvQBOUKqT027uIuN4A7M8rvNHupBhay1QNyenlkLVk3kipkNnS1urCAHgsed/tipps/assets/scss/hst2-param.css

The CSS link's href in the page is "../assets/scss/hst2-param.css".
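For comparison, here is a small sketch of how the standard library resolves that relative href against a page URL of the same shape. It uses only `urllib.parse.urljoin`, not pywebcopy itself, and `TOKEN..` is a shortened stand-in (an assumption for readability) for the long path segment in the report that ends in two literal dots.

```python
from urllib.parse import urljoin

# "TOKEN.." is a shortened placeholder for the long segment in the report;
# like the original, it is an ordinary segment name that ends in two dots.
page_url = (
    "https://example.com/rims/pro/sec/TOKEN../sed/tipps/html/"
    "0d-1568269-master.html?docuNo=1"
)
href = "../assets/scss/hst2-param.css"

# urljoin treats ".." as "go up one level" only when it is a whole path
# segment, so the dots inside "TOKEN.." are left alone.
resolved = urljoin(page_url, href)
print(resolved)
```

Under these assumptions the resolved URL keeps the `TOKEN..` segment and only the `html/` directory is replaced by `assets/scss/`, which is the asset URL the report expects.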

rajatomar788 commented 2 years ago

It isn't ignoring the dots. Removing them is a basic security measure that prevents the program from gaining unauthorized access to the user's files. If this protection were removed, downloaded files could be saved to, read from, or deleted from unexpected directories. @zengyinggang
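As an illustration of the kind of protection described here, the following is a minimal sketch of one way to map a URL to a local file path without leaving a download folder. It is not pywebcopy's actual code: `safe_local_path` and its parameters are hypothetical names chosen for this example. The sketch drops only path segments that are exactly `.` or `..`, then double-checks that the result stays under the root.

```python
import os
from urllib.parse import urlsplit, unquote


def safe_local_path(root, url):
    """Hypothetical helper: map a URL to a file path located under root."""
    segments = []
    for seg in unquote(urlsplit(url).path).split("/"):
        # Skip empty segments and whole "." / ".." segments; a name that
        # merely ends in dots (e.g. "TOKEN..") is kept unchanged.
        if seg in ("", ".", ".."):
            continue
        segments.append(seg)

    root = os.path.abspath(root)
    candidate = os.path.abspath(os.path.join(root, *segments))

    # Defence in depth: refuse anything that still resolves outside root.
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError("path escapes the download root")
    return candidate


print(safe_local_path("downloads", "https://example.com/a/TOKEN../sed/x.css"))
print(safe_local_path("downloads", "https://example.com/../../etc/passwd"))
```

On a POSIX system, both calls return paths inside the `downloads` folder: the first keeps the `TOKEN..` directory name, and the second loses its leading `..` segments instead of climbing out of the root.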