Closed MajdMustapha closed 4 years ago
Hey,
pywebcopy tries to recreate the exact folder structure of the target site, so modifying even a small path fragment can scramble the saved files and is error-prone. It is therefore not recommended.
If you are desperate, you could subclass the URLTransformer class found in the urls.py module. But many errors later you would probably just drop the idea.
Thank you for the prompt response. I've seen it done here, and I have no clue how they did it. Any thoughts?
I was hoping pywebcopy could help, but I understand that this can be hard.
The library you linked is completely different from pywebcopy, so I would suggest refactoring your code to account for the folder structure.
I am closing this issue as this is not fully related to pywebcopy.
Will this be possible in pywebcopy7, or will you still not support it? I see that pywebcopy7 has introduced a new 'tree_type' config with one of the values being LINEAR - is this option related to this issue at all? I did try saving a webpage with this option in pywebcopy7, but it didn't produce any different results than HIERARCHY.
For the record, I'd also very much like to see this, as I only care about one-off downloads of a single webpage and nothing more. I'm willing to play around a bit and see if I can implement it myself, if you can point me in the right direction.
Hey @tybug, the tree_type variable is indeed an attempt in this direction. But as of now there are only two modes, 'LINEAR' and 'HIERARCHY', which make no difference for a single webpage but do show an effect when used in crawls.
I will try to modify the behavior of LINEAR to match this layout, or I can introduce a third option.
Hello,
Good job on this project. However, I was wondering if it's possible to save the HTML page as "page.html" and have all the assets under one folder named "page_files", where that folder can hold js, css, photos, etc. For a given URL, https://www.example.com/page.html, can I have this as output?

-- example.com
 | --- page_files (folder containing all assets: js, css ...; can be many folders as well)
 | --- page.html

Thank you in advance
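
To make the requested layout concrete, here is a minimal sketch of the path mapping using only the Python standard library. It does not use pywebcopy's API, and the function name flat_layout is hypothetical. It assumes the top folder is named after the URL's host (so www.example.com rather than example.com), that assets are flattened into a single "<page>_files" folder, and that name clashes are resolved with a short hash prefix. Links inside the saved HTML would still have to be rewritten to these paths, which is the error-prone part.

```python
import hashlib
import posixpath
from urllib.parse import unquote, urlsplit


def flat_layout(page_url, asset_urls):
    """Map a page URL and its asset URLs to '<host>/<page>' and '<host>/<stem>_files/<name>'.

    Hypothetical helper for illustration only; not part of pywebcopy.
    """
    parts = urlsplit(page_url)
    # Fall back to index.html when the URL path ends in a slash or is empty.
    page_name = posixpath.basename(parts.path) or "index.html"
    stem = posixpath.splitext(page_name)[0]
    root = parts.netloc
    assets_dir = posixpath.join(root, stem + "_files")

    mapping = {page_url: posixpath.join(root, page_name)}
    used = set()
    for url in asset_urls:
        name = posixpath.basename(unquote(urlsplit(url).path)) or "asset"
        if name in used:
            # Two assets share a basename: prefix a short hash of the full URL.
            digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
            name = digest + "_" + name
        used.add(name)
        mapping[url] = posixpath.join(assets_dir, name)
    return mapping


if __name__ == "__main__":
    layout = flat_layout(
        "https://www.example.com/page.html",
        [
            "https://www.example.com/static/css/main.css",
            "https://www.example.com/static/js/app.js",
        ],
    )
    for url, path in layout.items():
        print(url, "->", path)
```

Flattening discards the original directory hierarchy, so any relative reference inside a CSS or JS file (for example url(../img/bg.png)) would also need rewriting. That is one reason a layout like this is harder than it looks.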