jsvine / waybackpack

Download the entire Wayback Machine archive for a given URL.
MIT License

Saving url query string in filename #29

Open ghost opened 7 years ago

ghost commented 7 years ago

Currently, if you try to save a resource whose URL includes a query string, such as:

www.site.com/news?385

It is saved as just "news" instead of something like news@385 (as wget does).

I looked through the code and couldn't find the part that handles the URL query. When saving a large number of files in that format, it becomes much less user-friendly to have a thousand files all labeled "news".

Awesome program by the way.

jsvine commented 7 years ago

Thanks! And that's a great point. I'm leaning toward shifting waybackpack away from filesystem storage and toward database storage. (Relevant comment here.) With the latter, there would be no need to convert URLs into filepaths.

Re. the code, the relevant bit is this line, which uses Python's built-in os.path.split and urlparse functions to extract the filename: https://github.com/jsvine/waybackpack/blob/master/waybackpack/pack.py#L42
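
To illustrate why the query string is lost, here is a minimal sketch of that general approach. It is not the actual code from pack.py. The function names and the "@" separator are hypothetical, and it assumes Python 3, where urlparse lives in urllib.parse. Because urlparse puts the query in a separate field from the path, taking only the last path component drops it.

```python
import os
from urllib.parse import urlparse


def filename_without_query(url):
    # Hypothetical helper: take the last component of the URL's path.
    # The query string lives in a separate field (.query), so it is dropped.
    path = urlparse(url).path
    return os.path.split(path)[1]


def filename_with_query(url, sep="@"):
    # Hypothetical alternative: append the query string, if any, to the
    # filename using a separator (here "@", as suggested in the issue).
    # Note: this does not sanitize characters in the query that may be
    # invalid in filenames on some systems (e.g. "/").
    parsed = urlparse(url)
    name = os.path.split(parsed.path)[1]
    if parsed.query:
        name = name + sep + parsed.query
    return name


print(filename_without_query("http://www.site.com/news?385"))  # news
print(filename_with_query("http://www.site.com/news?385"))     # news@385
```

A real fix would probably also need to handle long or unusual query strings, which this sketch leaves out.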