imthaghost / goclone

Website Cloner - Utilizes powerful Go routines to clone websites to your computer within seconds.
https://goclone.io
MIT License
1.38k stars 294 forks source link

Keep assets (js, css, img) original directory #30

Closed yang-wei closed 10 months ago

yang-wei commented 2 years ago

Hey, awesome project.

Before this changes, every JavaScript goes to /js, every image goes to /img. Not only does this change the original directory (unexpected behaviour?), it makes the code complicated (We need to replace the link).

This commit keeps the structure, what do you think

Also added webp support.

Let me know if you are not interested in this at all.

imthaghost commented 2 years ago

Hey @yang-wei,

Thanks for the submission! This was one of my first Golang projects so things are probably overly complicated where they do not need to be.

Few questions for clarification. What do you mean by "keeps the structure"? In this PR it appears you are grabbing all of the web assets and dumping them into the root directory. I think the initial reasoning I had behind creating a directory for each web asset ( /js /imgs ect) is the fact web pages aren't flat and are tree-like(similar to your computer file structure). Say if we need to clone multiple pages within a single URL having the structure of the site flat-like (dumping all the files in the root directory) that can eventually get really messy and hard to navigate. Are there any other benefits besides having potential unexpected behavior for making the structure like so?

yang-wei commented 2 years ago

@imthaghost good work on your first Go project though. Indeed they were simpler and very easy to understand. That said if you don't like this PR please feel free to close it (=. I forked your project by myself

yeah my intention was to clone the website as how it was uploaded. This has a few benefit

imthaghost commented 2 years ago

@yang-wei After cloning your fork I totally see what you are trying to accomplish. Thanks for the submission!

This makes the project much cleaner since there is no need for putting files in new directories and doing rearrangement (which tries to preserve the original file structure) from the target site. However, I did encounter a couple of issues. The changes don't handle the case when the asset does not have a local path on the target site. For example:

<script src="https://code.jquery.com/jquery-3.4.1.min.js"></script>

There is not a relative path in this asset string so no directory gets created for this asset and the file gets stored into the root directory of the project. And the link inside of the HTML would remain as https://code.jquery.com/jquery-3.4.1.min.js.

├── css
├── images
├── index.html
├── jquery-3.4.1.min.js
├── js

But keeping a link as such would require us to request the resource from the internet. We want to be able to browse our site locally even if we don't have an internet connection, so we would still have to rearrange the relative links inside the HTML.

Do you have suggestions on how you would solve that particular case?

Also, I love the PR no need to close it. Def want to work on solving this special case and potentially merging your submission in!

yang-wei commented 2 years ago

Thank you!

Good one. This is what I didn't intend to do but rewriting will be the only way as far as I could think of (=

imthaghost commented 2 years ago

Thank you!

Good one. This is what I didn't intend to do but rewriting will be the only way as far as I could think of (=

Agreed. I was thinking about doing a pretty big refactor of the project since there isn't even the simplest of error handling.. I think I could continue the work off of this branch since I like the structure you proposed. :)