Closed by devnoname120 1 year ago
You could try combining `-a accesskey:secret -d capture_outlinks=1` with `-o https?://mywebsite\.com` (this requires archive.org S3 keys). The script's own options cannot limit traversal to a single level of outlinks, whereas the API's option only ever captures a single level of outlinks, so in theory combining the two would allow for what you're describing.
Note that since proper support for `-d capture_outlinks=1` hasn't been implemented in the script, you won't be able to get any data about the external outlinks, and you will probably end up capturing the internal outlinks more times than necessary.
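As a rough sketch of the combination suggested above, the snippet below assembles the command line and prints it for review. The script path `./spn.sh`, the start URL, and the assumption that the URL is passed as a trailing argument are placeholders that are not stated in this thread; `accesskey:secret` stands in for real archive.org S3 keys.

```shell
#!/bin/sh
# Assumed values: adjust them to your own setup before running anything.
SPN_SCRIPT="${SPN_SCRIPT:-./spn.sh}"        # hypothetical path to the script
S3_KEYS="${S3_KEYS:-accesskey:secret}"      # placeholder archive.org S3 keys
INTERNAL_PATTERN='https?://mywebsite\.com'  # regex for outlinks the script should follow
START_URL="${START_URL:-https://mywebsite.com/}"  # hypothetical start page

# Print the combined command so it can be checked before use.
print_spn_command() {
  printf '%s -a %s -d capture_outlinks=1 -o %s %s\n' \
    "$SPN_SCRIPT" "$S3_KEYS" "'$INTERNAL_PATTERN'" "$START_URL"
}

print_spn_command

# Only execute when explicitly requested with RUN=1.
if [ "${RUN:-0}" = "1" ]; then
  "$SPN_SCRIPT" -a "$S3_KEYS" -d "capture_outlinks=1" \
    -o "$INTERNAL_PATTERN" "$START_URL"
fi
```

By default this is a dry run; setting `RUN=1` in the environment executes the printed command.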
It would be convenient to have an option that saves external outlinks only one level deep, while placing no level restriction on internal outlinks.
The internal/external separation could be set with a new option that expects a RegExp (just like `-o` and `-x`).

I can't figure out an easy way to do that so far.
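To make the requested rule concrete, here is a minimal sketch of how a single regex could decide what happens to each outlink. The function name `classify_outlink`, the pattern value, and the two result labels are hypothetical and only illustrate the idea; this is not how the script currently behaves.

```shell
#!/bin/sh
# Hypothetical pattern describing internal links, in the same style as -o and -x.
INTERNAL_PATTERN='^https?://mywebsite\.com(/|$)'

# Print "follow" for internal links (capture them and keep traversing their
# outlinks) and "capture-only" for external links (capture them once and stop).
classify_outlink() {
  if printf '%s\n' "$1" | grep -Eq "$INTERNAL_PATTERN"; then
    echo "follow"
  else
    echo "capture-only"
  fi
}

classify_outlink "https://mywebsite.com/about"
classify_outlink "https://example.org/article"
```

Anchoring the pattern with `(/|$)` keeps a look-alike host such as `mywebsite.com.example.net` from being treated as internal.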
The only solution I have in mind (but didn't try) is:

1. Use `-x` to exclude all external outlinks.
2. Collect the excluded external outlinks in `external-outlinks.txt`.
3. Use `external-outlinks.txt` as a list of links to save.
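The middle step of this workaround could be sketched as follows, assuming the discovered outlinks are available as a plain text file with one URL per line. The file `outlinks.txt`, its sample contents, and the pattern are made up for illustration; how the outlinks would actually be extracted from the script's output is left open here.

```shell
#!/bin/sh
# Hypothetical pattern for internal links; everything else counts as external.
INTERNAL_PATTERN='^https?://mywebsite\.com(/|$)'

# Sample input standing in for the outlinks discovered during the first run.
workdir=$(mktemp -d)
cat > "$workdir/outlinks.txt" <<'EOF'
https://mywebsite.com/about
https://example.org/article
https://mywebsite.com/blog/post-1
https://example.org/article
https://another.example.net/
EOF

# Keep only the links that do NOT match the internal pattern, without duplicates.
grep -Ev "$INTERNAL_PATTERN" "$workdir/outlinks.txt" | sort -u \
  > "$workdir/external-outlinks.txt"

cat "$workdir/external-outlinks.txt"
```

The resulting `external-outlinks.txt` would then be passed to a second run as the list of links to save, with no outlink-following options so that traversal stops at the first external level.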