linkwarden / linkwarden

⚡️⚡️⚡️Self-hosted collaborative bookmark manager to collect, organize, and preserve webpages, articles, and more...
https://linkwarden.app
GNU Affero General Public License v3.0

Self-hosted archives #329

Open jmuchovej opened 9 months ago

jmuchovej commented 9 months ago

Is your feature request related to a problem? Please describe: Not really.

Describe the solution you'd like: Self-hosted archives, i.e., snapshots made locally and viewable through Linkwarden.

Describe alternatives you've considered: The only alternative is the current approach, which leaves much to be desired for self-hosted folks.

Additional context: Currently Linkwarden uses archive.org for snapshots. While archive.org is incredible, if Linkwarden becomes quite popular this could put undue load on the Internet Archive, since snapshots would be made there rather than on Linkwarden's own servers. However, I'm not necessarily proposing this for hosted Linkwarden, more so for self-hosted Linkwarden (e.g., deployed by TrueCharts or similar).

Additionally, this might permit the use of Linkwarden for aspects of the deep web – where credentials could be stored on a local instance of Linkwarden so that Linkwarden may correctly access said pages (which could be really cool, imo).

daniel31x13 commented 9 months ago

Hello @jmuchovej,

If I’m understanding correctly, Linkwarden already does capture a screenshot, PDF and a readable view from each webpage.

The archive.org snapshot is an opt-in option which can be enabled in the profile settings.

jmuchovej commented 9 months ago

Right, Linkwarden already provides these (and they are useful). However, I've encountered issues with archive.org not rendering styles/images correctly (sadly, I don't have any links available off-hand, though I can try to dig a few up if necessary).

While the screenshot, PDF, and reader views are certainly useful, they lack the interactivity that some sites of interest may require. Two straightforward examples: pages that draw a computational graph using the <canvas> element, and pages with a React/Vue/etc. component that lets you interact with the page to learn a particular concept. Every reader view I've ever used omits these (I can also try this on Linkwarden if helpful).

(Ultimately, I understand the motivation to use archive.org, especially at this stage of Linkwarden's development – it reduces engineering/maintenance burden on the team – but I think it would make for a solid addition. Additionally, it might allow Linkwarden to support future features like highlighting, summarizing, etc. where otherwise it would need to be driven by the extension injecting code into archive.org sites.)

daniel31x13 commented 9 months ago

Ohh, that's the SingleFile format: #192

Just added it as a todo... 👍

jmuchovej commented 9 months ago

Ohhh, I see. Didn't see that issue in my search. 😅

Feel free to close this, if helpful for keeping issues clear. 🙂

pe1uca commented 4 months ago

Probably the original request is a valid use case.
I have a personal https://github.com/ArchiveBox/ArchiveBox instance which I'd like to use instead of archive.org and the integrated Linkwarden archive.
(Even then, ArchiveBox already has an archive.org integration.)

Maybe making this part easily configurable would be enough: it would allow a custom archival site, or any post-processing someone might want to run on a saved URL.
https://github.com/linkwarden/linkwarden/blob/4640c1c966d37b7fc22e4ebfcb244d03da1d6d82/lib/api/sendToWayback.ts#L14
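
To make the idea concrete, here is a minimal sketch of what a configurable archive target could look like. It is not Linkwarden's actual code: the function name `buildArchiveRequestUrl`, the `{url}` / `{encodedUrl}` placeholders, and the `archive.local` endpoint are all hypothetical, and the default only assumes the Wayback Machine's public `https://web.archive.org/save/<url>` pattern. How the template would be configured (environment variable, profile setting, etc.) is left open.

```typescript
// Hypothetical default: the Wayback Machine's public "save" URL pattern.
const DEFAULT_ARCHIVE_TEMPLATE = "https://web.archive.org/save/{url}";

// Build the request URL for an archival service from a user-supplied template.
//   {url}        -> the saved link, inserted as-is
//   {encodedUrl} -> the saved link, percent-encoded (for query parameters)
// Both placeholder names are assumptions made for this sketch.
function buildArchiveRequestUrl(
  link: string,
  template: string = DEFAULT_ARCHIVE_TEMPLATE
): string {
  // Only http(s) links are forwarded to the archival service.
  if (!/^https?:\/\//i.test(link)) {
    throw new Error("Only http(s) links can be archived: " + link);
  }
  // A template without a placeholder would send the same request for every link.
  if (!template.includes("{url}") && !template.includes("{encodedUrl}")) {
    throw new Error("Archive template needs a {url} or {encodedUrl} placeholder");
  }
  // split/join avoids the special "$" patterns of String.prototype.replace.
  return template
    .split("{encodedUrl}")
    .join(encodeURIComponent(link))
    .split("{url}")
    .join(link);
}
```

With no template configured, `buildArchiveRequestUrl("https://example.com/a?b=1")` yields `https://web.archive.org/save/https://example.com/a?b=1`. A self-hosted instance could instead supply something like `http://archive.local/submit?url={encodedUrl}` (a made-up endpoint) and pass the resulting URL to whatever HTTP client the server already uses.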