Move scraping of site title and site description to the client (bookmarklet and browser extension)

sissbruecker / linkding

Self-hosted bookmark manager that is designed to be minimal, fast, and easy to set up using Docker.

https://linkding.link/

MIT License


Move scraping of site title and site description to the client (bookmarklet and browser extension) #118

Open: eyriewow opened this issue 3 years ago

eyriewow commented 3 years ago

Currently, when adding a bookmark through the Firefox browser add-on, the add-on populates certain fields automatically. I could be wrong here, but it seems like the actual scraping of that data is handled by the server when the bookmark gets added to linkding.

Most of the time this is not an issue, but with certain sites, like those protected by Cloudflare, this leads to unexpected behavior, as illustrated here:

Adding a bookmark for a Path of Exile forum post via the browser extension. Notice the pre-populated title field, as expected: [screenshot]

How that bookmark then appears in linkding: [screenshot]

sissbruecker commented 3 years ago

For now I would say that this is by design. The scraping happens on the server because:

- it can be reused by the internal bookmark form, by the extension, and by other tools using the REST API
- fetching a website using AJAX methods from the browser would likely lead to cross-origin issues. While the extension might be able to circumvent CORS checks, the internal bookmark form definitely would not
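The cross-origin point comes from the browser's same-origin policy: a page may only read responses from the same scheme, host and port, unless the target site opts in via CORS headers. Since the internal bookmark form is served from the linkding host, fetching an arbitrary bookmarked site from it is almost always cross-origin. A minimal Python sketch of that origin comparison, assuming `linkding.example` as a placeholder host (the function names are hypothetical, for illustration only):

```python
from urllib.parse import urlsplit

# Ports implied when a URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, int]:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme, 0)
    return scheme, host, port


def is_cross_origin(page_url: str, target_url: str) -> bool:
    """True if a request from page_url to target_url crosses origins."""
    return origin(page_url) != origin(target_url)
```

For example, `is_cross_origin("https://linkding.example/bookmarks/new", "https://www.pathofexile.com/forum")` is true, so without CORS headers from the forum the browser would refuse to let the form read the response.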

While it's unfortunate that some sites block requests coming from servers, I would prefer to keep things simple and keep the logic in one place rather than implement it multiple times in different places / languages.

An alternative I can think of is to extend the extension to:

- provide a setting to always set an explicit title + description and get these from the current tab
- ATM the extension only reads the tab title, so it would also need to be extended to determine a description from the document
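Determining a description from the document mostly means reading the page's `<meta>` tags. In the extension this would run against the live DOM; the sketch below does the equivalent on raw HTML with Python's standard `html.parser`. The fallback from `description` to `og:description` is an assumption for illustration, not necessarily what linkding or its extension does, and the class and function names are hypothetical:

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects <title> text and <meta> name/property -> content pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # <meta name="description"> and <meta property="og:description">
            key = attributes.get("name") or attributes.get("property")
            content = attributes.get("content")
            if key and content is not None:
                # Keep only the first occurrence of each key.
                self.meta.setdefault(key.lower(), content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html: str) -> tuple[str, str]:
    """Return (title, description), preferring the standard description tag."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    description = parser.meta.get("description") or parser.meta.get("og:description", "")
    return parser.title.strip(), description
```

As noted later in the thread, sites that navigate without reloading the page may leave a stale description tag in place, so reading it from the current tab is not guaranteed to match the page being bookmarked.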

sissbruecker commented 2 years ago

Changed the title to include the bookmarklet in this issue. See https://github.com/sissbruecker/linkding/issues/292 for the original request. As mentioned there, if the website metadata is provided by a client, then scraping on the server could be skipped.
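Skipping server-side scraping when the client already supplies the metadata could look roughly like the following sketch. `Metadata`, `resolve_metadata` and the injected `scrape` callable are hypothetical names, and filling in missing fields individually from the scraper is an assumed design choice rather than linkding's actual behavior:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Metadata:
    title: str
    description: str


def resolve_metadata(
    url: str,
    client_title: Optional[str],
    client_description: Optional[str],
    scrape: Callable[[str], Metadata],
) -> Metadata:
    """Use client-provided values; only call the server-side scraper for gaps."""
    if client_title and client_description:
        # The client supplied everything: skip the outgoing request entirely.
        return Metadata(client_title, client_description)
    # Otherwise scrape once and fill in only the fields the client left empty.
    scraped = scrape(url)
    return Metadata(
        title=client_title or scraped.title,
        description=client_description or scraped.description,
    )
```

Passing the scraper in as a parameter keeps the single server-side implementation the maintainer wants, while letting a site that blocks servers be bookmarked correctly whenever the client provides both values.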

I'm more open to this now, as there are bug reports around this from time to time. Ideally the client should provide both the website title and description. Getting the title is straightforward, but the description is not. There are websites (GitHub, Reddit) that do not update the meta description tag while you navigate through the site, which means the description provided by the client might not be correct. It's hard to say which method (client or server scraping) would provide better results on average.

For now I assume server-side scraping is still the better alternative. If someone has ideas around the description issue, feel free to share.

joshdick commented 1 year ago

Regarding the description issue, I would love it if any currently selected text on a page were used as the description when invoking the bookmarklet/extension (the current behavior would be kept if no text is selected).

Barring that, it would be nice to at least have the ability to manually provide a description parameter to /new in order to homebrew the functionality described above by customizing the bookmarklet on my own, using it in Apple Shortcuts, etc.
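The "selected text first, page description otherwise" rule could be expressed as below. In a real bookmarklet the selection would come from `window.getSelection()` in the browser; this Python sketch only shows the URL construction. The `/bookmarks/new` path is an assumption based on how the linkding bookmarklet opens the new-bookmark form, and the `title` and `description` query parameters are hypothetical, since the thread notes the bookmarklet has not been updated to pass metadata:

```python
from urllib.parse import urlencode


def build_new_bookmark_url(
    linkding_base: str,
    page_url: str,
    page_title: str,
    selected_text: str = "",
    page_description: str = "",
) -> str:
    """Build a hypothetical new-bookmark URL carrying client-side metadata."""
    # Prefer the user's text selection; otherwise fall back to the page's description.
    description = selected_text.strip() or page_description
    params = {"url": page_url, "title": page_title}
    if description:
        params["description"] = description
    # urlencode percent-encodes each value, so URLs and "&" in titles stay intact.
    return f"{linkding_base.rstrip('/')}/bookmarks/new?{urlencode(params)}"
```

For instance, `build_new_bookmark_url("https://linkding.example", "https://example.com", "Example", "quoted passage")` yields a form URL whose `description` is the selection rather than whatever the page's meta tag says.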

sissbruecker commented 1 month ago

The browser extension can now use the title and description of the current browser tab instead of fetching them through the server. The bookmarklet still needs to be updated.

ccxuy commented 1 month ago

Thanks for your great work! I stored over 50,000 bookmarks with this project on my little server now.