hoarder-app / hoarder

A self-hostable bookmark-everything app (links, notes and images) with AI-based automatic tagging and full text search
https://hoarder.app
GNU Affero General Public License v3.0

[Feature Request] Local Scraper (Use browser auth) #172

Open brucealdridge opened 6 months ago

brucealdridge commented 6 months ago

There are a number of sites that I visit that I would like to bookmark that require authentication. A good example of this is news sites with content behind paywalls.

Using a scraper on a server won't work and will instead just save a login screen.

Could the browser extension pass a copy of the page to the server via the API and save that?
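For illustration, a minimal sketch of what that could look like in a Chrome (Manifest V3) extension: capture the rendered DOM from the active tab and POST it to the server. The endpoint path, the `htmlContent` field, and the auth handling are assumptions made for the sketch, not Hoarder's actual API.

```ts
// Hypothetical sketch: capture the logged-in, rendered page in the browser
// and hand it to the server, so no server-side fetch is needed.
// Requires the "scripting" and "activeTab" permissions in the manifest.
const HOARDER_URL = "https://hoarder.example.com"; // placeholder instance URL
const API_KEY = "hoarder-api-key"; // placeholder credential

async function hoardCurrentPage(): Promise<void> {
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  if (!tab?.id || !tab.url) return;

  // Run inside the page to grab the fully rendered (post-login) HTML.
  const [injection] = await chrome.scripting.executeScript({
    target: { tabId: tab.id },
    func: () => document.documentElement.outerHTML,
  });
  const html = injection.result as string;

  // "htmlContent" is an assumed field name for the pre-captured page.
  await fetch(`${HOARDER_URL}/api/v1/bookmarks`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({ type: "link", url: tab.url, htmlContent: html }),
  });
}
```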

kamtschatka commented 6 months ago

Yes, taking screenshots is possible with Chrome extensions. One issue is that rescraping the page would not be possible, since you would hit the login screen again, so there would need to be some kind of "prevent rescraping" flag. Another option would be to use your locally installed Chrome instance for scraping by running a worker locally. I am not sure how user-friendly that would be^^.
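As a rough sketch of that local-worker idea: a script can attach to your own already-running Chrome over the DevTools protocol (Chrome started with `--remote-debugging-port=9222`), so the scrape reuses your logged-in session. puppeteer-core supports this; the surrounding worker/queue plumbing is left out here.

```ts
// Sketch only: reuse the user's authenticated Chrome instance for scraping.
import puppeteer from "puppeteer-core";

async function scrapeWithLocalChrome(url: string): Promise<string> {
  // Attach to a Chrome started with: chrome --remote-debugging-port=9222
  const browser = await puppeteer.connect({
    browserURL: "http://127.0.0.1:9222",
  });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle2" });
  const html = await page.content(); // rendered HTML, behind the paywall
  await page.close();
  browser.disconnect(); // leave the user's browser running
  return html;
}
```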

MohamedBassem commented 6 months ago

This makes a lot of sense. The extension itself can capture the page content so that Hoarder doesn't need to crawl it. This is a reasonable feature request; I'll add it to our todo list :)

kureta commented 6 months ago

This would be a great feature. TubeArchivist, for example, has a browser extension that syncs your YouTube cookies with the TubeArchivist server. An extension that automatically shares all your cookies, lets you choose which cookies to share, or sends the cookies of the current page to Hoarder before it starts scraping might be an option.
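A sketch of that cookie-forwarding variant: the `chrome.cookies` API (with the "cookies" permission plus host permissions) can read the cookies for the page being hoarded and send them to the server for the crawler to replay. The endpoint below is made up for the sketch; Hoarder has no such API today.

```ts
// Hypothetical sketch: forward the current page's cookies so the server-side
// crawler can scrape behind the login.
async function sendCookiesFor(url: string): Promise<void> {
  // Requires the "cookies" permission and matching host permissions.
  const cookies = await chrome.cookies.getAll({ url });

  await fetch("https://hoarder.example.com/api/v1/crawler-cookies", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      url,
      // Name/value (plus domain/path) is enough to rebuild a Cookie header.
      cookies: cookies.map(({ name, value, domain, path }) => ({
        name,
        value,
        domain,
        path,
      })),
    }),
  });
}
```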

javydekoning commented 1 month ago

A solution similar to Evernote Web Clipper would be awesome.

Select some text/images -> right click -> hoard.

https://chromewebstore.google.com/detail/evernote-web-clipper/pioclpoplcdbaefihamjohnefbikjilc?hl=en
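That flow maps fairly directly onto Chrome's context-menu API. A hypothetical sketch, run from the extension's service worker; the endpoint and payload shapes are placeholders, not Hoarder's real API:

```ts
// Hypothetical sketch of "select some text/images -> right click -> hoard".
chrome.runtime.onInstalled.addListener(() => {
  chrome.contextMenus.create({
    id: "hoard",
    title: "Hoard",
    contexts: ["selection", "image", "page"],
  });
});

chrome.contextMenus.onClicked.addListener(async (info, tab) => {
  if (info.menuItemId !== "hoard") return;
  await fetch("https://hoarder.example.com/api/v1/bookmarks", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(
      info.selectionText
        ? { type: "text", text: info.selectionText } // clipped text
        : { type: "link", url: info.srcUrl ?? tab?.url }, // image or page
    ),
  });
});
```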

huyz commented 1 month ago

See also https://github.com/webclipper/web-clipper

NotChristianGarcia commented 2 weeks ago

^ web-clipper does exist and works, but I didn't like the flow much.

SingleFile is another project worth checking out. It outputs a single .html file (or an archive). I find it easier to manage and quick to run. Its settings include an "upload to a REST Form API" option that sets a destination URL, which Hoarder could expose if it doesn't want to re-implement scraping.
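For concreteness, a minimal sketch of an endpoint such an upload could target, assuming SingleFile POSTs the saved page as a multipart/form-data file field. The route, the field name, and the Express/multer stack are all choices made for the sketch, not anything Hoarder ships.

```ts
// Hypothetical sketch: an upload target for SingleFile's
// "upload to a REST Form API" option.
import express from "express";
import multer from "multer";

const app = express();
const upload = multer({ storage: multer.memoryStorage() });

// Point SingleFile at e.g. https://hoarder.example.com/api/singlefile
// "file" is an assumed form field name; SingleFile lets you configure it.
app.post("/api/singlefile", upload.single("file"), (req, res) => {
  const html = req.file?.buffer.toString("utf-8");
  if (!html) {
    res.status(400).send("missing file field");
    return;
  }
  // This is where Hoarder would parse the page and create a bookmark.
  console.log(`received ${html.length} bytes from SingleFile`);
  res.sendStatus(201);
});

app.listen(3010);
```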

errorsandwarnings commented 3 days ago

@NotChristianGarcia I agree, and I came looking for similar functionality. All it needs is a REST API endpoint that SingleFile can send to; the endpoint would parse the archive and put it in Hoarder. This would solve so many problems. Local scraping is a really needed feature.

yinan-c commented 3 days ago

That's what I have in mind as well.

MohamedBassem commented 3 days ago

Hey folks, I know how important this feature is, and it's at the top of my todo list! I'll see if I can get it into the next release!