j0k3r / graby

Graby helps you extract article content from web pages
MIT License

Add ability to process prefetched content #274

Kdecherf closed this 2 years ago

Kdecherf commented 2 years ago

fetchContent() now accepts an optional parameter, prefetchedContent, which can contain the content of a page that was fetched outside of Graby.

Taking Wallabag as an example, this makes it possible to send the content of a page (through a browser extension, for example) without making network calls to fetch the actual page.
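
To illustrate the behavior described above, here is a minimal, self-contained sketch. It does not use Graby's code: the class `SketchExtractor`, the injected fetcher callable, the `networkCalls` counter, and the shape of the returned array are all assumptions made for illustration. Only the method name `fetchContent()` and the optional `prefetchedContent` parameter come from this PR's description.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for an extractor. NOT Graby's implementation;
 * it only demonstrates the "skip the network when content is supplied" idea.
 */
final class SketchExtractor
{
    /** @var callable(string): string Fetches the HTML of a URL (assumed helper). */
    private $httpFetcher;

    /** Number of network fetches performed, exposed for illustration only. */
    public int $networkCalls = 0;

    public function __construct(callable $httpFetcher)
    {
        $this->httpFetcher = $httpFetcher;
    }

    /**
     * Returns the HTML to process for $url.
     * If $prefetchedContent is given, it is used as-is and no fetch happens.
     *
     * @return array{url: string, html: string} Simplified, assumed result shape.
     */
    public function fetchContent(string $url, ?string $prefetchedContent = null): array
    {
        if (null === $prefetchedContent) {
            // No content supplied: fall back to fetching the page.
            ++$this->networkCalls;
            $html = ($this->httpFetcher)($url);
        } else {
            // Content was fetched elsewhere (e.g. by a browser extension).
            $html = $prefetchedContent;
        }

        return ['url' => $url, 'html' => $html];
    }
}

// Fake fetcher so the sketch runs without any network access.
$extractor = new SketchExtractor(
    static fn (string $url): string => '<html><body>fetched from ' . $url . '</body></html>'
);

// The page content was captured elsewhere, so no fetch is triggered.
$result = $extractor->fetchContent(
    'https://example.com/article',
    '<html><body><p>Saved by the browser</p></body></html>'
);
```

The URL is still passed alongside the prefetched content so that anything depending on it (in a real extractor, things like site-specific rules or relative link resolution) keeps working; whether Graby uses it that way is not stated in this thread.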

coveralls commented 2 years ago

Coverage increased (+0.03%) to 95.102% when pulling 31135a72fdb3b1a560474e17016845dd9fbc4e87 on Kdecherf:feature/offline-mode into fda67248a45f9e40eddd835d8e9aa42ef0bdc665 on j0k3r:master.

jtojnar commented 2 years ago

Could you please add some context – what is the purpose of this? If it is for tests, wouldn't it be cleaner to mock httpClient?

Kdecherf commented 2 years ago

@jtojnar A few weeks ago I worked on a WebScrapBook server implementation in wallabag. It lets me save the actual page content from my browser using the WebScrapBook extension, which is useful for pages with JavaScript rendering or bot protection.

This requires sending content to Graby without it making an HTTP call, hence the added support for "prefetched content".

Kdecherf commented 2 years ago

I think this PR is ready for review. Poke @jtojnar @j0k3r

Kdecherf commented 2 years ago

> Looks good to me, could you add an entry in the readme about that new behavior?

@j0k3r https://github.com/j0k3r/graby/pull/274/commits/7aa0592609ddfaabaa69f5efb5788b09f154fbbc do you think it's enough?