j0k3r / graby

Graby helps you extract article content from web pages
MIT License
362 stars, 73 forks

Additional fingerprints #338

Closed: HolgerAusB closed this 10 months ago

HolgerAusB commented 11 months ago
fivefilters commented 11 months ago

There's also this for Substack that we use, if you'd like to add it, @HolgerAusB:

'<link rel="stylesheet" type="text/css" href="https://substackcdn.com/' -> fingerprint.substack.com
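A minimal sketch of how an entry like this could be applied, assuming (as the `->` notation suggests) that each fingerprint maps an HTML snippet to a pseudo-host whose site config should be used, and that a page matches when its HTML contains the snippet. Only the Substack entry comes from this thread. `FINGERPRINTS` and `find_host_using_fingerprints` are hypothetical names. Graby itself is written in PHP, and its real lookup may differ in details such as case sensitivity.

```python
from typing import Optional

# Hypothetical fingerprint table: HTML snippet -> pseudo-host whose site config applies.
# Only the Substack entry is taken from the thread; the structure is an assumption.
FINGERPRINTS = {
    '<link rel="stylesheet" type="text/css" href="https://substackcdn.com/': "fingerprint.substack.com",
}


def find_host_using_fingerprints(html: str) -> Optional[str]:
    """Return the pseudo-host of the first fingerprint contained in html, else None.

    Assumes a plain substring check; illustrative only, not Graby's own code.
    """
    for snippet, host in FINGERPRINTS.items():
        if snippet in html:
            return host
    return None


page = (
    "<html><head>"
    '<link rel="stylesheet" type="text/css" href="https://substackcdn.com/bundle/main.css">'
    "</head><body>...</body></html>"
)
print(find_host_using_fingerprints(page))             # fingerprint.substack.com
print(find_host_using_fingerprints("<html></html>"))  # None
```

Because the snippet stops at `https://substackcdn.com/`, any Substack stylesheet path after it still matches.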

j0k3r commented 10 months ago

Good job and thanks for your first PR, @HolgerAusB! I squashed your commits and fixed the tests.

HolgerAusB commented 10 months ago

OK @j0k3r, I don't understand what ContentExtractorTest.php is for. No need to explain, I'm not a real coder!

But here you use a wider snippet than the fingerprint itself, and I just want to be sure this won't cause problems. You wrote: ... {"de.ippen-digital.story.onlineId":91197383} ... but I think that number is different for each article.

And there is no test entry for the other two services I added: fingerprint.medium.com and fingerprint.substack.com.

j0k3r commented 10 months ago

The test doesn't care about the id because the fingerprint doesn't check it, so it's OK. And yeah, no big deal if there is no test for the other fingerprints :)
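j0k3r's point can be illustrated with a small sketch, again assuming a plain substring match. The fingerprint string `'{"de.ippen-digital.story.onlineId":'` and the host `fingerprint.ippen.example` are hypothetical stand-ins, because the actual entry isn't quoted in this thread. The only claim shown is that a fingerprint ending before the number matches every article, whatever its onlineId. A test snippet that includes one specific id is then just a realistic page fragment, not something the fingerprint compares against.

```python
from typing import Optional

# Hypothetical entry: the snippet deliberately stops right before the per-article number.
FINGERPRINTS = {
    '{"de.ippen-digital.story.onlineId":': "fingerprint.ippen.example",
}


def find_host_using_fingerprints(html: str) -> Optional[str]:
    """Return the pseudo-host of the first fingerprint contained in html, else None."""
    for snippet, host in FINGERPRINTS.items():
        if snippet in html:
            return host
    return None


# Two articles with different ids: both contain the (id-free) fingerprint.
article_a = '<script>{"de.ippen-digital.story.onlineId":91197383}</script>'
article_b = '<script>{"de.ippen-digital.story.onlineId":12345678}</script>'

for html in (article_a, article_b):
    print(find_host_using_fingerprints(html))  # fingerprint.ippen.example
```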

HolgerAusB commented 1 month ago

@j0k3r, any ETA on when this might be released and pushed to wallabag?