Additional fingerprints

j0k3r / graby

Graby helps you extract article content from web pages

MIT License

Additional fingerprints #338

Closed HolgerAusB closed 10 months ago

HolgerAusB commented 11 months ago

New fingerprint for ippen.media-based newspapers (German); the old one is no longer valid.
New fingerprint for medium.com-based websites, adapted from the @fivefilters FulltextRSS config.

fivefilters commented 11 months ago

There's also this for Substack that we use, if you'd like to add it, @HolgerAusB:

'<link rel="stylesheet" type="text/css" href="https://substackcdn.com/' -> fingerprint.substack.com
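To illustrate how a mapping like this can be used, here is a minimal sketch in Python. It is not graby's actual PHP code: the table name `FINGERPRINTS`, the function `find_host_by_fingerprint`, and the case-insensitive substring search are all assumptions made for illustration. Only the Substack fragment and hostname come from the comment above.

```python
# Hypothetical fingerprint table: HTML fragment -> site-config hostname.
# Only this one entry is taken from the thread; the structure is an assumption.
FINGERPRINTS = {
    '<link rel="stylesheet" type="text/css" href="https://substackcdn.com/':
        "fingerprint.substack.com",
}


def find_host_by_fingerprint(html, fingerprints=FINGERPRINTS):
    """Return the hostname of the first fingerprint found in the HTML, else None."""
    haystack = html.lower()  # assumed: matching ignores letter case
    for fragment, host in fingerprints.items():
        if fragment.lower() in haystack:
            return host
    return None
```

A page whose markup contains the Substack stylesheet link would resolve to `fingerprint.substack.com`, regardless of the domain it is served from; a page without any known fragment yields `None`.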

j0k3r commented 10 months ago

Good job and thanks for your first PR @HolgerAusB! I squashed your commits and fixed the tests.

HolgerAusB commented 10 months ago

OK @j0k3r, I don't understand what ContentExtractorTest.php is for. No need to explain, I am not a real coder!

But you use a wider range of the fingerprint here, and I just want to be sure that this will not cause problems. You wrote: ... {"de.ippen-digital.story.onlineId":91197383} ... but that number should be different for each article, I think.

And there is no test entry for the other two services I added: fingerprint.medium.com and fingerprint.substack.com

j0k3r commented 10 months ago

The test doesn't care about the id because the fingerprint doesn't check it, so it's OK. And yeah, no big deal if there is no test for the other fingerprints :)
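A small Python sketch of why the id does not matter, assuming the fingerprint is a fixed substring that stops before the number. The value of `FINGERPRINT` below is hypothetical (the real fingerprint string is not quoted in this thread), and `matches` is an illustrative helper, not part of graby.

```python
# Hypothetical fingerprint: covers the key name only, not the numeric id.
FINGERPRINT = "de.ippen-digital.story.onlineId"


def matches(html):
    """True if the (assumed) fingerprint substring occurs anywhere in the HTML."""
    return FINGERPRINT in html


# Two made-up pages that differ only in the article id.
article_a = '<script>{"de.ippen-digital.story.onlineId":91197383}</script>'
article_b = '<script>{"de.ippen-digital.story.onlineId":12345678}</script>'
```

Under this assumption both `matches(article_a)` and `matches(article_b)` are true, so a test fixture that happens to contain one specific id still exercises the same fingerprint.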

HolgerAusB commented 1 month ago

@j0k3r, any ETA on when this will be released and pushed to wallabag?