stefanw / bibbot

BibBot is a browser extension that removes the paywall on German online news sites using your library account's access to press databases.
https://stefanw.github.io/bibbot/
GNU General Public License v3.0

spiegel.de: wrong article replacement #447

Open BreiteSeite opened 3 months ago

BreiteSeite commented 3 months ago

The article https://www.spiegel.de/ausland/terrorgefahr-in-deutschland-experte-warnt-vor-anschlaegen-der-afghanischen-terrorgruppe-a-f95ccaaa-167d-4e2f-a731-b50d43fcae79

is replaced by: https://bib-voebb.genios.de/document/SPPL__d2ec496a062e0ec7bec461fbc0661b2a3a20a5f9

but should be replaced by: https://bib-voebb.genios.de/document/SPPL__94bc036b05f5a3523098c8a417e1a5fa80bd8abd

I found the actual article by searching genios for "Bei dem Terroranschlag in Moskau starben 137 Menschen" (in case that helps).

eengnr commented 3 months ago

Putting '.leading-loose' first in https://github.com/stefanw/bibbot/blob/e29c14b1a01501ad99bb9d63de861ee60b126fca/src/sites.ts#L100 could fix this. In this particular case, though, the slice here would also have to run from 2 to 10 rather than to 15: https://github.com/stefanw/bibbot/blob/e29c14b1a01501ad99bb9d63de861ee60b126fca/src/sites.ts#L9 Otherwise the correct article is not found, because one word differs.

Perhaps it's worth a try; I have also had issues with SpOn articles that were not mapped correctly. Preferring '.leading-loose' could lead to better results.

If necessary I could provide a PR.
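
To illustrate why the shorter slice matters, here is a minimal sketch. It assumes the search query is an exact quoted phrase built from a window of words in the article text. `buildPhraseQuery` and `phraseMatches` are hypothetical names, and the texts are invented; this is not the code in sites.ts.

```typescript
// Hypothetical helper (not BibBot's actual code): build a quoted phrase query
// from the words in positions [start, end) of the article text.
function buildPhraseQuery(text: string, start: number, end: number): string {
  const words = text.trim().split(/\s+/)
  return `"${words.slice(start, end).join(' ')}"`
}

// Hypothetical stand-in for an exact-phrase search in the press database.
function phraseMatches(query: string, databaseText: string): boolean {
  const phrase = query.slice(1, -1) // strip the surrounding quotes
  return databaseText.trim().split(/\s+/).join(' ').includes(phrase)
}

// Invented texts: the web and database versions differ only in the word at index 11.
const webText = 'w0 w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 changed w12 w13 w14 w15'
const dbText = 'w0 w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 original w12 w13 w14 w15'

const longQuery = buildPhraseQuery(webText, 2, 15) // window covers the differing word
const shortQuery = buildPhraseQuery(webText, 2, 10) // window ends before it

const longMatch = phraseMatches(longQuery, dbText)
const shortMatch = phraseMatches(shortQuery, dbText)

console.log({ longMatch, shortMatch })
```

Under these assumptions the longer window fails on a single differing word, while the shorter one still matches. The trade-off is that a shorter phrase is less specific and could match more than one document.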

Paul0k commented 1 month ago

Could you please provide a PR, or do you have another hint? I'm trying to fix it. I found the right line of code, but none of my "fixes" work.

stefanw commented 1 month ago

The selector query utility functions now allow custom slice ranges, if that helps: https://github.com/stefanw/bibbot/commit/86ff5cad23c06087a8244568d2ba89beb0300cce
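
For anyone experimenting with this, a rough sketch of how a preferred selector order and a custom slice range might be combined. Everything here is an assumption: `makeQuery`, its signature, the `.headline` selector and the sample texts are invented for illustration, and the default range of 2 to 15 is only taken from the comment above. The helper in the linked commit may look quite different. The DOM lookup is replaced by a plain map so the sketch runs on its own.

```typescript
type SliceRange = [number, number]

// Hypothetical factory (not the API from the linked commit): tries selectors in
// order and builds a quoted query from the first one that yields text.
function makeQuery(
  selectors: string[],
  range: SliceRange = [2, 15]
): (lookup: (selector: string) => string | null) => string | null {
  return (lookup) => {
    const [start, end] = range
    for (const selector of selectors) {
      const text = lookup(selector)
      if (text) {
        const words = text.trim().split(/\s+/)
        return `"${words.slice(start, end).join(' ')}"`
      }
    }
    return null // no selector produced any text
  }
}

// Stand-in for reading element text from the page; the contents are invented.
const page: Record<string, string> = {
  '.headline': 'a b c d e f',
  '.leading-loose': 'one two three four five six seven eight nine ten eleven twelve'
}
const lookup = (selector: string): string | null =>
  selector in page ? page[selector] : null

// Prefer '.leading-loose' and use the shorter 2..10 window suggested above.
const spiegelQuery = makeQuery(['.leading-loose', '.headline'], [2, 10])(lookup)
const missingQuery = makeQuery(['.does-not-exist'])(lookup)

console.log(spiegelQuery, missingQuery)
```

Keeping the range as a per-site parameter means a fix for spiegel.de would not change the query length for other sites.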