sissbruecker / linkding

Self-hosted bookmark manager that is designed to be minimal, fast, and easy to set up using Docker.
https://linkding.link/
MIT License
6.35k stars 299 forks

HTML entities: "&#39;" instead of apostrophe in article title #647

Closed yue-dongchen closed 6 months ago

yue-dongchen commented 6 months ago

I believe this encoding for special characters is called "HTML entities". Sample link: https://theins.press/en/confession/270004

Expected parsing: “We were handed envelopes with 'incentives' for delivering favorable results”: A Russian election official's confession

Actual result: “We were handed envelopes with &#39;incentives&#39; for delivering favorable results”: A Russian election official's confession

Should this be fixed?

sissbruecker commented 6 months ago

Looks like a bug. linkding should unescape HTML entities when parsing the website title and description.

sissbruecker commented 6 months ago

After taking a closer look, linkding already unescapes entities. It converts &amp;#39; into &#39;. It would then have to unescape &#39; again to get to '. I'm not sure if that is expected. Browsers seem to handle this somehow, but I can also find other tools (1, 2) that don't support this.

I'd say that's an issue with the website, as there is no need to escape & and there should be no need to unescape a text twice. I'd rather not change the logic in linkding, as it could affect other cases. I'll close the issue for now, let's see if something similar comes up in the future.
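The double-escaping described above can be reproduced with Python's standard `html.unescape`. This is only a minimal sketch: the sample strings are hypothetical, and it is not a copy of linkding's actual parsing code, which may use a different parser.

```python
import html

# Hypothetical title as it might appear in the page source: the apostrophe
# entity was escaped a second time, so its "&" became "&amp;".
raw_title = "official&amp;#39;s confession"

once = html.unescape(raw_title)  # a single pass leaves a literal entity behind
twice = html.unescape(once)      # only a second pass recovers the apostrophe

print(once)
print(twice)

# Why a blanket second pass is risky: a correctly escaped title that talks
# about entities (again a hypothetical example) would be altered by it.
legit_title = "Using &amp;lt; in HTML"
intended = html.unescape(legit_title)  # the text the author meant to show
over_unescaped = html.unescape(intended)

print(intended)
print(over_unescaped)
```

The second half illustrates the concern about other cases: unescaping twice fixes the double-escaped title but changes the meaning of a title that was escaped correctly.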

yue-dongchen commented 6 months ago

Thanks for investigating this. I agree. I haven't encountered this with any other website's link previews.