Closed yue-dongchen closed 6 months ago
Looks like a bug. It should unescape HTML entities when parsing the website title and description.
After taking a closer look, linkding unescapes entities already. It converts the double-escaped apostrophe entity (one whose leading & was itself escaped) into the plain apostrophe entity. It would then have to unescape that entity again to get to '. I'm not sure if that is expected. Browsers seem to handle this somehow, but I can also find other tools (1, 2) that don't support this.
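To make the double-unescape point concrete, here is a minimal Python sketch using the standard library's `html.unescape`. It is not linkding's actual parsing code; the helper names `unescape_once` and `unescape_until_stable` are hypothetical, and the input `&amp;#39;` is an assumed example of what a double-escaped apostrophe might look like in the page source.

```python
import html


def unescape_once(text: str) -> str:
    """Decode HTML entities a single time (one pass)."""
    return html.unescape(text)


def unescape_until_stable(text: str, max_passes: int = 3) -> str:
    """Hypothetical helper: decode repeatedly until nothing changes.

    max_passes is an arbitrary safety cap chosen for this sketch.
    """
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text


# Assumed example of a double-escaped apostrophe in a page title.
double_escaped = "&amp;#39;incentives&amp;#39;"

print(unescape_once(double_escaped))          # &#39;incentives&#39;
print(unescape_until_stable(double_escaped))  # 'incentives'
```

A single pass only turns `&amp;` back into `&`, which leaves a still-encoded apostrophe behind. Decoding until stable would also change titles that intentionally show an escaped entity as literal text, which is the kind of side effect that argues against changing the existing logic.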
I'd say that's an issue with the website, as there is no need to escape & and there should be no need to unescape a text twice. I'd rather not change the logic in linkding, as it could affect other cases. I'll close the issue for now; let's see if something similar comes up in the future.
Thanks for investigating this. I agree. I haven't encountered this with any other website's link previews.
I believe this encoding for special characters is called "HTML entities". Sample link: https://theins.press/en/confession/270004
Expected parsing: “We were handed envelopes with 'incentives' for delivering favorable results”: A Russian election official's confession
Actual result: “We were handed envelopes with \'incentives\' for delivering favorable results”: A Russian election official's confession
Should this be fixed?