mquinson / po4a

Maintain the translations of your documentation with ease (PO for anything)
http://po4a.org/
GNU General Public License v2.0

Broken parsing of files with Unicode BOM #443

Closed. NightTsarina closed this issue 6 months ago

NightTsarina commented 8 months ago

I spent some time wondering why a Markdown file with YAML front matter had stopped parsing correctly, until I realised that po4a (or one of the text-handling libraries it uses) treats the BOM as a regular character on the same line as the YAML document separator (---). As a result, it takes the whole front-matter block for a Markdown paragraph.

This source snippet (BOM character shown as <feff>):

<feff>---
title: Quick security recommendations for your devices
post_date: 2023.10.17
author: Security in a Box
published: true
teaser_image: ../../../media/en/blog/polygonal-hand-holding-smartphone.png
teaser: "See whare are few first effective steps one can take to better protect their Android, iOS/iPhone, Windows, Mac and Linux devices"
---

Is extracted as:

#. type: Plain text
#: src/blog/quick-security-recommendations-for-your-devices/index.md:8
msgid ""
"<feff>--- title: Quick security recommendations for your devices post_date: "
"2023.10.17 author: Security in a Box published: true teaser_image: ../../../"
"media/en/blog/polygonal-hand-holding-smartphone.png teaser: \"See whare are "
"few first effective steps one can take to better protect their Android, iOS/"
"iPhone, Windows, Mac and Linux devices\""
msgstr ""
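
The effect can be reproduced outside po4a. What follows is a minimal Python sketch, not po4a's actual (Perl) code: `is_front_matter_start` is a hypothetical helper, and it assumes the parser compares the first line against `---` exactly. With that assumption, a leading U+FEFF is enough to make the separator check fail.

```python
BOM = "\ufeff"  # U+FEFF, encoded as the bytes EF BB BF in UTF-8


def is_front_matter_start(line: str) -> bool:
    """Hypothetical check: front matter starts with a line that is exactly '---'."""
    return line.rstrip("\r\n") == "---"


# A tiny stand-in for the reported file: UTF-8 BOM followed by front matter.
raw = b"\xef\xbb\xbf---\ntitle: Example\n---\n"

# Decoding as plain UTF-8 keeps the BOM as the first character of the text.
first_with_bom = raw.decode("utf-8").splitlines()[0]

# The 'utf-8-sig' codec drops a leading BOM if one is present.
first_without_bom = raw.decode("utf-8-sig").splitlines()[0]

print(is_front_matter_start(first_with_bom))
print(is_front_matter_start(first_without_bom))
```

Under this assumption the first line is seen as `<feff>---`, which would explain why the block falls through to ordinary paragraph handling.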

If I remove the BOM, then parsing is fixed:

#. type: Yaml Front Matter Hash Value: author
#: src/blog/quick-security-recommendations-for-your-devices/index.md:1
#, no-wrap
msgid "Security in a Box"
msgstr ""

#. type: Yaml Front Matter Hash Value: teaser
#: src/blog/quick-security-recommendations-for-your-devices/index.md:1
#, no-wrap
msgid "See whare are few first effective steps one can take to better protect their Android, iOS/iPhone, Windows, Mac and Linux devices"
msgstr ""

#. type: Yaml Front Matter Hash Value: teaser_image
#: src/blog/quick-security-recommendations-for-your-devices/index.md:1
#, no-wrap
msgid "../../../media/en/blog/polygonal-hand-holding-smartphone.png"
msgstr ""

mquinson commented 6 months ago

Thanks for reporting. It should now be fixed.
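
For files that still have to be processed with a po4a release that predates the fix, stripping the BOM before extraction works around the problem, as the report above shows. A minimal sketch in Python, assuming the sources are UTF-8; `strip_bom_bytes` and `strip_bom_in_place` are hypothetical helper names, not part of po4a.

```python
import codecs
from pathlib import Path


def strip_bom_bytes(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; unchanged if there is none."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: Path) -> bool:
    """Rewrite the file without its leading UTF-8 BOM. Return True if it changed."""
    original = path.read_bytes()
    stripped = strip_bom_bytes(original)
    if stripped == original:
        return False
    path.write_bytes(stripped)
    return True
```

Running `strip_bom_in_place(Path("index.md"))` on each source file before invoking po4a leaves files without a BOM untouched, so it is safe to apply to a whole tree.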