I spent some time wondering why a Markdown file with YAML front matter had stopped parsing correctly, until I realised that po4a (or one of the text-handling libraries it uses) treats the BOM as a regular character on the same line as the YAML document separator (---), and so it takes the whole front-matter block for a Markdown paragraph.
This source snippet (BOM character shown as <feff>):
<feff>---
title: Quick security recommendations for your devices
post_date: 2023.10.17
author: Security in a Box
published: true
teaser_image: ../../../media/en/blog/polygonal-hand-holding-smartphone.png
teaser: "See whare are few first effective steps one can take to better protect their Android, iOS/iPhone, Windows, Mac and Linux devices"
---
is extracted as:
#. type: Plain text
#: src/blog/quick-security-recommendations-for-your-devices/index.md:8
msgid ""
"<feff>--- title: Quick security recommendations for your devices post_date: "
"2023.10.17 author: Security in a Box published: true teaser_image: ../../../"
"media/en/blog/polygonal-hand-holding-smartphone.png teaser: \"See whare are "
"few first effective steps one can take to better protect their Android, iOS/"
"iPhone, Windows, Mac and Linux devices\""
msgstr ""
If I remove the BOM, then parsing is fixed:
#. type: Yaml Front Matter Hash Value: author
#: src/blog/quick-security-recommendations-for-your-devices/index.md:1
#, no-wrap
msgid "Security in a Box"
msgstr ""
#. type: Yaml Front Matter Hash Value: teaser
#: src/blog/quick-security-recommendations-for-your-devices/index.md:1
#, no-wrap
msgid "See whare are few first effective steps one can take to better protect their Android, iOS/iPhone, Windows, Mac and Linux devices"
msgstr ""
#. type: Yaml Front Matter Hash Value: teaser_image
#: src/blog/quick-security-recommendations-for-your-devices/index.md:1
#, no-wrap
msgid "../../../media/en/blog/polygonal-hand-holding-smartphone.png"
msgstr ""
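As a workaround until the BOM is handled during parsing, the affected files can be cleaned before po4a reads them. The following is only a minimal sketch, assuming the sources are UTF-8 encoded; the function names `strip_utf8_bom` and `strip_bom_in_place` are hypothetical helpers, not part of po4a.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: Path) -> bool:
    """Rewrite the file without its BOM; return True if the file was changed."""
    original = path.read_bytes()
    cleaned = strip_utf8_bom(original)
    if cleaned != original:
        path.write_bytes(cleaned)
        return True
    return False
```

Calling `strip_bom_in_place(Path("src/blog/quick-security-recommendations-for-your-devices/index.md"))` once before running po4a should leave `---` as the first characters of the file, so the front-matter separator starts the first line again. Working on bytes rather than decoded text avoids touching anything else in the file.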