picocms / Pico

Pico is a stupidly simple, blazing fast, flat file CMS.
http://picocms.org/
MIT License
3.82k stars 616 forks source link

UTF-8 with BOM #461

Closed fl3pp closed 4 years ago

fl3pp commented 5 years ago

Hello together,

Unfortunately Pico didn't render my YAML content correctly. I was searching several hours until I found the issue in the regex:

$markdown = preg_replace($metaHeaderPattern, '', $rawContent, 1);

I was able to reproduce the problem in a small script:

$metaHeaderPattern = "/^(\/(\*)|---)[[:blank:]]*(?:\r)?\n"
. "(?:(.*?)(?:\r)?\n)?(?(2)\*\/|---)[[:blank:]]*(?:(?:\r)?\n|$)/s";

// returns false
$rawContent = file_get_contents("C:\FILE WITH BOM");
$isMatch = preg_match($metaHeaderPattern, $rawContent) == 1;

// return true
$rawContent = file_get_contents("C:\FILE WITHOUT BOM");
$isMatch = preg_match($metaHeaderPattern, $rawContent) == 1;

It seems that the regex replacement (and the markdown parsing too) is not able to handle files with UTF-8 BOM headers correctly. One should not add BOMs anyway, but many text-editor do without users knowing it.

Could you add support for BOMs?

PhrozenByte commented 5 years ago

Not sure about whether we should add explicit support for UTF-8 BOM or not. UTF-8 BOM are known to cause problems, so I'm not sure whether we should "promote" them by explicitly supporting them...

@all, feedback is appreciated!

fl3pp commented 5 years ago

I understand what you mean and agree with you that this concern is worth thinking about before implementing.

But in my opinion a corresponding error message should be shown, explicitely dissallowing BOMs because they are not supported. Or at least mention this topic in the documentation.

Otherwise users (like me) are being confronted with a really confusing behavior.

fl3pp commented 5 years ago

I fixed it for myself using a small plugin

https://gist.github.com/jflepp/fb8e46369ab88907020fbd143fa2b8b5

This is for sure not the best solution, but it solves my problem for now