librariesio / bibliothecary

📔 Libraries.io Package Manager Manifest Parsers
https://libraries.io/rubygems/bibliothecary
GNU Affero General Public License v3.0

Remove Byte Order Marks from beginning of file contents before parsing. #564

Closed tiegz closed 1 year ago

tiegz commented 1 year ago

Fixes parsing of files like this cyclonedx.json, which begin with a Byte Order Mark (U+FEFF, code point 65279):

File.read("cyclonedx.json").unpack("U*")[0,10]
=> [65279, 123, 13, 10, 32, 32, 32, 32, 34, 98]

JSON.parse(File.read("cyclonedx.json"))
JSON::ParserError: Empty input (after ) at line 1, column 1 [parse.c:1116] in '{
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "version": 1,
    "metadata": {
        "tools": [
            {
                "vendor": "CycloneDX",
                "name": "Node.js module",

Note: normally we'd be able to remove it while reading the file, with File.read(path, encoding: 'bom|utf-8'), but most of bibliothecary's inputs are file contents that have already been read, so we have to work on the strings themselves.
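
For illustration, stripping the mark from contents that have already been read might look roughly like the sketch below. It is not the code from this PR: `strip_utf8_bom` and `UTF8_BOM` are hypothetical names, and the sketch assumes only a UTF-8 BOM (bytes EF BB BF) needs handling. It compares raw bytes rather than using a regexp so that it also works on strings tagged as binary.

```ruby
require "json"

# UTF-8 encoding of U+FEFF, held as a binary string so the comparison
# works whatever encoding the input string is tagged with.
UTF8_BOM = "\xEF\xBB\xBF".b.freeze

# Hypothetical helper (not bibliothecary's actual method name):
# returns the contents without a leading UTF-8 BOM, if one is present.
def strip_utf8_bom(contents)
  return contents unless contents.b.start_with?(UTF8_BOM)

  # byteslice keeps the receiver's encoding, so no re-tagging is needed.
  contents.byteslice(UTF8_BOM.bytesize, contents.bytesize - UTF8_BOM.bytesize)
end

# Simulated pre-read file contents that start with a BOM.
raw = "\uFEFF{\"bomFormat\": \"CycloneDX\"}"
clean = strip_utf8_bom(raw)
parsed = JSON.parse(clean)
```

Strings without a BOM are returned unchanged, so a helper like this could be applied to every input before it is handed to a parser.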

tiegz commented 1 year ago

so could we say... we are removing the BOM so we can parse this BOM?

💯