Closed Iristyle closed 8 years ago
@DavidS here you go... note that I think longer-term we need a better fix, but this should work for now.
/cc @LarissaLane
@Iristyle should we go ahead and un-skip the other bom tests if they are passing anyway? While we're in here?
Also can you give us any info on the sense of urgency for getting a release out with these changes?
@cprice404 I was wrong about those tests - they are still failing. However, I question the usefulness of 1 of the skipped tests that are remaining, so if you want to yank it in a follow-up commit:
it "acceptBOMStartOfStringConfig" do
skip("BOM not parsing properly yet; not fixing this now because it most likely only affects windows") do
# BOM at start of file is just whitespace, so ignored
conf = Hocon::ConfigFactory.parse_string("\uFEFFfoo=bar")
expect(conf.get_string("foo")).to eq("bar")
end
end
I think it's safe to assume that BOM artifacts of files have been handled by the time an encoded string has been passed to parse_string
, so I think this one can be deleted.
it "acceptBOMWhitespace" do
skip("BOM not parsing properly yet; not fixing this now because it most likely only affects windows") do
# BOM here should be treated like other whitespace (ignored, since no quotes)
conf = Hocon::ConfigFactory.parse_string("foo= \uFEFFbar\uFEFF")
expect(conf.get_string("foo")).to eq("bar")
end
end
I thought this one was perhaps unnecessary as well... but... Ruby's JSON.parse
in 2.1.9 deals with that just fine:
[1] pry(main)> json = '{ "foo": "\uFEFFbar\uFEFF" }'
=> "{ \"foo\": \"\\uFEFFbar\\uFEFF\" }"
[2] pry(main)> require 'json'
=> true
[3] pry(main)> JSON.parse(json)
=> {"foo"=>"bar"}
So probably still want to leave that one in, even though I'd think it's fairly edge case.
@Iristyle ok, thanks.
Is this blocking things? Should we prioritize getting a new release out urgently?
I don't believe this is blocking anything - it's more of a nuisance issue. But @DavidS / @hunner can probably chime in with more specifics since they're working on the Azure module. I hopped in just because I knew right away how to handle the problem. Also, to be fair, aside from YAML files, I don't believe we handle BOMs that well in Puppet itself either.
I added https://tickets.puppetlabs.com/browse/PUP-6447 a bit ago to increase our tolerance on Windows for BOMs so that Windows users don't have to jump through hoops to remove them -- but the discovery and actual work still need to be done.
I created HC-83 to capture the work of review/merge/release. We will bring that into our next sprint unless it is needed more urgently.
I added some additional UTF-8 / UTF-16 specs - will probably file some tickets to cover those missing pieces (though I don't think implementation work is urgent)
:see_no_evil:
I'm 👍 asides from the comments on the expected failure test
Thanks for doing this, I had no idea windows was affected by issues like this. Even if it doesn't totally fix everything, it seems like a significant improvement
All updated @cprice404 / @jpinsonault
👍 looks good
This has gone out in hocon 1.1.2
Previously Hocon::ConfigFactory.parse_file did not specify an encoding, and didn't allow for files with UTF-8 BOMs on Windows.
In reality, HOCON config files should be detected by their BOM and treated as UTF-8, UTF-16LE, UTF-16BE, UTF-32LE or UTF-32BE based on the presence of the BOM, with a fallback to UTF-8 when one is not present, based on RFC 4627 at https://www.ietf.org/rfc/rfc4627.txt
This fix is a bit naive as it may improperly load HOCON config files on Windows which are UCS-2 (a precursor to UTF-16LE). Its recommended that this be addressed later in a better File parsing scheme that peeks at the first few bytes of the file to determine the encoding correctly.