librariesio / bibliothecary

Libraries.io Package Manager Manifest Parsers
https://libraries.io/rubygems/bibliothecary
GNU Affero General Public License v3.0

Handle non-utf8 files when removing BOM too, and DRY up into a helper method. #565

Closed: tiegz closed 1 year ago

tiegz commented 1 year ago

Followup to https://github.com/librariesio/bibliothecary/pull/564

We saw an error when an ASCII-8BIT yarn.lock file was run against bibliothecary after adding the "remove BOM" logic:

Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)

We need to force the string's encoding to UTF-8 before matching the regex against it. Forcing is fine for ASCII-8BIT, but if we need to support less common encodings such as UTF-32BE in the future, we will probably also need to call encode() on the string so that its bytes are actually transcoded to UTF-8.
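As a rough illustration of the approach described above, here is a minimal sketch of such a helper. The method name `remove_bom` and the `source_encoding:` keyword are hypothetical and chosen for this example; they are not necessarily what the PR itself uses. The sketch assumes the bytes are valid in the encoding they are relabeled as.

```ruby
# Hypothetical helper (name and signature are assumptions for illustration).
# Strips a leading byte order mark (U+FEFF) from file contents.
def remove_bom(contents, source_encoding: nil)
  text = contents.dup # avoid mutating the caller's string

  if source_encoding
    # Bytes are in another known encoding (e.g. UTF-32BE):
    # label them correctly, then transcode to UTF-8 with encode.
    text = text.force_encoding(source_encoding).encode(Encoding::UTF_8)
  else
    # ASCII-8BIT (binary) input: only relabel the bytes as UTF-8.
    # force_encoding changes the encoding tag without transcoding.
    text.force_encoding(Encoding::UTF_8)
  end

  # The regexp is UTF-8 because of the \u escape, so the string
  # must be UTF-8 too or Ruby raises Encoding::CompatibilityError.
  text.sub(/\A\uFEFF/, "")
end

# Binary string that starts with the UTF-8 BOM bytes EF BB BF.
binary = "\xEF\xBB\xBF# yarn lockfile v1".b
remove_bom(binary) # => "# yarn lockfile v1"
```

Relabeling is enough for ASCII-8BIT because the underlying bytes are already UTF-8. For an input such as UTF-32BE the bytes themselves differ, which is why the sketch transcodes with `encode` in that branch.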