Closed sebikeller closed 1 year ago
To keep the change working with older code workbook.rb
could be changed too:
# Returns an array of sheets for the case of same name sheets.
# @param [String] name
# @return [Array<Xsv::Sheet>]
def sheets_by_name(name)
+ name = name.dup.force_encoding(Encoding::UTF_8) if name.is_a?(String) and name.encoding == Encoding::ASCII_8BIT
@sheets.select { |s| s.name == name }
end
Maybe also just that place could be changed to convert to Encoding::BINARY
if the string is not yet encoded in binary to fix this function call to work with unicode strings. But i find, that it is more natural that the sheet names are UTF-8 strings, not binary strings.
Leaking binary strings here was definitely an oversight. Thanks for your contribution and detailed pull request! I will release a new gem version with this change soon.
Glad to hear that 😄 Did you already decide on how to handle the possible breaking change?
Well, the previous behaviour (that required a workaround to access sheets with UTF-8 characters by name) was definitely incorrect and existing tests (that tested with 7-bit characters only) weren't broken so I think the breakage is acceptable here.
I'd rather not carry the suggested backward-compatible change in Xsv forever as this piece of the code is already pretty messy.
Thank you for your answer. I can follow and understand your reasoning.
I only commented about the code for a compatibility solution and did not include it in the PR directly, since the current way to access sheets with non ascii names seems incorrect to me too.
Calling Workbook::sheets_by_name with a name containing an UTF-8 string, will now return the correct sheet.
Ruby stores string with UTF-8 encoding per default. (Unless specified otherwise by using
# encoding: <ENC>
orString.new
)The
io.sysread(...)
method call in thesax_parser.rb
used to gather the data from the xml within the zip file returns a binary string.This is no problem for comparison as long as the bytes of the strings match. But introducing non ASCII-Characters, the string comparison is not matching anymore.
Assuming a sheet is named
Geschäftsabschluss
, then the following code would return an empty array without this change:Forcing the sheet names to UTF-8 is a good enough fix from my perspective, since the XML is already stored in UTF-8.
It is how ever a breaking change. As the code above would return different results after merge: