martijn / xsv

High performance, lightweight .xlsx parser for Ruby that provides nothing a CSV parser wouldn't
https://storck.io/posts/announcing-xsv-1-0-0/
MIT License

Encoding::UndefinedConversionError when parsing non-ASCII character #22

Closed: vmsp closed this issue 3 years ago

vmsp commented 3 years ago

To reproduce, create a new XLSX file with a single cell containing a non-ASCII character. Parsing it raises:

Encoding::UndefinedConversionError ("\xC3" from ASCII-8BIT to UTF-8)

Also, when I enter the date 05/05/1995 in a cell, xsv returns 1995-05-05. Is this expected? And is there a way to override this behavior?

I'm using version 1.0.0.pre

Thanks!

martijn commented 3 years ago

Seems I forgot all about string encoding in the new parser. I will try to fix that today.
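For readers hitting the same error, here is a minimal sketch in plain Ruby of how this class of failure arises and one common way around it. It does not use Xsv and is not the fix from the commit linked in this thread. The sample string and variable names are made up for illustration, and it assumes the raw bytes really are UTF-8 (as XML inside an .xlsx normally is).

```ruby
# Bytes read from a binary stream are tagged ASCII-8BIT (binary) in Ruby.
# "\xC3\xA9" is the UTF-8 byte sequence for "é"; the sample text is hypothetical.
raw = "caf\xC3\xA9".b

# Transcoding binary data to UTF-8 fails on the first byte above 0x7F,
# because Ruby has no mapping from "binary" to Unicode.
error = nil
begin
  raw.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  error = e
end

# If the bytes are already valid UTF-8, relabel them instead of transcoding.
fixed = raw.dup.force_encoding(Encoding::UTF_8)

puts error.class
puts fixed
puts fixed.valid_encoding?
```

`force_encoding` only changes the encoding label and leaves the bytes untouched, so checking `valid_encoding?` afterwards is a cheap guard against input that is not actually UTF-8.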

As for the dates, Xsv should always return a Date object for cells with a date. The Excel formatting is lost in the translation, because of the 'Excel separated values' philosophy of Xsv.
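Since the cell comes back as a Date object, one workaround is to format it yourself on the calling side. The sketch below uses only Ruby's standard `Date#strftime`. The `cell` variable stands in for a value returned by the parser, and the day/month order is an assumption, since 05/05/1995 reads the same either way.

```ruby
require "date"

# Stand-in for a date cell as returned by the parser (hypothetical value).
cell = Date.new(1995, 5, 5)

# Date#to_s gives ISO 8601, which is the 1995-05-05 seen in the report.
iso = cell.to_s

# Re-apply the desired display format manually; "%d/%m/%Y" is an assumed
# day/month/year layout, swap to "%m/%d/%Y" for month-first locales.
display = cell.strftime("%d/%m/%Y")

puts iso
puts display
```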

martijn commented 3 years ago

I made a slightly brute-force update on the master branch. Can you test if this resolves the encoding issue for you?

https://github.com/martijn/xsv/commit/5fe78a8c10e574f631b634b89df302f180a7c120

vmsp commented 3 years ago

I haven't yet found a gem that lets me skip the casting to an appropriate type and just returns everything as written. Maybe I could tempt you to consider this functionality?

Anyway, the issue is indeed fixed. Thank you!

martijn commented 3 years ago

Thanks for your feedback!

Excel stores dates as an integer (days since epoch), so the raw information wouldn't be useful to most users. Returning the formatted date, time or number as it appears in Excel would involve parsing and applying the Excel number formats. It is possible, but I currently don't have the time for it.
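To illustrate why the raw value is rarely useful, here is a rough sketch of converting a serial day number to a Ruby Date. It assumes the default 1900 date system and is not Xsv's actual implementation. The helper name `excel_serial_to_date` is made up, and the 1899-12-30 base is the usual convention that absorbs Excel's nonexistent 29 February 1900, so it is only correct for serials from 61 (1 March 1900) onward.

```ruby
require "date"

# Hypothetical helper: convert a 1900-system Excel serial to a Date.
# Using 1899-12-30 as the base compensates for Excel counting a
# nonexistent 1900-02-29, so results are valid for serial >= 61.
def excel_serial_to_date(serial)
  Date.new(1899, 12, 30) + serial
end

# The date from this thread, stored by Excel as a plain integer.
serial = 34_824
date = excel_serial_to_date(serial)

puts serial  # what the file actually contains
puts date    # what a user expects to see
```

Reproducing the string exactly as Excel displays it would additionally require reading the cell's number format and interpreting it, which is the extra work described above.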