swiftcsv / SwiftCSV

CSV parser for Swift
MIT License
947 stars 190 forks source link

Adds a test to show error described in #126 (also fixes #126) #128

Closed Diggory closed 1 year ago

Diggory commented 1 year ago

Adds a test to check if creating a CSV instance directly from a string that starts with a BOM correctly creates the headers (without the BOM in the first header.). This is referencing issue #126

Diggory commented 1 year ago

Now also fixes #126

Diggory commented 1 year ago

@DivineDominion Hi, any chance of a quick review? Thanks.

DivineDominion commented 1 year ago

Ah, I see -- the now 'designated' string-based initializer does all the heavy lifting instead of the many URL based ones.

That sounds sensible to me; I'll approve and merge into master. Thank you!

In the meantime:

@lardieri Do you recall any good reason we did not go ahead with this, or did we just not notice when you came up with the first BOM fix?

lardieri commented 1 year ago

It's not that we didn't notice. It's that we decided it was irrelevant. BOMs do not apply to Strings, they apply to encodings of Strings.

A String in Swift is a sequence of abstract Characters that does not have an inherent encoding. (The internal storage of a String is none of our business.) Therefore, a String doesn't have and doesn't need to have any indicator of byte order.

The BOM is a hint to the decoder that turns a sequence of bytes into a String. It is not part of the String itself.

If someone is passing you a String that has something that looks like a byte order mark in the first Character, I would argue that it is a malformed String and the decoder that created it is buggy.

Unfortunately, as I mentioned in this comment, Apple's UTF-8 decoder is buggy.

Given all of the above, we decided we would deal with BOMs at the point where we decode a sequence of bytes into a String, namely, in the convenience initializers.

It's not that I strongly object to moving the BOM handler into the designated initializer. I just feel that it is papering over a bug elsewhere, namely in the code that reads the byte contents of the Excel CSV file and turns it into a String.

DivineDominion commented 12 months ago

Thank you for the reminder and summary!

I just feel that it is papering over a bug elsewhere, namely in the code that reads the byte contents of the Excel CSV file and turns it into a String.

It's true that this is none of SwiftCVS's responsibility and we could do without this completely.

But 'Postel's Law' for robust software ("Be conservative in what you do, be liberal in what you accept from others") isn't a bad idea. Apple bugs usually don't go away quickly, so we can make the journey simpler for users of the library. That should be good for developer happiness.

If nobody is objecting:

@Diggory would you be willing to add a comment to the BOM that summarizes why we're doing that (b/c Apple's code is buggy) and point to https://github.com/swiftcsv/SwiftCSV/issues/97#issuecomment-1181599651?

Diggory commented 12 months ago

Will do.