Closed ianlancetaylor closed 3 years ago
Thanks for letting me know, @ianlancetaylor! I'll think about how best to address this soon.
Typically this library reports this sort of thing in terms of grapheme clusters (using the Unicode text segmentation algorithm) and so arguably both the old and new behavior were a bit inconsistent with other parts of the library, but it does feel a little weird to use grapheme clusters here where we're talking about CSV input of uncertain character encoding (albeit with the strings subsequently forced interpreted as UTF-8 anyway) so my initial instinct is to treat the new byte-based behavior as the "correct answer" and then to accommodate the pre-1.17 behavior as a pseudo-bug.
I'll ponder this a little more before taking action, but will aim to get something in place before or shortly after Go 1.17 beta, which I imagine is coming sometime this month per my understanding of the Go schedule.
FYI there is a beta release of 1.17 available now for easy testing: https://groups.google.com/g/golang-announce/c/i4EliPDV9Ok.
I've pushed a fix for this in 47a79612085e043d528fab49f8687ec6bd6a77e5. Thanks!
On Go tip, the future Go 1.17, we've clarified the
ParseError.Column
field to be a 1-based byte index, to match the newReader.FieldPos
method. Previously theColumn
field was a somewhat inconsistent rune index. This has caused a test failure:In order to keep working with the future Go 1.17, this test should accept column 8 as well as column 7, or just ignore the column field.
Let me know if I can help. Thanks.