jehugaleahsa / FlatFiles

Reads and writes CSV, fixed-length and other flat file formats with a focus on schema definition, configuration and speed.
The Unlicense
357 stars 64 forks source link

New option to pass the desired Encoding to SeparatedValueParser and FixedLengthParser #1

Closed Valem closed 10 years ago

Valem commented 10 years ago

Hi

I'm using your great library and I discovered some issue.

I did the following in this branch:

Tests were failing if culture was different than yours => All tests are now culture independent

The StreamReader in FixedLengthParser and SeparatedValueParser used default behavior of guessing how the incoming text is encoded, but the guess almost everytime is UTF8, this doesn't work out very well here in Europe as for example we get flat files encoded with Code Page 1252. => I added to both Methods and their respective overloads the optional parameter Encoding, so that you can choose with which Encoding you want to read the text. I added also some tests

May I ask you to take a look into this and if you agree merge it to the master ?

Many Thanks!

jehugaleahsa commented 10 years ago

To be honest, this is absolutely awesome! I've never had anyone contribute so much to one of my projects. It's not so much the code change but the unit tests.

Personally, I would like the encoding to be part of the ParserOptions classes. It would default to null, and then in the code you could just say new StreamReader(stream, options.Encoding ?? Encoding.Default). I am concerned that if I support automatic detection (another one of StreamReader's overloads) I will have to add yet another parameter to the constructors.

Let me know what you think.

Valem commented 10 years ago

That's the great thing about open source and github makes it really easy to collaborate, beside that, I think personally that unit tests are very important :)

Yeah, at first I was also thinking to put the Encoding into the ParserOptions, but I had the impression that it kind of didn't fit in the options. But I still like it to be in the options (even more now), because it looks cleaner and if someone is looking for setting the Encoding, he would look into the options after not finding it in the overloaded methods, so it's intuitive. Yes and the automatic detection is just another argument to put it into the options :)

Appreciated very much your fast answer, thanks!

Let me know how to continue on this (I'm also available to refactor this), it would be great to have an official update on nuget.org

jehugaleahsa commented 10 years ago

I have merged these changes in and redeployed the NuGet package. I made the changes as we discussed. Thanks so much for your participation!

I did notice, however, that commenting out the Encoding had no affect on the results of the unit tests. I'd recommend taking a look at them to see why. Perhaps the default encoding in the US is different than in Europe.