jehugaleahsa / FlatFiles

Reads and writes CSV, fixed-length and other flat file formats with a focus on schema definition, configuration and speed.
The Unlicense

Allow DelimitedTypes to skip multiple columns #85

Closed benjaminsampica closed 2 years ago

benjaminsampica commented 2 years ago

Is your feature request related to a problem? Please describe.

My team has deep integrations with Oracle. Our primary interface with Oracle is through flat files (which is a whole different yuck), but if you've ever worked with Oracle, you know they have hundreds of fields, of which you need maybe 10. The CSVs we generate are really simple and low-complexity, but there are just so many columns we need to ignore. We chose to move to this library over CsvHelper for many reasons, one of which was the simpler semantics for ignoring columns in both delimited and fixed-length files. It's great that we can ignore windows in fixed-length files, and we would love similar functionality in the delimited API surface.

Describe the solution you'd like

An Ignored(int columnCount) method, or a feature similar to the one for fixed-length files.

Describe alternatives you've considered

We're currently just using a loop to add an Ignore() property a zillion times where needed.

Other

If you'd like to punt that back onto us with suggestions, we'd be happy to submit a PR.

jehugaleahsa commented 2 years ago

I've been thinking about this for a while, so I figured I'd give you some feedback so you don't think I'm ignoring you.

Funnily enough, I've been thinking less about how to indicate ignored ranges and more about how to avoid iterating over IgnoredColumn columns that do nothing. There's a cost to looping over hundreds of columns unnecessarily on every record when a file contains hundreds of lines (or worse).
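To make the cost concern concrete, here is a minimal Python sketch of one way to avoid it: a run of ignored columns is stored as a single schema entry, so the per-record loop advances its field index by N in one step instead of visiting N no-op entries. `Column`, `IgnoredRange` and `parse_record` are hypothetical names for illustration, not FlatFiles APIs. The sketch assumes each record has already been split into fields; a real reader still has to tokenize every field to handle quoting, so the saving is in per-column dispatch, not in parsing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


# Hypothetical schema entries for illustration; these are NOT FlatFiles types.
@dataclass
class Column:
    name: str
    parse: Callable[[str], object] = str  # converts the raw field text


@dataclass
class IgnoredRange:
    count: int  # number of consecutive fields to skip


Entry = Union[Column, IgnoredRange]


def parse_record(fields: List[str], schema: List[Entry]) -> Dict[str, object]:
    """Map one already-split record onto the named columns in `schema`."""
    # Expected width is computed from schema entries, not individual columns.
    width = sum(e.count if isinstance(e, IgnoredRange) else 1 for e in schema)
    if len(fields) != width:
        raise ValueError(f"expected {width} fields, got {len(fields)}")

    result: Dict[str, object] = {}
    index = 0
    for entry in schema:
        if isinstance(entry, IgnoredRange):
            # Skip the whole run with one addition instead of
            # visiting `count` entries that do nothing.
            index += entry.count
        else:
            result[entry.name] = entry.parse(fields[index])
            index += 1
    return result


schema = [Column("id", int), IgnoredRange(3), Column("name")]
print(parse_record("7,a,b,c,Alice".split(","), schema))  # {'id': 7, 'name': 'Alice'}
```

With hundreds of ignored Oracle columns collapsed into a few `IgnoredRange` entries, the loop length tracks the number of schema entries rather than the number of columns in the file.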

benjaminsampica commented 2 years ago

No worries! Thanks for the feedback. I’m by no means a CSV/parser expert, so I apologize in advance if this is a hot take.

So, at least for our use case, we aren’t reading Oracle files, so we avoid the complicated/convoluted type mapping that is often associated with reading external CSVs that are garbage. We wanted to keep the formatting of our data separate from the code that creates the files, so that switching CSV libraries would be an easier lift. Thus, each column is formatted before we write the CSV, and we aren’t using things like “.OutputFormat()”.

At the end of the day, if Ignored(int column) internally just writes the corresponding number of commas (or the specified delimiter) to optimize for high count values, that’s fine with me. Maybe it already does this and I’m talking out of my ass and completely misunderstood where your mind was at.
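The "just write the commas" idea can be sketched in Python as follows, assuming a hypothetical layout in which a string names a value from the record and an integer N stands for N ignored columns. `write_record` and this layout format are illustrative only, not FlatFiles APIs. An ignored run becomes N empty fields, emitted as one repeated delimiter string rather than N separate column writes. Quoting and escaping are deliberately left out.

```python
import io
from typing import Mapping, Sequence, Union

# Hypothetical layout format for illustration (NOT a FlatFiles API):
# a str names a value taken from the record, an int N means
# "N ignored columns here".
Layout = Sequence[Union[str, int]]


def write_record(record: Mapping[str, object], layout: Layout, delimiter: str = ",") -> str:
    out = io.StringIO()
    position = 0  # fields written so far in this record
    for entry in layout:
        if isinstance(entry, int):
            if entry < 0:
                raise ValueError("ignored column count must be non-negative")
            if entry == 0:
                continue
            # Each ignored field is empty text preceded by a delimiter,
            # except when it is the first field of the record.
            out.write(delimiter * (entry if position > 0 else entry - 1))
            position += entry
        else:
            if position > 0:
                out.write(delimiter)
            # No quoting/escaping here; a real writer must quote values
            # that contain the delimiter, quotes, or newlines.
            out.write(str(record[entry]))
            position += 1
    return out.getvalue()


print(write_record({"id": 1, "name": "Alice"}, ["id", 3, "name"]))  # 1,,,,Alice
```

The record above has five fields and therefore four delimiters: the ignored run writes three commas, and the following value supplies the fourth.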

jehugaleahsa commented 2 years ago

I'm closing this for now. I might comment some time in the future if I decide to circle back on this.