misternichols closed this issue 6 years ago.
Opposite of `Ignored`, indeed. I do see the value in the ability to produce additional, computed values in the result set. Originally I was thinking you could create your own derived `ColumnDefinition<T>` class, but that serves to parse/format custom types; in other words, it's associated with a column in your file.
Are you using type mappers or just readers? With classes, you could always implement unmapped properties to perform calculations.
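For example, with the type mapper approach, a computed value can live on the class as a property that is never mapped to a column. A minimal sketch (the `Order` class and its `DisplayLabel` property are made up for illustration, not part of the library):

```csharp
using System;
using System.IO;
using FlatFiles.TypeMapping;

public class Order
{
    public string Number { get; set; }
    public decimal Amount { get; set; }

    // Unmapped: never bound to a column, computed from the mapped values.
    public string DisplayLabel => $"{Number} ({Amount:C})";
}

public static class Example
{
    public static void Main()
    {
        var mapper = SeparatedValueTypeMapper.Define<Order>();
        mapper.Property(o => o.Number).ColumnName("number");
        mapper.Property(o => o.Amount).ColumnName("amount");

        using (var reader = new StringReader("A-1,19.99"))
        {
            foreach (var order in mapper.Read(reader))
            {
                Console.WriteLine(order.DisplayLabel); // prints the computed label
            }
        }
    }
}
```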
At the reader level, I'd ideally parse all the columns in a row and then provide you the opportunity to generate additional column values based on what was parsed. Such columns would only be useful when reading files, completely meaningless when writing files. Such computed values could be passed along to the type mapper layer and treated like any other column, oblivious to their source.
The index passed to such a column could be either the physical row number in the file or the row number ignoring skipped rows. Something else to think about...
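Purely to sketch the shape such a hook might take (nothing like this exists in FlatFiles today; the delegate and helper below are invented):

```csharp
// Hypothetical sketch only -- no such hook exists in FlatFiles today.
// After a row is parsed, a callback produces extra values that are
// appended to the record before it flows on to the type mapper layer.
public delegate object[] ComputeExtraValues(object[] parsedValues, int recordNumber);

public static class RowExtension
{
    public static object[] Append(object[] parsedValues, int recordNumber, ComputeExtraValues compute)
    {
        object[] extras = compute(parsedValues, recordNumber);
        var combined = new object[parsedValues.Length + extras.Length];
        parsedValues.CopyTo(combined, 0);             // original parsed values first
        extras.CopyTo(combined, parsedValues.Length); // computed values appended
        return combined;
    }
}
```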
My scenario requires the schema to be completely configurable, so classes are not the right tool for the job: I don't know the file format or output requirements at compile time. My consumer objects also expect an `IDataReader` as input. These are the reasons why I headed down the readers route.
Agreed on the writing side; the column values are derived, after all.
I did look into implementing this. There are a few roadblocks, and I thought it best to reach out for insight. I'm also cautious of over-complicating such an intuitive library with features that don't fit. One of the main roadblocks is here: https://github.com/jehugaleahsa/FlatFiles/blob/071743d3bfc2b28777fc47c6d962460225b9f5b1/src/FlatFiles/SeparatedValueReader.cs#L161-L164 The readers would need to be modified to be aware that not all columns are sourced from the text file. As far as I can tell, there is also currently a restriction that the output column ordinals exactly match the input column ordinals, which is perfectly fine without these new requirements.
I think the separation of concerns between input column schemas and output column schemas is the root of the problem.
The resulting FlatFiles `IDataReader` would still need to expose the output column schema, which contains all the columns, including the custom columns, in the correct order, etc.
The more I think about it, implementing an `IDataReader` decorator might actually make sense. The decorator would need to handle mapping the input columns from the decorated FlatFiles `IDataReader` to the appropriate output columns, as well as calling the evaluation function on each of the custom computed columns on demand.
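Something along these lines, perhaps. This is a minimal sketch assuming nothing about FlatFiles internals; `ComputedColumn` and `ComputedColumnsDataReader` are invented names, the typed getters simply cast the boxed value, and the remaining plumbing delegates to the inner reader:

```csharp
using System;
using System.Data;

// Hypothetical sketch of the decorator idea: appends computed columns
// to any inner IDataReader. None of these types exist in FlatFiles.
public sealed class ComputedColumn
{
    public ComputedColumn(string name, Func<IDataRecord, int, object> evaluate)
    {
        Name = name;
        Evaluate = evaluate;
    }

    public string Name { get; }
    public Func<IDataRecord, int, object> Evaluate { get; }
}

public sealed class ComputedColumnsDataReader : IDataReader
{
    private readonly IDataReader inner;
    private readonly ComputedColumn[] computed;
    private int recordNumber = -1;

    public ComputedColumnsDataReader(IDataReader inner, params ComputedColumn[] computed)
    {
        this.inner = inner;
        this.computed = computed;
    }

    public int FieldCount => inner.FieldCount + computed.Length;

    public bool Read()
    {
        if (!inner.Read()) return false;
        ++recordNumber; // the "current record count" passed to evaluators
        return true;
    }

    // Ordinals beyond the inner reader's columns map to computed columns,
    // which are evaluated on demand.
    public object GetValue(int i) => i < inner.FieldCount
        ? inner.GetValue(i)
        : computed[i - inner.FieldCount].Evaluate(inner, recordNumber);

    public string GetName(int i) => i < inner.FieldCount
        ? inner.GetName(i)
        : computed[i - inner.FieldCount].Name;

    public int GetOrdinal(string name)
    {
        for (int i = 0; i < computed.Length; ++i)
            if (computed[i].Name == name) return inner.FieldCount + i;
        return inner.GetOrdinal(name);
    }

    public Type GetFieldType(int i) =>
        i < inner.FieldCount ? inner.GetFieldType(i) : typeof(object);
    public string GetDataTypeName(int i) => GetFieldType(i).Name;

    public int GetValues(object[] values)
    {
        int count = Math.Min(values.Length, FieldCount);
        for (int i = 0; i < count; ++i) values[i] = GetValue(i);
        return count;
    }

    public bool IsDBNull(int i) => GetValue(i) == null || GetValue(i) is DBNull;
    public object this[int i] => GetValue(i);
    public object this[string name] => GetValue(GetOrdinal(name));

    // Typed getters just cast the boxed value.
    public bool GetBoolean(int i) => (bool)GetValue(i);
    public byte GetByte(int i) => (byte)GetValue(i);
    public char GetChar(int i) => (char)GetValue(i);
    public DateTime GetDateTime(int i) => (DateTime)GetValue(i);
    public decimal GetDecimal(int i) => (decimal)GetValue(i);
    public double GetDouble(int i) => (double)GetValue(i);
    public float GetFloat(int i) => (float)GetValue(i);
    public Guid GetGuid(int i) => (Guid)GetValue(i);
    public short GetInt16(int i) => (short)GetValue(i);
    public int GetInt32(int i) => (int)GetValue(i);
    public long GetInt64(int i) => (long)GetValue(i);
    public string GetString(int i) => (string)GetValue(i);

    // Binary data, nested readers, and reader plumbing delegate to the inner reader.
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) =>
        inner.GetBytes(i, fieldOffset, buffer, bufferOffset, length);
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) =>
        inner.GetChars(i, fieldOffset, buffer, bufferOffset, length);
    public IDataReader GetData(int i) => inner.GetData(i);
    public DataTable GetSchemaTable() => inner.GetSchemaTable();
    public void Close() => inner.Close();
    public bool NextResult() => inner.NextResult();
    public int Depth => inner.Depth;
    public bool IsClosed => inner.IsClosed;
    public int RecordsAffected => inner.RecordsAffected;
    public void Dispose() => inner.Dispose();
}
```

One wrinkle the sketch glosses over: `GetSchemaTable` would also need extra rows describing the computed columns, so the output schema it exposes stays complete.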
I am going to consolidate this issue with #29. I should be providing a solution to map metadata in and out shortly.
Sorry for the long delay, btw. I think I just needed time to digest what all needed to be done.
Firstly, what a great library, well done!
Context
I'm presently using FlatFiles on the reading side of an ETL solution and consuming the data via `IDataReader`. I'm manually defining the columns in the `Schema`.

There are times when I need to add additional contextual information as extra columns in the output rows for later processing (examples: program job number, parent folder name of the text file, record number). This information is not available within the text file data itself but is known to FlatFiles or to the program orchestrating the processing. Since I'm using `IDataReader`, there is no intermediary object or class on which I can define additional context or properties.

Is there a clean way to add custom static or calculated columns and include them in the schema? As far as I can tell, there isn't.
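For reference, the setup looks roughly like this (the file and column names are placeholders, and the name of the `IDataReader` adapter, `FlatFileDataReader` here, may differ by FlatFiles version):

```csharp
using System.IO;
using FlatFiles;

class Program
{
    static void Main()
    {
        // Columns are defined by hand -- nothing is known at compile time.
        var schema = new SeparatedValueSchema();
        schema.AddColumn(new StringColumn("customer"));
        schema.AddColumn(new DecimalColumn("amount"));

        using (var textReader = new StreamReader("orders.csv"))
        {
            var reader = new SeparatedValueReader(textReader, schema);
            // Adapt the FlatFiles reader to IDataReader for downstream consumers.
            using (var dataReader = new FlatFileDataReader(reader))
            {
                while (dataReader.Read())
                {
                    // hand each record to the ETL pipeline...
                }
            }
        }
    }
}
```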
Possible Implementation
In my scenario it would be great to be able to define and add extra custom columns to the `Schema` that take an evaluation function (a bit like `IColumnDefinition.Preprocessor`). The evaluation function would allow the program to calculate and/or inject contextual data as a column in the output rows during text file reading. In a way, this is the opposite of `IgnoredColumn`.

My scenarios don't require this, but for additional utility the custom column values could be evaluated after all other normal column values are parsed, with the results being passed through to the evaluation functions. A possible evaluation function definition: `Func<object[], int, object>`, where the object array is the parsed data (with nulls for custom columns) and the int is the current record count. The return value would be an object that doesn't require further parsing and should be of type `IColumnDefinition.ColumnType`. If this is too complicated, `Func<string, int, object>` would suffice, where the string is the raw record text.

Would this best be a new `IColumnDefinition` implementation similar to `IgnoredColumn`, or something else? How should the writing side of things be handled? Most likely the best thing to do would be to exclude such columns from the output completely. A sketch of the usage I have in mind follows.
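None of this exists today; `CalculatedColumn` is an invented type wrapping the evaluation function described above:

```csharp
// Imagined API -- nothing like this exists in FlatFiles today.
var schema = new SeparatedValueSchema();
schema.AddColumn(new StringColumn("customer"));
schema.AddColumn(new DecimalColumn("amount"));

// Hypothetical CalculatedColumn: evaluated after the normal columns are
// parsed; "values" is the parsed row (nulls at calculated slots) and
// "recordNumber" is the current record count.
string currentFolderName = "2018-06"; // contextual value known to the program
schema.AddColumn(new CalculatedColumn("record_number",
    (values, recordNumber) => recordNumber));
schema.AddColumn(new CalculatedColumn("source_folder",
    (values, recordNumber) => currentFolderName));
```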
Possible Workaround
In theory, I could decorate the FlatFiles `IDataReader` with another `IDataReader` that injects the extra custom columns as necessary. This quickly gets messy, though. Any cleaner ideas?

I hope I've made sense. In your eyes, is this something that would add value to the library?
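P.S. For concreteness, the workaround wiring would look something like this, reusing the hypothetical `ComputedColumnsDataReader` decorator sketched above (`flatFileDataReader` stands for the FlatFiles reader being wrapped):

```csharp
// Hypothetical wiring only, built on the ComputedColumnsDataReader sketch.
string jobNumber = "JOB-42"; // contextual value known to the orchestrator
IDataReader decorated = new ComputedColumnsDataReader(
    flatFileDataReader, // the FlatFiles IDataReader being decorated
    new ComputedColumn("job_number", (record, n) => jobNumber),
    new ComputedColumn("record_number", (record, n) => n));
```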