jehugaleahsa / FlatFiles

Reads and writes CSV, fixed-length and other flat file formats with a focus on schema definition, configuration and speed.
The Unlicense
357 stars, 64 forks

Is it possible to configure a fixed length window to use a variable length? #90

Open David-Jacobsen opened 2 years ago

David-Jacobsen commented 2 years ago

I know the question sounds stupid, but I'm facing a challenge where some record formats are of a variable length.

Essentially, I have a file with multiple record formats. Most are fixed length with a space between each field, but there is also a comments record type. A comments record is identified by a leading COM, and the comment is the remainder of the line. For example:

COM This is the comment
COM This is also a comment

Those lines are not space padded up to a defined length.

I've used the FixedLengthTypeMapperSelector to be able to map the various fixed length fields to their own classes, but I'm struggling to figure out a configuration to accommodate the comment record types. In reading the code, it looks like this might be possible as is with the Trailing window, but I'm not sure how to configure it.

jehugaleahsa commented 2 years ago

Take a look at this section in the README: https://github.com/jehugaleahsa/FlatFiles#skipping-records Basically, you add an event handler to the reader that lets you look at and skip a row before it gets processed. Let me know if you run into any issues.

David-Jacobsen commented 2 years ago

Thank you, that's what I was doing initially but I was informed I need to load the comment records as well. It's a bit of a mess and I don't have control over the source format, nor the destination format.

Sample file:

COM This class has had multiple drop outs
COM This class was impacted by poor accessibility to computer lab
STU 897654 098 095
STU 876532 070 074
END

I have to load each record as the following entity:

class ClassData
{
    public string RecordType { get; set; }
    public string StudentId { get; set; }
    public string MidTerm { get; set; }
    public string Final { get; set; }
    public string Comment { get; set; }
}

So, I have a List<ClassData>, and I'm using FlatFiles to parse each record of each file and add it to the list.

var STUMapper = FixedLengthTypeMapper.Define<ClassData>();
STUMapper.Property(s => s.RecordType, new Window(4));
STUMapper.Property(s => s.StudentId, new Window(7)); // 6 digits plus the separating space, so later windows line up with the sample
STUMapper.Property(s => s.MidTerm, new Window(4));
STUMapper.Property(s => s.Final, new Window(3));

var ENDMapper = FixedLengthTypeMapper.Define<ClassData>();
ENDMapper.Property(e => e.RecordType, new Window(3));

var COMMapper = FixedLengthTypeMapper.Define<ClassData>();
COMMapper.Property(c => c.RecordType, new Window(4));
COMMapper.Property(c => c.Comment, new Window(???));

var selector = new FixedLengthTypeMapperSelector();
selector.When(x => x.StartsWith("STU")).Use(STUMapper);
selector.When(x => x.StartsWith("COM")).Use(COMMapper);
selector.When(x => x.StartsWith("END")).Use(ENDMapper);

var records = new List<ClassData>();
using var reader = new StreamReader(file);
var fixedReader = selector.GetReader(reader);
while (await fixedReader.ReadAsync().ConfigureAwait(false))
{
 records.Add((ClassData)fixedReader.Current);
}

If I set the window for the comment arbitrarily small, it works but obviously truncates the comments. If I skip the comment records, it also works, but then the comments are never loaded. If I treat it as a delimited file, the space separator would break up every record... unless a DelimitedTypeMapper has the ability to 'combine' all trailing fields into a single field?

Edit: I actually do have a very janky workaround, but it will likely come at the cost of performance. Essentially, I scan the file twice: once using the FixedLengthTypeMapperSelector, which parses all of the fixed-length records, and then again as a DelimitedTypeMapper with "COM " as the delimiter, hardcoding the RecordType to "COM"...

var comOptions = new DelimitedOptions();
comOptions.Separator = "COM ";
var comMapper = DelimitedTypeMapper.Define<ClassData>();

comMapper.Property(p => p.RecordType).OnParsed((o,t) => o.RecordContext.Values[0] = "COM");
comMapper.Property(p => p.Comment);

The FixedLengthTypeMapperSelector will skip all records starting with "COM", and the DelimitedTypeMapper will skip all records with Values.Length < 2.

Again, it's really janky but it works unless you have a better idea.

David-Jacobsen commented 2 years ago

Not sure what's preferred, editing the original or adding another comment, but I believe I have found a solution with Window.Trailing.

var COMMapper = FixedLengthTypeMapper.Define<ClassData>();
COMMapper.Property(c => c.RecordType, new Window(4));
COMMapper.Property(c => c.Comment, Window.Trailing);

Initial tests have this working. Thanks for such an excellent and configurable product!
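
For readers who have not met the idea before, the behaviour described in this thread (fixed-width fields up front, then "whatever is left on the line") can be sketched in plain C# without FlatFiles. This is only an illustration of the concept, not how the library implements Window.Trailing. The `Slice`, `Trailing` and `Parse` helpers are hypothetical names. The STU offsets assume each field in the sample file is followed by a single separator space.

```csharp
using System;

public class ClassData
{
    public string RecordType { get; set; }
    public string StudentId { get; set; }
    public string MidTerm { get; set; }
    public string Final { get; set; }
    public string Comment { get; set; }
}

public static class TrailingWindowSketch
{
    // Hypothetical helper: take up to `width` characters starting at `start`,
    // tolerating lines that end early, and trim the padding spaces.
    private static string Slice(string line, int start, int width)
    {
        if (start >= line.Length) return string.Empty;
        int length = Math.Min(width, line.Length - start);
        return line.Substring(start, length).Trim();
    }

    // Hypothetical helper: everything from `start` to the end of the line,
    // which is the "trailing window" idea in its simplest form.
    private static string Trailing(string line, int start)
    {
        return start >= line.Length ? string.Empty : line.Substring(start).Trim();
    }

    public static ClassData Parse(string line)
    {
        // The first four characters hold the record type plus its separator space.
        var record = new ClassData { RecordType = Slice(line, 0, 4) };
        if (record.RecordType == "COM")
        {
            // Variable length: the comment is the remainder of the line.
            record.Comment = Trailing(line, 4);
        }
        else if (record.RecordType == "STU")
        {
            // Assumed layout: each field is followed by one separator space.
            record.StudentId = Slice(line, 4, 7);
            record.MidTerm = Slice(line, 11, 4);
            record.Final = Slice(line, 15, 3);
        }
        return record;
    }

    public static void Main()
    {
        var lines = new[]
        {
            "COM This class has had multiple drop outs",
            "STU 897654 098 095",
            "END"
        };
        foreach (var line in lines)
        {
            var r = Parse(line);
            Console.WriteLine($"{r.RecordType}|{r.StudentId}|{r.MidTerm}|{r.Final}|{r.Comment}");
        }
    }
}
```

The point of the sketch is that only the last field can be variable length: every field before it still needs a known width so that the start of the remainder is well defined.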