org-tigris-jsapar / jsapar

JSaPar is a Java library providing a schema based parser and composer of almost all sorts of delimited (CSV) and fixed width files.
Apache License 2.0

Nested Lines as Master/Details #3

Open Woodham opened 6 years ago

Woodham commented 6 years ago

I have a use case where I need to parse and produce fixed-width files that contain nested/child records. Is jsapar able to handle this case? I couldn't find an example that seemed to match.

e.g.:

Header Line
- Line A
    - Child Line A1
    - Child Line A2
- Line B
    - Child Line B1
    - Child Line B2
    - Child Line B2
...
Footer Line

Here Line A, Line B, and so on all share one cell layout, and the child lines all share another.

stenix71 commented 6 years ago

Hi! The jsapar library cannot handle this directly out of the box at the moment. You need to cache the latest master object so that you can add each child object to it once the child object is completely built. It is a bit like working with a SAX parser: you need to keep track of the previous master.

I would say that you have two options:

  1. Use Text2BeanConverter and handle each bean event. Implement the BeanEventListener so that it keeps track of the last master object and adds each child object to it. Treat the master object as complete when either a new master object starts or you reach the end of the input.
  2. Use the TextParser with your own implementation of the LineEventListener, and cache the latest master line object so that you can add each child line object to it as the child lines are parsed. This is a bit more complicated since you need to fetch each cell value explicitly. The LineUtils class contains a lot of helper methods that remove some of the complexity.
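
Both options share the same bookkeeping: remember the latest master, attach children to it, and treat it as complete when the next master arrives or the input ends. Below is a minimal, library-independent sketch of that bookkeeping. `Master` and `MasterDetailGrouper` are hypothetical names, not jsapar types, and the sketch assumes your own BeanEventListener or LineEventListener implementation decides which callback to invoke for each parsed line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical master record holding its detail lines (not a jsapar type). */
final class Master {
    final String name;
    final List<String> children = new ArrayList<>();

    Master(String name) {
        this.name = name;
    }
}

/**
 * Groups a flat stream of parsed lines into master/detail records.
 * Call onMaster/onChild from your own listener and finish() at end of input.
 */
final class MasterDetailGrouper {
    private final Consumer<Master> onComplete;
    private Master current; // latest master seen; null before the first one

    MasterDetailGrouper(Consumer<Master> onComplete) {
        this.onComplete = onComplete;
    }

    void onMaster(String name) {
        flush(); // a new master completes the previous one
        current = new Master(name);
    }

    void onChild(String child) {
        if (current == null) {
            throw new IllegalStateException("Child line before any master line: " + child);
        }
        current.children.add(child);
    }

    void finish() {
        flush(); // end of input completes the last master
    }

    private void flush() {
        if (current != null) {
            onComplete.accept(current);
            current = null;
        }
    }
}
```

In a file with a footer line, the footer would trigger `finish()` as well, so that the last master is emitted before the footer is handled.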

Hope this helps.

anuragdeshpande commented 5 years ago

I have a similar parsing requirement. Are there any recommendations for the case where the master line cell structure differs from the detail line cell structure?

stenix71 commented 5 years ago

Could you provide some example or a more detailed description of your specific problem?

anuragdeshpande commented 5 years ago

My use case: a fixed-width file with master and detail lines, each with a different cell structure (underscores mark the end of a cell). A file can contain repeated instances of the line structure shown below.

Template:
1_0000012_abcdefg_12345abcdefg
2_11111112233_adfajsdfjlkadshf_112233445566
3_asdfasdfasdfasdf_34234234123_asdfasdfadsf

File Structure:
1_0000012_abcdefg_12345abcdefg
2_11111112233_adfajsdfjlkadshf_112233445566
3_asdfasdfasdfasdf_34234234123_asdfasdfadsf
1_0000012_abcdefg_12345abcdefg
2_11111112233_adfajsdfjlkadshf_112233445566
3_asdfasdfasdfasdf_34234234123_asdfasdfadsf
1_0000012_abcdefg_12345abcdefg
2_11111112233_adfajsdfjlkadshf_112233445566
3_asdfasdfasdfasdf_34234234123_asdfasdfadsf

Hope that helps

stenix71 commented 5 years ago

Is it correct that the first position contains the value "1", "2" or "3" depending on the type of row? In that case nothing actually distinguishes your scenario from the original question. You only have to provide a line condition for each line type within the schema. See the chapter about Line condition in the article Basics of Schema.
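
To illustrate the idea behind a line condition, here is a minimal, library-independent sketch that picks a cell layout from the value in the first position and then splits the line by fixed widths. It is not jsapar schema syntax; in jsapar the schema does this selection for you. `LineTypeDispatch` and `splitCells` are hypothetical names, and the cell widths are assumptions read off the sample lines in the question above (with the underscores removed).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LineTypeDispatch {
    // Cell widths per line type, keyed by the character in the first position.
    // These widths are assumptions taken from the sample lines in this thread.
    static final Map<Character, int[]> WIDTHS = new HashMap<>();
    static {
        WIDTHS.put('1', new int[] {1, 7, 7, 12});
        WIDTHS.put('2', new int[] {1, 11, 16, 12});
        WIDTHS.put('3', new int[] {1, 16, 11, 12});
    }

    /** Splits one fixed-width line into cells using the layout of its line type. */
    static List<String> splitCells(String line) {
        if (line.isEmpty()) {
            throw new IllegalArgumentException("Empty line");
        }
        int[] widths = WIDTHS.get(line.charAt(0));
        if (widths == null) {
            throw new IllegalArgumentException("Unknown line type: " + line.charAt(0));
        }
        List<String> cells = new ArrayList<>();
        int pos = 0;
        for (int width : widths) {
            // Clamp to the line length so that short lines yield empty trailing cells.
            int start = Math.min(pos, line.length());
            int end = Math.min(pos + width, line.length());
            cells.add(line.substring(start, end));
            pos += width;
        }
        return cells;
    }
}
```

The first cell of each result holds the line type, which a listener can then use to decide whether the line is a master or a detail line.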