jf-tech / omniparser

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
MIT License

[EDI] Schema Output Configuration | Unable to access ancestor segments via xpath #217

Closed manuel-neuhauser-hs closed 1 year ago

manuel-neuhauser-hs commented 1 year ago

For an EDI file with a nested segment set to is_target: true, I'm unable to access higher level segments beyond direct ancestors.

For example, using xpath to navigate up the hierarchy to the functional_groups segment and then down to the GE segment yields no result.

xpath: ../../GE/GE01
       ^  ^
       GS |
          functional_groups

Simplified Segment Declaration

ISA ✅
  functional_groups (type: segment_group)
    GS ✅
      transactions (type: segment_group, is_target: true)
        ST ✅
        [...]
        SE ✅
    GE ❌
IEA ❌

Example FINAL_OUTPUT Configuration

    "FINAL_OUTPUT": {
      "object": {
        "st_header": {
          "xpath": "ST/ST01"
        },
        "se_footer": {
          "xpath": "SE/SE01"
        },
        "gs_header": {
          "xpath": "../../GS/GS01"
        },
        "ge_footer": {
          "xpath": "../../GE/GE01"
        },
        "isa_header": {
          "xpath": "../../../../ISA/ISA01"
        },
        "iea_footer": {
          "xpath": "../../../../IEA/IEA01"
        }
      }
    }

Example Output

Note that ge_footer and iea_footer are not present.

[
  {
    "gs_header": "BE",
    "isa_header": "00",
    "se_footer": "159",
    "st_header": "834"
  },
  {
    "gs_header": "BE",
    "isa_header": "00",
    "se_footer": "159",
    "st_header": "834"
  },
  {
    "gs_header": "BE",
    "isa_header": "00",
    "se_footer": "159",
    "st_header": "834"
  },
  {
    "gs_header": "BE",
    "isa_header": "00",
    "se_footer": "159",
    "st_header": "834"
  }
]

Command

op transform --schema 834-schema.json.txt --input edi834.txt

Files

edi834.txt 834-schema.json.txt

Is there an issue with my schema file, or is accessing these values not supported? Thanks in advance.

jf-tech commented 1 year ago

@manuel-neuhauser-hs good question.

Yes, this isn't possible because omniparser is designed as a streaming parser. As it reads the input and builds the hierarchical tree (in IDR), it returns to the caller's transform.Read() call as soon as it finishes ingesting and transforming a segment marked is_target: true. In your case, the later segments GE and IEA haven't even been read yet. There is no magic about it. Currently we don't have any way to read out of sequence, or anything like two-pass reading.
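To make the cursor behavior concrete, here is a minimal toy sketch in Go. It is not omniparser's code: toyStream, its fields, and read are hypothetical names invented for illustration, and the EDI file is reduced to a flat list of segment names. It only shows that when a target is handed back, everything after the cursor is still unread.

```go
package main

import "fmt"

// toyStream is a hypothetical, flat stand-in for a streaming reader.
// It is NOT omniparser's implementation.
type toyStream struct {
	segments []string        // segment names in file order
	targets  map[string]bool // names treated as is_target
	cursor   int             // index of the next unread segment
}

// read consumes segments until one target has been consumed. It returns
// that target and the segments that are still unread at that moment.
func (s *toyStream) read() (target string, unread []string, ok bool) {
	for s.cursor < len(s.segments) {
		name := s.segments[s.cursor]
		s.cursor++
		if s.targets[name] {
			return name, s.segments[s.cursor:], true
		}
	}
	return "", nil, false
}

func main() {
	s := &toyStream{
		segments: []string{"ISA", "GS", "transactions", "transactions", "GE", "IEA"},
		targets:  map[string]bool{"transactions": true},
	}
	for {
		target, unread, ok := s.read()
		if !ok {
			break
		}
		// GE and IEA are always in the unread list when a target is emitted.
		fmt.Printf("emit %s; still unread: %v\n", target, unread)
	}
}
```

In this sketch GE and IEA appear in the unread list every time a target is emitted, which is why an xpath evaluated at that moment cannot reach them.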

One potential solution, depending on your input EDI file size, is to:

Let me know if you have additional questions.

(P.S. In a bit more depth: omniparser "streams" through the EDI file, builds the IDR tree, and performs the transform on each is_target: true segment/segment_group. Once it has returned the result to the caller, the next transform.Read() call removes the current is_target segment from the IDR tree. So it has all the parent segments of the is_target segment in memory/IDR, but never more than one is_target: true segment at a time, and, as explained above, never any segments beyond the current reading cursor.)
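The tree behavior described in the P.S. can be sketched with a toy model in Go. Everything here is an assumption made for illustration: node, lookup, buildUpToFirstTarget, and countChildren are hypothetical names, lookup handles only ".." and child-name steps rather than real XPath, and none of it reflects how IDR is actually implemented. The tree shape follows the segment declaration from the question.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, much-simplified stand-in for an in-memory tree node.
type node struct {
	name     string
	parent   *node
	children []*node
}

// add appends a new child with the given name and returns it.
func (n *node) add(name string) *node {
	c := &node{name: name, parent: n}
	n.children = append(n.children, c)
	return c
}

// remove detaches child c from n, as this toy does with a finished target.
func (n *node) remove(c *node) {
	for i, ch := range n.children {
		if ch == c {
			n.children = append(n.children[:i], n.children[i+1:]...)
			return
		}
	}
}

// lookup resolves a slash-separated path made of ".." and child names.
// It returns nil when any step cannot be resolved.
func lookup(from *node, path string) *node {
	cur := from
	for _, step := range strings.Split(path, "/") {
		if cur == nil {
			return nil
		}
		if step == ".." {
			cur = cur.parent
			continue
		}
		var next *node
		for _, ch := range cur.children {
			if ch.name == step {
				next = ch
				break
			}
		}
		cur = next
	}
	return cur
}

// countChildren reports how many children of n have the given name.
func countChildren(n *node, name string) int {
	count := 0
	for _, ch := range n.children {
		if ch.name == name {
			count++
		}
	}
	return count
}

// buildUpToFirstTarget builds only the nodes a streaming reader would have
// consumed by the time the first transactions group is complete.
func buildUpToFirstTarget() (gs, tx *node) {
	root := &node{name: "root"}
	fg := root.add("ISA").add("functional_groups")
	gs = fg.add("GS")
	tx = gs.add("transactions")
	tx.add("ST")
	tx.add("SE")
	return gs, tx
}

func main() {
	gs, tx := buildUpToFirstTarget()
	fmt.Println(lookup(tx, "../../GS") != nil) // true: GS was read before the target
	fmt.Println(lookup(tx, "../../GE") != nil) // false: GE is past the cursor

	// On the next read, the finished target is dropped before the next one is built.
	gs.remove(tx)
	gs.add("transactions")
	fmt.Println(countChildren(gs, "transactions")) // 1
}
```

In this model the ancestors stay attached, so paths into already-read segments resolve, while GE never exists in the tree at the moment a transactions target is evaluated.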

manuel-neuhauser-hs commented 1 year ago

Thanks for the insight. I'll try the proposed solution.