jf-tech / omniparser

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
MIT License

Debug mode #113

Closed DGollings closed 3 years ago

DGollings commented 3 years ago

Hi,

First off, thanks for this parser. Recently found out I needed to parse some EDI and this helped out, well, eventually. Being new to omniparser and EDI made the learning curve pretty much vertical.

What didn't help was the nontrivial, nonstandard message I needed to parse (it comes with a 102-page manual). Only after giving up and moving to a different library, giving up again and moving to a JavaScript library, days of trial and error, and finally getting a vague grasp of EDI did I realize what I was doing wrong**. I came back to omniparser and managed to create a schema that could handle both test files I have.

Anyway, what helped tremendously was adding these lines to the output:

diff --git a/extensions/omniv21/fileformat/edi/seg.go b/extensions/omniv21/fileformat/edi/seg.go
index cefc213..99a3833 100644
--- a/extensions/omniv21/fileformat/edi/seg.go
+++ b/extensions/omniv21/fileformat/edi/seg.go
@@ -1,6 +1,8 @@
 package edi

 import (
+       "fmt"
+
        "github.com/jf-tech/go-corelib/maths"
 )

@@ -95,8 +97,19 @@ func (d *segDecl) matchSegName(segName string) bool {
                //    "...loop is optional, but if any segment in the loop is used, the first segment
                //    within the loop becomes mandatory..."
                //  - https://github.com/smooks/smooks-edi-cartridge/blob/54f97e89156114e13e1acd3b3c46fe9a4234918c/edi-sax/src/main/java/org/smooks/edi/edisax/model/internal/SegmentGroup.java#L68
+               if len(d.Children) > 0 {
+                       children := make([]string, len(d.Children))
+                       for i, c := range d.Children {
+                               children[i] = c.Name
+                       }
+                       fmt.Printf("group %s children %v\n", d.Name, children)
+               }
                return len(d.Children) > 0 && d.Children[0].matchSegName(segName)
        default:
+               fmt.Printf("node %s found: %v\n", d.fqdn, d.Name == segName)
+               if d.Name != segName {
+                       fmt.Printf("unexpected node %s \n", segName)
+               }
                return d.Name == segName
        }
 }

It helped me figure out the last known 'good state', what the parser saw, where I was, etc. I don't expect you to add that exact code, as it's pretty ugly/messy, but I'd like to suggest adding some kind of verbose mode.

**I'm not sure, but I don't think any of the parsers out there handle EDI segment compression well. I was trying to strictly implement the specification I had, but had to loosen it up a bit.

jf-tech commented 3 years ago

@DGollings yes, EDI was a nightmare for us too and took us the longest time to come around. For the points you made:

  1. Verbose mode: completely agree we should have a -v mode so that schema writing and debugging would be easier. For EDI, I guess you want to see the "trace" of how the parser sees and analyzes an EDI file, that is, every segment match attempt (matchSegName) and every segment movement (like segDone and segNext), with loop info? I'll think about it.

  2. Do you have an example of EDI seg compression?

  3. If I may, it would help tremendously if users of the client can contribute samples of schemas and inputs (after scrubbing sensitive info away if needed). That way 1) we can learn more use cases and, more importantly, 2) we can have an ever-growing regression suite to ensure compatibility and consistency.
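As a rough illustration of the kind of trace described in point 1, the sketch below models each parser step as a small record and indents it by path depth so loop nesting is visible. This is only a sketch under assumptions: `traceEvent`, `eventKind`, and `format` are hypothetical names, and the event labels merely reuse the function names mentioned above without implying how those functions actually behave.

```go
package main

import (
	"fmt"
	"strings"
)

// eventKind labels what the parser was doing. The values are illustrative.
type eventKind string

const (
	kindMatch   eventKind = "match"
	kindSegDone eventKind = "segDone"
	kindSegNext eventKind = "segNext"
)

// traceEvent is a hypothetical record of one parser step.
type traceEvent struct {
	Kind    eventKind
	Path    string // fully qualified declaration path, e.g. "UNA/SG0/UNH"
	Input   string // segment name read from the input (match events only)
	Matched bool   // result of the attempt (match events only)
}

// format renders the event on one line, indented two spaces per path level
// so that nested loops stand out in the output.
func (e traceEvent) format() string {
	indent := strings.Repeat("  ", strings.Count(e.Path, "/"))
	if e.Kind == kindMatch {
		return fmt.Sprintf("%s%s %s input=%s matched=%v",
			indent, e.Kind, e.Path, e.Input, e.Matched)
	}
	return fmt.Sprintf("%s%s %s", indent, e.Kind, e.Path)
}

func main() {
	// A made-up sequence of steps, using segment names from this thread.
	events := []traceEvent{
		{Kind: kindMatch, Path: "UNA/SG0/UNH", Input: "UNH", Matched: true},
		{Kind: kindSegDone, Path: "UNA/SG0/UNH"},
		{Kind: kindSegNext, Path: "UNA/SG0/BGM"},
		{Kind: kindMatch, Path: "UNA/SG0/BGM", Input: "UNT", Matched: false},
	}
	for _, e := range events {
		fmt.Println(e.format())
	}
}
```

Keeping events as structured values instead of preformatted strings would also make it possible to emit them as JSON later, should that turn out to be useful.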

DGollings commented 3 years ago

1. Sample output:

node UNA/SG0/SG15/SG31/SG37/PCI found: true
node UNA/SG0/SG15/SG31/SG37/PCI found: true
node UNA/SG0/SG15/SG31/SG37/PCI found: true
node UNA/SG0/SG15/SG31/SG37/PCI found: false
unexpected node UNT 
group SG41 children [SGP]
node UNA/SG0/SG15/SG31/SG41/SGP found: false
unexpected node UNT 
group SG44 children [DGX FTX FTX FTX FTX FTX FTX FTX FTX FTX SG46]
node UNA/SG0/SG15/SG31/SG44/DGX found: false
unexpected node UNT 
group SG49 children [EQD EQN]
node UNA/SG0/SG15/SG31/SG49/EQD found: false
unexpected node UNT 
group SG31 children [GID FTX SG34 SG34 SG35 SG37 SG41 SG44 SG49]
node UNA/SG0/SG15/SG31/GID found: false
unexpected node UNT 
{"dest":{"city":"CZ","country":"ZEBRA A/S","state":"46388520"},"weight_uom":"LBS"}
4
group SG15 children [CNI DTM DTM DTM DTM TSR MOA MOA FTX SG17 SG18 SG19 SG20 SG22 SG25-SENDER SG25-RECEIVER SG25-REST SG31]
node UNA/SG0/SG15/CNI found: false
unexpected node UNT 
group SG0 children [UNH BGM FTX FTX CNT CNT CNT SG1 SG5 SG13 SG15]
node UNA/SG0/UNH found: false
unexpected node UNT 
node UNA/UNT found: true
PASS

Where segDone and segNext are implicit (true -> false)

  2. Created new issue

  3. I'd have to ask client, but I assume it should be possible
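For anyone reading a trace like the sample output in this comment, a small helper can pull out the last known good state automatically. The sketch below assumes the exact `node <path> found: <bool>` line format printed by the debug patch in this thread; `lastGood` is a hypothetical name and nothing here is part of omniparser.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// lastGood scans trace text and returns the path of the last node that
// matched, plus every node path that failed to match after it. Lines that
// do not start with "node " (group listings, "unexpected node", parser
// output) are ignored.
func lastGood(trace string) (matched string, missed []string) {
	sc := bufio.NewScanner(strings.NewReader(trace))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "node ") {
			continue
		}
		rest := strings.TrimPrefix(line, "node ")
		switch {
		case strings.HasSuffix(rest, " found: true"):
			matched = strings.TrimSuffix(rest, " found: true")
			missed = nil // a new match resets the list of misses
		case strings.HasSuffix(rest, " found: false"):
			missed = append(missed, strings.TrimSuffix(rest, " found: false"))
		}
	}
	// Scanner errors (only possible for extremely long lines when reading
	// from a string) are ignored in this sketch.
	return matched, missed
}

func main() {
	// A few lines copied from the sample output.
	trace := `node UNA/SG0/SG15/SG31/SG37/PCI found: true
node UNA/SG0/SG15/SG31/SG37/PCI found: false
unexpected node UNT
group SG41 children [SGP]
node UNA/SG0/SG15/SG31/SG41/SGP found: false
unexpected node UNT`

	matched, missed := lastGood(trace)
	fmt.Println("last match:", matched)
	for _, m := range missed {
		fmt.Println("  then tried:", m)
	}
}
```

On a failing parse the trace ends in a run of misses, so the returned pair shows where the parser last agreed with the schema and which declarations it tried afterwards.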

jf-tech commented 3 years ago

I've created a new project (https://github.com/jf-tech/omniparser/projects/6) to track this feature addition. And closing this issue for now.