Means0305 commented 9 months ago

Hello,

Thank you for providing Smooks. It is amazing. Recently, I tried smooks-edi-cartridge 2.0.0-RC3. I want to convert my EDI message to XML, but I found that the performance is poor.

This is my EDI text: edi.txt

This is my XSD (renamed to .txt so it could be uploaded): edifact-to-xml-desadv-mapping.dfdl.xsd.txt

My source:

protected static String runSmooksTransform() throws IOException, SAXException {
     // Instantiate Smooks with the config...
     Smooks smooks = new Smooks(new DefaultApplicationContextBuilder().setClassLoader(EdiToXml.class.getClassLoader()).build());
     smooks.addConfigurations("smooks-config-edifact-to-xml-desadv.xml");
     try {
         // Create an exec context - no profiles....
         //ExecutionContext executionContext = smooks.createExecutionContext();

         StringResult result = new StringResult();

         // Configure the execution context to generate a report...
         //executionContext.getContentDeliveryRuntime().addExecutionEventListener(new HtmlReportGenerator("target/report/report.html"));

         // Filter the input message to the outputWriter, using the execution context...
         //smooks.filterSource(executionContext, new StreamSource(new ByteArrayInputStream(messageIn)), result);
         smooks.filterSource(new StreamSource(new ByteArrayInputStream(messageIn)), result);

         return result.getResult();
     } finally {
         smooks.close();
     }
}

I ran it for an hour, and it still had not finished.

cjmamo commented 9 months ago

Can you include the Smooks config?

Means0305 commented 9 months ago

@claudemamo Thank you very much. I have attached my config file (also renamed from .xml to .txt): smooks-config-edifact-to-xml-desadv.xml.txt

Means0305 commented 9 months ago

I also tried removing the ExecutionContext listener, but the result is the same: it is still very slow.

cjmamo commented 9 months ago

It's stuck during compilation of edifact-to-xml-desadv-mapping.dfdl.xsd. Having so many points of uncertainty (i.e., minOccurs="0") in your schema is extremely expensive in terms of compilation time. You need to reduce these points of uncertainty to improve performance. Any reason why you're not using edifact:parser to ingest the file? The reader comes bundled with DFDL schemas generated directly from the EDIFACT directories.

Means0305 commented 9 months ago

@claudemamo

Thank you very much. I will try it. The reason I don't use edifact:parser is that I need to create EDI files not only for EDIFACT but also for X12. From a maintenance point of view, I hoped I could maintain EDIFACT and X12 in the same way. Is there any X12 DFDL in smooks?

Means0305 commented 9 months ago

@claudemamo

I ran the test again. If I use just the EDI message below:

==============Message In==============
UNB+G:H+K:J:I+M:L:N+O:P+D+Q:R+B+E+A+C+F'
UNH+1+DESADV:D:97A:X+CC+B:A+B:D:C:A+B:C:100:B+B:D:C:A'
BGM+D:A:B:C+E:G:F+H+I'
DTM+A:2024/01/30:yyyy/MM/dd'
DTM+A:2024/01/30:yyyy/MM/dd'
ALI+A++B+C+D+E+F'
ALI+G++H+I+J+K+L'
MEA+X+D:C:B:A+2:1:1:100:100+ABC'
MEA+AD+D:C:B:A+2:1:1:100:100+CCC'
MOA+B:1000:USD:A:C'
MOA+B:1000:USD:A:C'

The message could be converted in several seconds. But when I add one more segment like below.

==============Message In==============
UNB+G:H+K:J:I+M:L:N+O:P+D+Q:R+B+E+A+C+F'
UNH+1+DESADV:D:97A:X+CC+B:A+B:D:C:A+B:C:100:B+B:D:C:A'
BGM+D:A:B:C+E:G:F+H+I'
DTM+A:2024/01/30:yyyy/MM/dd'
DTM+A:2024/01/30:yyyy/MM/dd'
ALI+A++B+C+D+E+F'
ALI+G++H+I+J+K+L'
MEA+X+D:C:B:A+2:1:1:100:100+ABC'
MEA+AD+D:C:B:A+2:1:1:100:100+CCC'
MOA+B:1000:USD:A:C'
MOA+B:1000:USD:A:C'
RFF+F:BBB:1:V'

Suddenly it became very slow. I think this is because Smooks cannot quickly determine where RFF belongs: according to the EDIFACT definition, RFF can appear in several locations.

cjmamo commented 9 months ago

Is there any X12 DFDL in smooks?

Unfortunately no because the X12 implementation guides are proprietary. Even if work was sponsored for its support, I don't think we could open-source it.

The message could be converted in several seconds.

It took ~10 mins for the DFDL schema to compile on my local machine so it's way too long in my opinion.

cjmamo commented 9 months ago

From a maintenance point of view, I hoped I could maintain EDIFACT and X12 in the same way.

A lot of work was put into generating the DFDL schemas for EDIFACT so I suggest you stick to edifact:parser.

Means0305 commented 9 months ago

@claudemamo

It took ~10 mins for the DFDL schema to compile on my local machine so it's way too long in my opinion.

It is a little different on my PC. As you can see from the output below, it took just 10 seconds.

==============Message In==============
UNB+G:H+K:J:I+M:L:N+O:P+D+Q:R+B+E+A+C+F'
UNH+1+DESADV:D:97A:X+CC+B:A+B:D:C:A+B:C:100:B+B:D:C:A'
BGM+D:A:B:C+E:G:F+H+I'
DTM+A:2024/01/30:yyyy/MM/dd'
DTM+A:2024/01/30:yyyy/MM/dd'
ALI+A++B+C+D+E+F'
ALI+G++H+I+J+K+L'
MEA+X+D:C:B:A+2:1:1:100:100+ABC'
MEA+AD+D:C:B:A+2:1:1:100:100+CCC'
MOA+B:1000:USD:A:C'
MOA+B:1000:USD:A:C'

======================================

start date time:Thu Feb 15 19:26:49 2024
INFO 2024-02-15 19:26:55,755 [main] org.smooks.cartridges.dfdl.DataProcessorFactory: Compiling and caching DFDL schema...
==============Message Out=============
<?xml version="1.0" encoding="UTF-8"?>

======================================
end date time:Thu Feb 15 19:26:59 2024

But if I add RFF+F:BBB:1:V', it takes a very long time. May I know how you compiled it? I thought I could set it up in Spring Boot as an injected bean, so that the compilation time would not be a problem.
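The "build it once and inject it" idea can be sketched in plain Java, independent of Smooks and Spring. This is only an illustration under assumptions: `Lazy` is a hypothetical helper (not a Smooks or Spring class), and the factory passed to it stands in for whatever expensive setup is being shared. It only amortizes one-time setup cost; it does nothing for the time spent parsing each message.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: runs an expensive factory at most once and
 * hands the same result to every caller (similar in spirit to a
 * singleton bean).
 */
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private volatile T value; // null until the first successful build

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        T v = value;
        if (v == null) {
            synchronized (this) {
                v = value;
                if (v == null) {
                    // Built once under the lock; a factory that returns
                    // null would be invoked again on the next call.
                    v = factory.get();
                    value = v;
                }
            }
        }
        return v;
    }
}
```

A caller would wrap the costly setup once, for example `new Lazy<>(() -> buildExpensiveThing())` where `buildExpensiveThing` is a placeholder name, and then call `get()` for every message instead of rebuilding and closing the resource inside each conversion call.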

cjmamo commented 9 months ago

Yeah you're right. It's taking a long time to parse the second document but compilation is indeed relatively fast. I had a missing newline at the end of the first document which led to the parser attempting various parsing paths. However, the problem about these uncertainty points still stands. If edifact:parser is not an option, you could try to re-author the DFDL schema so it's simpler and less structured. So for example:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmEdiFmt="http://www.ibm.com/dfdl/EDI/Format">

    <xsd:import namespace="http://www.ibm.com/dfdl/EDI/Format"
                schemaLocation="/EDIFACT-Common/IBM_EDI_Format.dfdl.xsd"/>

    <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format ref="ibmEdiFmt:EDIFormat"/>

            <dfdl:defineFormat name="EDISegmentSequenceFormat">
                <dfdl:format separator="{$ibmEdiFmt:FieldSep}" separatorPosition="postfix"
                             separatorPolicy="suppressedAtEndStrict" terminator=""/>
            </dfdl:defineFormat>
        </xsd:appinfo>
    </xsd:annotation>

    <xsd:element name="message">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element dfdl:ref="ibmEdiFmt:EDISegmentFormat" name="segment" maxOccurs="unbounded">
                    <xsd:complexType>
                        <xsd:sequence dfdl:ref="EDISegmentSequenceFormat">
                            <xsd:element name="element" type="xsd:string" maxOccurs="unbounded"/>
                        </xsd:sequence>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

Otherwise, as I said, edit the schema to have fewer optional elements.
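To make the "simpler and less structured" shape concrete, here is a toy Java sketch of roughly what such a generic message/segment/element view looks like. It is plain string splitting, not Smooks or DFDL, and it rests on assumptions: `GenericEdiSplitter` is a hypothetical name, the separators are the default EDIFACT ones (`'` and `+`), and release characters and UNA overrides are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy splitter: mirrors the message/segment/element shape only. */
final class GenericEdiSplitter {

    // Assumed default EDIFACT segment terminator; a real interchange
    // may override separators via a UNA segment.
    private static final char SEGMENT_TERMINATOR = '\'';

    /** Returns one list of elements per segment; element 0 is the segment tag. */
    static List<List<String>> split(String message) {
        List<List<String>> segments = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < message.length(); i++) {
            if (message.charAt(i) == SEGMENT_TERMINATOR) {
                String segment = message.substring(start, i).trim();
                if (!segment.isEmpty()) {
                    // Limit -1 keeps empty elements such as the gap in "ALI+A++B".
                    segments.add(Arrays.asList(segment.split("\\+", -1)));
                }
                start = i + 1;
            }
        }
        // Any text after the last terminator is ignored in this sketch.
        return segments;
    }
}
```

For instance, `GenericEdiSplitter.split("RFF+F:BBB:1:V'")` yields a single segment whose elements are `RFF` and `F:BBB:1:V`. Nothing here decides where RFF belongs in the message structure, which is the kind of uncertainty a generic schema avoids; the trade-off is that any structural interpretation has to happen downstream.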

Means0305 commented 9 months ago

@claudemamo

Thank you for your explanation. Noted. I removed every minOccurs="0" from the XSD and ran it again. Unfortunately, the message below was shown many times.

However, according to EDIFACT, some segments are optional, so I don't think I can define all segments as mandatory. I will think of some other way.

And edi:unparser runs very well. The result and speed are perfect. Thanks a lot.

smooks / smooks-edi-cartridge

The edi:parser's performance is not good #290