Closed Means0305 closed 9 months ago
Can you include the Smooks config?
@claudemamo Thank you very much. I have attached my config file: smooks-config-edifact-to-xml-desadv.xml.txt. I also changed the extension from xml to txt.
I also tried removing the ExecutionContext listener, but the result is the same: it is still very slow.
It's stuck during compilation of edifact-to-xml-desadv-mapping.dfdl.xsd. Having so many points of uncertainty (i.e., minOccurs="0") in your schema is extremely expensive in terms of compilation time. You need to reduce these points of uncertainty to improve performance. Any reason why you're not using edifact:parser to ingest the file? The reader comes bundled with DFDL schemas generated directly from the EDIFACT directories.
@claudemamo
Thank you very much. I will try it. The reason I don't use edifact:parser is that I need to create EDI files not only for EDIFACT but also for X12. From a maintenance point of view, I hoped I could handle EDIFACT and X12 in the same way. Is there any X12 DFDL in smooks?
@claudemamo
The message could be converted in several seconds. But when I add one more segment like the one below,
it suddenly becomes very slow. I think this is because Smooks cannot locate RFF quickly: according to the EDIFACT definition, RFF can appear in several locations.
Is there any X12 DFDL in smooks?
Unfortunately no because the X12 implementation guides are proprietary. Even if work was sponsored for its support, I don't think we could open-source it.
The message could be converted in several seconds.
It took ~10 mins for the DFDL schema to compile on my local machine so it's way too long in my opinion.
From a maintenance point of view, I hoped I could handle EDIFACT and X12 in the same way.
A lot of work was put into generating the DFDL schemas for EDIFACT so I suggest you stick to edifact:parser.
@claudemamo
It took ~10 mins for the DFDL schema to compile on my local machine so it's way too long in my opinion.
It is a little different on my PC. As you can see from the output below, it took only 10 seconds.
==============Message In==============
UNB+G:H+K:J:I+M:L:N+O:P+D+Q:R+B+E+A+C+F'
UNH+1+DESADV:D:97A:X+CC+B:A+B:D:C:A+B:C:100:B+B:D:C:A'
BGM+D:A:B:C+E:G:F+H+I'
DTM+A:2024/01/30:yyyy/MM/dd'
DTM+A:2024/01/30:yyyy/MM/dd'
ALI+A++B+C+D+E+F'
ALI+G++H+I+J+K+L'
MEA+X+D:C:B:A+2:1:1:100:100+ABC'
MEA+AD+D:C:B:A+2:1:1:100:100+CCC'
MOA+B:1000:USD:A:C'
MOA+B:1000:USD:A:C'
======================================
start date time:Thu Feb 15 19:26:49 2024
INFO 2024-02-15 19:26:55,755 [main] org.smooks.cartridges.dfdl.DataProcessorFactory: Compiling and caching DFDL schema...
==============Message Out=============
<?xml version="1.0" encoding="UTF-8"?>
======================================
end date time:Thu Feb 15 19:26:59 2024
But if I add RFF+F:BBB:1:V' it takes a lot of time. May I know how you compiled it? I thought I could inject it in Spring Boot, so the compilation time would not be a problem.
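The build-once idea raised in this comment can be sketched in plain Java. This is only an illustration under assumptions: `BuildOnce` is a hypothetical helper, not part of Smooks or Spring, and the supplier stands in for whatever code constructs the configured Smooks instance. It also assumes that instance is safe to share between calls, which should be checked against the Smooks documentation.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: runs an expensive factory on first use only
 * and returns the same instance on every later call.
 */
final class BuildOnce<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    BuildOnce(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            // Double-checked locking so concurrent callers trigger one build.
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        int[] builds = {0};
        // The lambda is a stand-in for the slow set-up step.
        BuildOnce<String> engine = new BuildOnce<>(() -> {
            builds[0]++;
            return "engine";
        });
        engine.get();
        engine.get();
        System.out.println("builds: " + builds[0]);
    }
}
```

In a Spring Boot application a singleton-scoped bean would normally give the same effect without a helper like this. Either way, this only moves a one-time set-up cost out of the request path; it does not reduce time spent parsing each message.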
Yeah you're right. It's taking a long time to parse the second document but compilation is indeed relatively fast. I had a missing newline at the end of the first document which led to the parser attempting various parsing paths. However, the problem about these uncertainty points still stands. If edifact:parser is not an option, you could try to re-author the DFDL schema so it's simpler and less structured. So for example:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmEdiFmt="http://www.ibm.com/dfdl/EDI/Format">
    <xsd:import namespace="http://www.ibm.com/dfdl/EDI/Format"
                schemaLocation="/EDIFACT-Common/IBM_EDI_Format.dfdl.xsd"/>
    <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format ref="ibmEdiFmt:EDIFormat"/>
            <dfdl:defineFormat name="EDISegmentSequenceFormat">
                <dfdl:format separator="{$ibmEdiFmt:FieldSep}" separatorPosition="postfix"
                             separatorPolicy="suppressedAtEndStrict" terminator=""/>
            </dfdl:defineFormat>
        </xsd:appinfo>
    </xsd:annotation>
    <xsd:element name="message">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element dfdl:ref="ibmEdiFmt:EDISegmentFormat" name="segment" maxOccurs="unbounded">
                    <xsd:complexType>
                        <xsd:sequence dfdl:ref="EDISegmentSequenceFormat">
                            <xsd:element name="element" type="xsd:string" maxOccurs="unbounded"/>
                        </xsd:sequence>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
Otherwise, as I said, edit the schema to have fewer optional elements.
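As a rough illustration of the flat shape the simpler schema above describes (a message holding a list of segments, each holding a list of string elements), here is a plain-Java sketch. It does not use Smooks or DFDL and is not equivalent to them. `GenericEdiSplitter` is a hypothetical name, and the code assumes the default EDIFACT delimiters (`'` ends a segment, `+` separates elements), with no UNA header and no release character handling.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits an EDIFACT message into segments and elements. */
final class GenericEdiSplitter {

    static List<List<String>> split(String message) {
        List<List<String>> segments = new ArrayList<>();
        // Assumption: every segment ends with an apostrophe.
        for (String rawSegment : message.split("'")) {
            String segment = rawSegment.trim();
            if (segment.isEmpty()) {
                continue; // skip whitespace left after the last terminator
            }
            List<String> elements = new ArrayList<>();
            // Limit -1 keeps empty elements such as the gap in "ALI+A++B".
            for (String element : segment.split("\\+", -1)) {
                elements.add(element);
            }
            segments.add(elements);
        }
        return segments;
    }

    public static void main(String[] args) {
        String message = "BGM+D:A:B:C+E:G:F+H+I'\nALI+A++B+C+D+E+F'\n";
        for (List<String> segment : split(message)) {
            System.out.println(segment);
        }
    }
}
```

Nothing in this shape is optional, so there are no alternative paths to try; the trade-off is that the meaning of each segment has to be worked out later from its first element (the segment tag).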
@claudemamo
Thank you for your explanation. Noted. I removed every minOccurs="0" from the XSD and ran it again. Unfortunately, the message below was shown many times.
However, according to EDIFACT, some segments are optional, so I don't think I can define every segment as mandatory. I will think of another way.
And edi:unparser runs very well. The result and speed are perfect. Thanks a lot.
Hello,
Thank you for providing Smooks. It is amazing. Recently, I tried smooks-edi-cartridge 2.0.0-RC3. I want to convert my EDI message to XML, but I found that the performance is not good.
This is my edi text. edi.txt
This is my xsd. I changed the file extension to txt so that I could upload it: edifact-to-xml-desadv-mapping.dfdl.xsd.txt
My source:
protected static String runSmooksTransform() throws IOException, SAXException {
    // Instantiate Smooks with the config...
    Smooks smooks = new Smooks(new DefaultApplicationContextBuilder().setClassLoader(EdiToXml.class.getClassLoader()).build());
    smooks.addConfigurations("smooks-config-edifact-to-xml-desadv.xml");
    try {
        // Create an exec context - no profiles....
        //ExecutionContext executionContext = smooks.createExecutionContext();
I tried to run it. It ran for 1 hour but still did not finish.