Closed ronnievisser closed 8 years ago
Hi!
How are you using Stream\File, exactly? If it says "object given" it means you're not actually passing a resource.
I am parsing CAMT053 with it. A CAMT053 file has an Ntry node, which in my case can have over 10000 NtryDtls child nodes. The Ntry node has information about totals and the NtryDtls nodes about transactions. I need info from both the Ntry and the NtryDtls nodes, so I use the createUniqueNodeParser functionality.
I use the following code to parse the NtryDtls nodes:
$streamer = XmlStringStreamer::createUniqueNodeParser( $event->localFilename, [ 'uniqueNode' => 'NtryDtls' ] );
and to parse the balance info (Bal) I use:
$streamerBal = XmlStringStreamer::createUniqueNodeParser( $event->localFilename, [ 'uniqueNode' => 'Bal' ] );
If I load the complete Ntry node with simplexml_load_string, it consumes over 700 MB.
I tried to use Stream\File like this, but it fails when passed to the unique-node method:
Stream\File( $event->localFilename );
I was hoping I could pass this instead of the URL string to the unique-node method.
You have the actual XML in a file, yes?
I successfully parsed NtryDtls from a CAMT053 file I found here, like this:
<?php
require_once("vendor/autoload.php");

$streamer = Prewk\XmlStringStreamer::createUniqueNodeParser("./FI_camt_053_sample.xml", array("uniqueNode" => "NtryDtls"));

while ($node = $streamer->getNode()) {
    // This XML node will just be the NtryDtls and it won't eat up your memory because it is overwritten on every iteration
    $xml = simplexml_load_string($node);
    // Access the node as usual
    foreach ($xml->TxDtls as $TxDtl) {
        if (isset($TxDtl->AmtDtls->TxAmt->Amt)) {
            echo "Amount: " . (string)$TxDtl->AmtDtls->TxAmt->Amt . "\n";
        }
    }
}
The same goes for Bal:
<?php
require_once("vendor/autoload.php");

$streamer = Prewk\XmlStringStreamer::createUniqueNodeParser("./FI_camt_053_sample.xml", array("uniqueNode" => "Bal"));

while ($node = $streamer->getNode()) {
    // This XML node will just be one Bal and it won't eat up your memory because it is overwritten on every iteration
    $xml = simplexml_load_string($node);
    // Access the node as usual
    echo "Amount: " . (string)$xml->Amt . "\n";
}
That is exactly how I do it, but I thought maybe I could re-use the resource from the first parser so it doesn't have to load the 30 MB file twice...
How would you go about the following?
In my case I need the BookgDt and the ValDt from the Ntry node. In my file the Ntry node has about 10000 NtryDtls nodes (with all their children). If I load the Ntry with UniqueNode, it consumes 700 MB.
Your file has 1 NtryDtls per Ntry.
> That is exactly how I do it, but I thought maybe I could re-use the resource from the first parser so it doesn't have to load the 30 MB file twice...
Oh okay. I don't think there is any efficiency to gain from re-using it, really. The CPU cycles will go into parsing anyway, and the memory consumption is low.
If the XML file is 30 MB it won't actually use 30 MB of memory; it streams the file bit by bit and forgets about the old iteration on the next.
> How would you go about the following? In my case I need the BookgDt and the ValDt from the Ntry node. In my file the Ntry node has about 10000 NtryDtls nodes (with all their children). If I load the Ntry with UniqueNode, it consumes 700 MB.
Hm, that's weird. It's been a while but I'm pretty sure the UniqueNode parser by itself won't start saving anything at all before finding the first node, ergo it would only keep at most one node in memory at a time.
Are you doing anything in the while loop that saves data to the outside? Like, saves the whole node or something. Can you show me the code in the while loop that you're using?
Try this (Note: Using the StringWalker instead of UniqueNode):
<?php
require_once("vendor/autoload.php");

$streamer = Prewk\XmlStringStreamer::createStringWalkerParser("./FI_camt_053_sample.xml", array("captureDepth" => 4, "expectGT" => true));

while ($node = $streamer->getNode()) {
    $xml = simplexml_load_string($node);
    $nodeName = $xml->getName();
    if ($nodeName === "Ntry") {
        // Do something with Ntry
        echo "Ntry->Amt: " . (string)$xml->Amt . "\n";
    } else if ($nodeName === "Bal") {
        // Do something with Bal
        echo "Bal->Amt: " . (string)$xml->Amt . "\n";
    }
}
If you don't expect any XML comments with tags in the XML you can skip the expectGT option and you might save some CPU cycles. The example file I linked earlier is full of examples such as <!-- Recommendation: Use <Foo>blabla</Foo> --> etc., so I needed it.
When I check memory usage before and after the simplexml_load_string call, I see 8 MB before and 713 MB after.
Since the Ntry node has so many children and all of it is loaded into a SimpleXML object, I believe that is what consumes so much memory. Parsing it in separate pieces takes only 35 MB for the same process.
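A before/after measurement like the one described can be sketched as follows. This is only an illustration: buildNtry() is a hypothetical helper that generates a synthetic Ntry string, the child count is an arbitrary choice, and the numbers printed will differ from the 8 MB / 713 MB reported here depending on the PHP version and the real node contents.

```php
<?php
// Hypothetical helper (not part of any library): builds one <Ntry> string
// containing $n <NtryDtls> children, loosely modelled on the sample above.
function buildNtry(int $n): string
{
    $details = str_repeat(
        '<NtryDtls><TxDtls><Purp><Cd>SALA</Cd></Purp></TxDtls></NtryDtls>',
        $n
    );
    return '<Ntry><BookgDt><Dt>2009-10-29</Dt></BookgDt>' . $details . '</Ntry>';
}

$node = buildNtry(10000);

// Measure PHP's reported memory usage around the SimpleXML load.
$before = memory_get_usage();
$xml = simplexml_load_string($node);
$after = memory_get_usage();

// count() on a SimpleXML child list gives the number of matching elements.
$childCount = count($xml->NtryDtls);

printf(
    "children: %d, before: %.1f MB, after: %.1f MB\n",
    $childCount,
    $before / 1048576,
    $after / 1048576
);
```

The point is only that the whole Ntry subtree is materialised at once when it is passed to simplexml_load_string, so the cost grows with the number of NtryDtls children.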
Since this is a format I'm unfamiliar with, I can only go on the linked example XML, and it looks like this:
<Ntry>
<!-- Transaction 1 as an sample of SALA batch with elements filled both for PMJ-salaries as well as SCT SALA-->
<!-- Here only as collection, since in salaries the payment level details are not reported -->
<Amt Ccy="EUR">1000.12</Amt>
<CdtDbtInd>DBIT</CdtDbtInd>
<Sts>BOOK</Sts>
<BookgDt>
<Dt>2009-10-29</Dt>
</BookgDt>
<ValDt>
<Dt>2009-10-29</Dt>
</ValDt>
<!-- In case of separate Salary debit report (camt.054) is generated the banks' reference has to be in it as one matching term-->
<AcctSvcrRef>091029ACCTSTMTARCH01</AcctSvcrRef>
<BkTxCd>
<!-- In case of PMJ salaries as in the sample. In case of SCT SALA PMNT/ICDT/ESCT + PurposeCode SALA) -->
<Domn>
<Cd>PMNT</Cd>
<Fmly>
<Cd>ICDT</Cd>
<SubFmlyCd>SALA</SubFmlyCd>
</Fmly>
</Domn>
<!-- Prtry used only in case of PMJ-salaries -->
<Prtry>
<Cd>NTRF+701TransactionCodeText</Cd>
</Prtry>
</BkTxCd>
<NtryDtls>
<Btch>
<!-- customer made batch and message-references (not in old TS but yes in SALA SCT in case that pain.001 is used and direct corresponding matching can be found). Purpose: Reconciiation-->
<!-- Basic recommendation: as much as possible of the original payment instruction material that came from the customer into the bank-->
<MsgId>MSGSALA0001</MsgId>
<!-- in LM-batches this is an info given in the batch record and supported by most of the banks as the initiator batch level identification-->
<PmtInfId>CustRefForSalaBatch</PmtInfId>
<!-- customer made batch's transaction total by the initiated material. Purpose: Reconciiation-->
<NbOfTxs>4</NbOfTxs>
</Btch>
<TxDtls>
<!-- used to specify what subtype (purpose code) of SCT SALA (category purpose and notice that tx code in this case is PMNT/ICDT/ESCT) debtor has used. Not so critical on debtor stmts but on creditor it is -->
<Purp>
<Cd>SALA</Cd>
</Purp>
</TxDtls>
</NtryDtls>
</Ntry>
Are you saying that your Ntry nodes have a lot more children than that node?
Or are you saying that you are trying to simplexml_load_string on the whole document? The point of XmlStringStreamer is to allow you to simplexml_load_string every node individually and then forget about it in the next iteration. That's why the memory consumption is low.
If you, however, save every simplexml object into an array outside of your while loop on every iteration or something, you will lose the benefit. Is this what you're doing?
It's hard to make assumptions when I don't have your XML or code to look at (I understand the XMLs are sensitive information), but using the example CAMT053 XML I don't see why it would chew up 700 MB of memory, really.
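To illustrate the difference between keeping whole nodes and keeping only what is needed, here is a minimal sketch. It does not use the library: fakeNodes() is a hypothetical generator standing in for repeated $streamer->getNode() calls, and the node layout is a trimmed-down guess at an NtryDtls.

```php
<?php
// Hypothetical stand-in for the streamer: yields one node string at a time.
function fakeNodes(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield "<NtryDtls><TxDtls><AmtDtls><TxAmt><Amt>{$i}.00</Amt></TxAmt></AmtDtls></TxDtls></NtryDtls>";
    }
}

$total = 0.0;
$count = 0;

foreach (fakeNodes(1000) as $node) {
    $xml = simplexml_load_string($node);

    // Copy out plain scalars only. Appending $xml itself to an outer array
    // would keep every parsed node alive and defeat the streaming.
    $total += (float) $xml->TxDtls->AmtDtls->TxAmt->Amt;
    $count++;

    // $xml is overwritten on the next iteration, so at most one parsed
    // node exists at any time.
}

printf("%d nodes, total %.2f\n", $count, $total);
```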
<Ntry>
<!-- Transaction 1 as an sample of SALA batch with elements filled both for PMJ-salaries as well as SCT SALA-->
<!-- Here only as collection, since in salaries the payment level details are not reported -->
<Amt Ccy="EUR">1000.12</Amt>
<CdtDbtInd>DBIT</CdtDbtInd>
<Sts>BOOK</Sts>
<BookgDt>
<Dt>2009-10-29</Dt>
</BookgDt>
<ValDt>
<Dt>2009-10-29</Dt>
</ValDt>
<!-- In case of separate Salary debit report (camt.054) is generated the banks' reference has to be in it as one matching term-->
<AcctSvcrRef>091029ACCTSTMTARCH01</AcctSvcrRef>
<BkTxCd>
<!-- In case of PMJ salaries as in the sample. In case of SCT SALA PMNT/ICDT/ESCT + PurposeCode SALA) -->
<Domn>
<Cd>PMNT</Cd>
<Fmly>
<Cd>ICDT</Cd>
<SubFmlyCd>SALA</SubFmlyCd>
</Fmly>
</Domn>
<!-- Prtry used only in case of PMJ-salaries -->
<Prtry>
<Cd>NTRF+701TransactionCodeText</Cd>
</Prtry>
</BkTxCd>
<NtryDtls>
<Btch>
<!-- customer made batch and message-references (not in old TS but yes in SALA SCT in case that pain.001 is used and direct corresponding matching can be found). Purpose: Reconciiation-->
<!-- Basic recommendation: as much as possible of the original payment instruction material that came from the customer into the bank-->
<MsgId>MSGSALA0001</MsgId>
<!-- in LM-batches this is an info given in the batch record and supported by most of the banks as the initiator batch level identification-->
<PmtInfId>CustRefForSalaBatch</PmtInfId>
<!-- customer made batch's transaction total by the initiated material. Purpose: Reconciiation-->
<NbOfTxs>4</NbOfTxs>
</Btch>
<TxDtls>
<!-- used to specify what subtype (purpose code) of SCT SALA (category purpose and notice that tx code in this case is PMNT/ICDT/ESCT) debtor has used. Not so critical on debtor stmts but on creditor it is -->
<Purp>
<Cd>SALA</Cd>
</Purp>
</TxDtls>
</NtryDtls>
<NtryDtls>
<Btch>
<!-- customer made batch and message-references (not in old TS but yes in SALA SCT in case that pain.001 is used and direct corresponding matching can be found). Purpose: Reconciiation-->
<!-- Basic recommendation: as much as possible of the original payment instruction material that came from the customer into the bank-->
<MsgId>MSGSALA0001</MsgId>
<!-- in LM-batches this is an info given in the batch record and supported by most of the banks as the initiator batch level identification-->
<PmtInfId>CustRefForSalaBatch</PmtInfId>
<!-- customer made batch's transaction total by the initiated material. Purpose: Reconciiation-->
<NbOfTxs>4</NbOfTxs>
</Btch>
<TxDtls>
<!-- used to specify what subtype (purpose code) of SCT SALA (category purpose and notice that tx code in this case is PMNT/ICDT/ESCT) debtor has used. Not so critical on debtor stmts but on creditor it is -->
<Purp>
<Cd>SALA</Cd>
</Purp>
</TxDtls>
</NtryDtls>
<NtryDtls>
<Btch>
<!-- customer made batch and message-references (not in old TS but yes in SALA SCT in case that pain.001 is used and direct corresponding matching can be found). Purpose: Reconciiation-->
<!-- Basic recommendation: as much as possible of the original payment instruction material that came from the customer into the bank-->
<MsgId>MSGSALA0001</MsgId>
<!-- in LM-batches this is an info given in the batch record and supported by most of the banks as the initiator batch level identification-->
<PmtInfId>CustRefForSalaBatch</PmtInfId>
<!-- customer made batch's transaction total by the initiated material. Purpose: Reconciiation-->
<NbOfTxs>4</NbOfTxs>
</Btch>
<TxDtls>
<!-- used to specify what subtype (purpose code) of SCT SALA (category purpose and notice that tx code in this case is PMNT/ICDT/ESCT) debtor has used. Not so critical on debtor stmts but on creditor it is -->
<Purp>
<Cd>SALA</Cd>
</Purp>
</TxDtls>
</NtryDtls>
<NtryDtls>
<Btch>
<!-- customer made batch and message-references (not in old TS but yes in SALA SCT in case that pain.001 is used and direct corresponding matching can be found). Purpose: Reconciiation-->
<!-- Basic recommendation: as much as possible of the original payment instruction material that came from the customer into the bank-->
<MsgId>MSGSALA0001</MsgId>
<!-- in LM-batches this is an info given in the batch record and supported by most of the banks as the initiator batch level identification-->
<PmtInfId>CustRefForSalaBatch</PmtInfId>
<!-- customer made batch's transaction total by the initiated material. Purpose: Reconciiation-->
<NbOfTxs>4</NbOfTxs>
</Btch>
<TxDtls>
<!-- used to specify what subtype (purpose code) of SCT SALA (category purpose and notice that tx code in this case is PMNT/ICDT/ESCT) debtor has used. Not so critical on debtor stmts but on creditor it is -->
<Purp>
<Cd>SALA</Cd>
</Purp>
</TxDtls>
</NtryDtls>
</Ntry>
This is what my XML looks like. I have over 1000 of those NtryDtls. I only load the SimpleXML object inside the while loop.
Alright, and you got the memory issue using createStringWalkerParser (with expectGT set to true as in my example above)?
No, using createUniqueNodeParser.
See my example above where I'm using the StringWalker, please. It might solve your problem.
@RonnieVisser What solution did you come up with? Did you use this library? Maybe I can try to add it to our CAMT library. We also see problems when the XML gets bigger.
@frederikbosch If you can provide me with an XML that uses a lot of memory I can probably provide a solution using this library.
@prewk Can you tell me what this library offers over the XMLReader extension?
Probably mostly ease of use. It could be argued that XMLReader needs a fairly specific implementation for every different XML document, whereas you just feed my library an XML and it usually "just works".
Although, it must be said, I have very little experience with the XMLReader extension, so my lack of insight may cause a bias.
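For comparison, a rough sketch of what the XMLReader route might look like for the case discussed earlier (reading BookgDt and ValDt from an Ntry without loading its NtryDtls children). This uses PHP's built-in XMLReader rather than this library, the inline sample and its dates are made up, and a real CAMT053 file would need the document-specific handling mentioned above (for example, checking element depth or namespaces).

```php
<?php
// Made-up, compact sample: one Ntry with two dates and several NtryDtls.
$sample = '<Ntry>'
    . '<BookgDt><Dt>2009-10-29</Dt></BookgDt>'
    . '<ValDt><Dt>2009-10-30</Dt></ValDt>'
    . str_repeat('<NtryDtls><TxDtls><Purp><Cd>SALA</Cd></Purp></TxDtls></NtryDtls>', 5)
    . '</Ntry>';

$reader = new XMLReader();
$reader->XML($sample);

$dates = [];
$ok = $reader->read();
while ($ok) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        if ($reader->name === 'BookgDt' || $reader->name === 'ValDt') {
            // readString() returns the text content of the current element.
            $dates[$reader->name] = trim($reader->readString());
        } elseif ($reader->name === 'NtryDtls') {
            // next() skips the whole subtree instead of descending into it.
            $ok = $reader->next();
            continue;
        }
    }
    $ok = $reader->read();
}
$reader->close();

print_r($dates);
```

The trade-off is visible even in this small sketch: the element names and the skip logic are hard-coded for one document shape.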
I have fixed my issues with the createStringWalkerParser. No matter the size of the XML, it uses no more than 6 MB :)
@RonnieVisser Sounds good. Will investigate how to use it within the CAMT package.
Hi,
I have a file which has to be parsed in pieces. Currently I load it from the file URL every time, so it has to be loaded again for each pass.
Is it possible to load from a resource? Looking at the code this seems to be possible, but when using Stream\File it tells me "expects parameter 1 to be resource, object given".
How can I load from a resource?
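On the "expects parameter 1 to be resource, object given" message: in PHP a stream resource (what fopen() returns) and an object that wraps a stream are different types, and a function that expects one will reject the other. The sketch below uses only built-in PHP functions and an in-memory stream with made-up content; it says nothing about which arguments this library's factory methods accept, it only shows the distinction and how a plain resource can be rewound for a second pass.

```php
<?php
// A stream resource, here backed by memory instead of a real file.
$handle = fopen('php://memory', 'w+');
fwrite($handle, '<Bal><Amt>1.00</Amt></Bal>');
rewind($handle);

// An object that wraps a stream; it is not a resource.
$wrapper = new SplTempFileObject();

$handleIsResource = is_resource($handle);   // true
$wrapperIsResource = is_resource($wrapper); // false

// The same resource can be read twice by rewinding it in between.
$firstPass = stream_get_contents($handle);
rewind($handle);
$secondPass = stream_get_contents($handle);
fclose($handle);

var_dump($handleIsResource, $wrapperIsResource, $firstPass === $secondPass);
```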