Open johnbillion opened 9 years ago
Just poking my head in to mention specifically about the XML parsing: for anything where you could be loading in a giant file, avoid SimpleXML. It loads the entire DOM into memory and is super inefficient for memory usage; you'll almost always need to raise your memory limit to something insane to be able to handle it.
Instead, you're better off with a pull parser, such as XMLReader. It might require rearchitecting things to make it work though, since XMLReader operates one node at a time. That said, it can hand you the current node as a DOM node via expand(), so you can pull out a node and work with it more easily for a hybrid approach.
Typically, your main loop will look something like:
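while ( $reader->read() ) {
    if ( $reader->nodeType !== XMLReader::ELEMENT ) {
        continue;
    }
    switch ( $reader->name ) {
        case 'item':
            // Convert item to DOM to handle easier
            $node = $reader->expand();
            // Do something with the item
            $this->handleItem( $node );
            // Skip past the end of the node (</item>) now that we're done.
            // Note: next() lands on the following node and the read() in the
            // loop condition then advances again, so this relies on whitespace
            // (or some other node) sitting between <item> elements.
            $reader->next();
            break;
    }
}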
The overall architecture of this is a bit easier if you have the parser drive the importer rather than the other way around.
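One way to let the parser drive the importer is to wrap the read loop in a generator, so the importer side just iterates over items. This is only a sketch: iterate_items() and its $tag parameter are hypothetical names, and it assumes the elements of interest are called item. It copies each expanded node into its own DOMDocument so the node stays usable after the reader moves on, and it avoids calling read() straight after next().

```php
<?php
/**
 * Hypothetical helper: yields each matching element as a standalone DOM node.
 */
function iterate_items( string $xml, string $tag = 'item' ): Generator {
    $reader = new XMLReader();
    // XML() parses from a string; for a file on disk, open( $path ) would be used instead.
    if ( ! $reader->XML( $xml ) ) {
        return;
    }

    $more = $reader->read();
    while ( $more ) {
        if ( XMLReader::ELEMENT === $reader->nodeType && $tag === $reader->name ) {
            // Copy the expanded node into its own document so it stays valid
            // after the reader advances.
            $doc = new DOMDocument();
            yield $doc->importNode( $reader->expand(), true );

            // next() skips the subtree and lands on the following node,
            // so read() is not called again for this iteration.
            $more = $reader->next();
        } else {
            $more = $reader->read();
        }
    }
    $reader->close();
}

// The importer side only ever sees one item at a time.
$xml = '<rss><channel><item><title>A</title></item><item><title>B</title></item></channel></rss>';
foreach ( iterate_items( $xml ) as $item ) {
    echo $item->getElementsByTagName( 'title' )->item( 0 )->textContent, "\n";
}
```

Only one item's subtree is held in memory at a time, which is the main point of the hybrid approach.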
If you don't use XMLReader/etc, I guarantee you'll start running into problems with file size and run time with "large" (8MB+) files. You'll need to use workarounds like increasing the memory limit or splitting the file into multiple parts.
For a pre-run check like this, you can still use XMLReader: just have an empty while loop calling $reader->read() and check if you hit any errors at the end. You can potentially fix errors on-the-fly as well.
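A rough sketch of that kind of pre-run check, assuming libxml's internal error collection is used to gather the errors. The function name xml_preflight_errors() and the message format are made up for illustration.

```php
<?php
/**
 * Hypothetical pre-flight helper: streams through an XML file and returns any
 * parse errors libxml reported, without building the whole document in memory.
 */
function xml_preflight_errors( string $path ): array {
    if ( ! is_readable( $path ) ) {
        return array( "Cannot read $path" );
    }

    // Collect libxml errors in a buffer instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors( true );
    libxml_clear_errors();

    $reader = new XMLReader();
    $errors = array();
    if ( $reader->open( $path ) ) {
        // Empty loop: we only care whether the parser reaches the end cleanly.
        // The @ silences any warning read() itself may raise on malformed input.
        while ( @$reader->read() ) {
        }
        $reader->close();
    } else {
        $errors[] = "Cannot open $path";
    }

    foreach ( libxml_get_errors() as $error ) {
        $errors[] = sprintf( 'Line %d: %s', $error->line, trim( $error->message ) );
    }

    // Restore the previous libxml error handling state.
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    return $errors;
}
```

An empty array means the file parsed through to the end; anything else can be reported before the import starts.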
I'm going to list some of the things I've added to or wanted on the IOL migrations.
First up is a pre-flight check which (for an XML file based import at least) first verifies the integrity of the import files before we even begin the import. It would loop over every import file and run it through simplexml_load_file() to see if any errors occur. For other import types, it would run them through json_decode() etc. as appropriate.
On the QuiFinanza import we have a bunch of files without a BOM which is tripping up simplexml_load_file(), but I wasn't aware of this until the initial import was in progress.
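For the JSON side of such a pre-flight check, something along these lines might do. It is a sketch only: json_preflight_failures() is a hypothetical name, and it assumes each file is small enough to decode in one go, since json_decode() needs the whole string in memory.

```php
<?php
/**
 * Hypothetical pre-flight helper for JSON-based imports: returns a map of
 * file path => error message for every file that fails to decode.
 */
function json_preflight_failures( array $paths ): array {
    $failures = array();
    foreach ( $paths as $path ) {
        $contents = is_readable( $path ) ? file_get_contents( $path ) : false;
        if ( false === $contents ) {
            $failures[ $path ] = 'File could not be read';
            continue;
        }

        // The decoded value is discarded; only the error state matters here.
        json_decode( $contents );
        if ( JSON_ERROR_NONE !== json_last_error() ) {
            $failures[ $path ] = json_last_error_msg();
        }
    }
    return $failures;
}
```

An empty result means every file decoded, so the import can begin; otherwise the failing files are known up front rather than partway through the run.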