Open waldoj opened 11 years ago
(@krues8dr is working on this.)
I only copied over the important bits from the DC Code branch, which were actually really simple. Then I juggled the error handling a bit and added a new requirement to the config file to give the path to where the import files should live. I also renamed that directory, because 'xml' isn't a very good name for a place to keep json files. ;)
This probably could do with some testing - I plan tomorrow to use our generated JSON files to see if this will work as expected.
This is great! I hope it works. :)
(I've been on the go, getting to and traveling around Buenos Aires, so I'm not in a position to test this just now.)
(@waldoj no worries. As an aside, I'm generally only closing things that I know that you've already looked at, or that I generally feel don't really need any looking over. Everything else I'm just leaving a note that it's 'Pretty Much Done'.)
Ugh. I just wasted the entire morning trying to get this to actually run on our exported JSON files, but their format is so different from what we're importing via XML that supporting it would mean a complete rewrite.
So, technically, this does handle JSON, but the parsing is going to vary depending on the source. Is it worth actually implementing those details?
(Did not mean to close this. Oops.)
I'd say the best thing to do, then, is to change the format of the JSON files. Does that seem reasonable?
For future reference: for test files, I just ran our xml files through https://github.com/hay/xml2json
ls ./import-data/ | xargs -I {} python ~/Downloads/xml2json-master/xml2json.py -t xml2json ./import-data/{} -o ./json/{}.json
and then removed all the @s with sed
ls ./json/ | xargs -I {} sed -i 's/@//g' ./json/{}
which has gotten me 90% of the way there. Now just dealing with the fact that the xml parser and json parser return slightly different data types...
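One caveat with the sed step above: removing every @ also rewrites any @ that happens to sit inside a value (an email address, say). A minimal sketch of a narrower substitution, assuming xml2json writes attribute keys as "@name" followed by a colon; the function name strip_attr_prefix is hypothetical, and this has not been run against the real import files:

```shell
#!/bin/sh
# Hypothetical helper: drop the leading "@" from JSON keys only.
# Assumption: attribute keys look like "@name": in the converted files.
# An "@" inside a value (e.g. "a@b.org") is left untouched, because the
# pattern only matches a quoted string that starts with "@" and is
# immediately followed by optional whitespace and a colon.
strip_attr_prefix() {
    sed -E 's/"@([^"]*)"([[:space:]]*:)/"\1"\2/g'
}

# Possible usage over the ./json/ directory from the commands above:
# for f in ./json/*.json; do
#     strip_attr_prefix < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
# done
```

Writing to a temporary file and moving it into place avoids relying on sed -i, whose syntax differs between GNU and BSD sed.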
... which means that a lot of this relies on SimpleXMLElements, which don't actually act like standard StdClasses. The normal json_decode(json_encode($obj)) isn't working to swap types here, so I'm trying other methods and considering other options. Are we set on SimpleXML, or would XMLReader be an acceptable replacement possibly?
"SimpleXMLElement::children() returns a node object no matter if the current node has children or not."
http://php.net/manual/en/simplexmlelement.children.php :rage:
I'm starting to feel like this might be a time-sink. If you don't feel like you're making adequate progress on this, just leave this where it's at. It's fine. There are plenty of other issues to be resolved. :)
The thing that I'm working on now actually gets us closer to allowing any data source, so it's worth finishing up. We're doing a few things that are very SimpleXMLElement specific (mainly string-casting) and need attention anyway. I'll have another go tomorrow morning once I'm hyper-caffeinated.
I'll definitely defer to your judgment because a) the flu has broken my brain and b) this is one of those issues you've got to get deep inside of to have a proper perspective on.
Ok, so I got this mostly working, but it still needs work. I'm really stuck on some prefix_hierarchy madness that is breaking all the time. I'm reverting for now, but this should be revisited in the future - SimpleXML does things normal objects don't, and we're making some really bad assumptions in the (string) type conversions from objects there. To wit, most of this will probably only work on SimpleXML objects, and other data types might have issues.
My patch has two pieces. This function needs to be added to functions.inc.php and is used to translate those SimpleXML classes: https://gist.github.com/krues8dr/6463853
The second is a replacement for the base State parser - as I said, it's not working 100% at the moment, and needs more attention: https://gist.github.com/krues8dr/00d49b52e76f3b096b6d
I'm dropping this now and moving on to other issues.
@waldoj ^
Propose moving Milestone to Future. We've got a 90% solution here - but I've yet to see anyone with JSON data aside from what we're getting for DC (which is downstream data anyway).
"I've yet to see anyone with JSON data"
As of 18 months ago, that looked like a distinct possibility. Supporting JSON imports was a case of skating to where the puck would be, or so I thought. But right now, I just don't see evidence of it happening. Future it is.
The XML import system is swell, but XML is awful. For folks who don't want to deal with XML, but don't want to modify the parser just to import JSON, it's best to add some JSON support.
The trick here is going to be avoiding duplication of code. The XML extraction is pretty bespoke—there's nothing obvious about how JSON importing can live within the same system.