scriptotek / php-marc

Simple interface for working with MARC records using the File_MARC package
MIT License
51 stars 11 forks source link

Init from simplexml #19

Closed danmichaelo closed 4 years ago

danmichaelo commented 4 years ago

Want to have a look at this, @rudolfbyker? Per #12

A small bump in the road: We can only pass a SimpleXMLElement directly to File_MARC if it doesn't contain namespaces.

For this reason I'm still doing a serialization of the SimpleXMLElement followed by a deserialization in XmlImporter::getCollection(). Less efficient, but more convenient, since the SimpleXMLElement object from an OAI response will often contain namespaces.

If the SimpleXMLElement object doesn't contain namespaces, we could pass it directly to File_MARC without any further processing though, so I thought about adding something like this on line 112 in XmlImporter.php:

        if (count($records) == 1 && $ns == '') {
            // If we have a single record without namespaces, we can pass it directly to File_MARC
            $parser = $this->factory->make('File_MARCXML', $records[0], File_MARCXML::SOURCE_SIMPLEXMLELEMENT);
            return new Collection($parser);
        }

Not sure if it's worth the extra code complexity though. What do you think?

codecov-io commented 4 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@32f3168). Click here to learn what that means. The diff coverage is 65.21%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master      #19   +/-   ##
=========================================
  Coverage          ?   84.26%           
  Complexity        ?      268           
=========================================
  Files             ?       23           
  Lines             ?      591           
  Branches          ?        0           
=========================================
  Hits              ?      498           
  Misses            ?       93           
  Partials          ?        0
Impacted Files Coverage Δ Complexity Δ
src/Importers/Importer.php 100% <ø> (ø) 5 <0> (?)
src/Collection.php 93.82% <100%> (ø) 44 <3> (?)
src/Record.php 87.5% <100%> (ø) 21 <1> (?)
src/Importers/XmlImporter.php 85.71% <27.27%> (ø) 23 <2> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 32f3168...97511a5. Read the comment docs.

rudolfbyker commented 4 years ago

It does feel like premature optimisation to add that if statement checking for a single record with no namespace. It adds complexity and only benefits a few users. My XMLs definitely come with namespaces!

It rather sounds like yet another thing that has to be fixed in pear/file_marc. Do you have any idea why they don't accept namespaced XML?

danmichaelo commented 4 years ago

Not sure, but my guess is that it would be quite complex to fix in File_MARC when I think about it. In general SimpleXMLElement is just not great with namespaced XML (I wrote https://github.com/danmichaelo/quitesimplexmlelement to work around some issues). Once you have initialized a SimpleXMLElement object, I'm not aware of a way to change the default namespace.

On the bright side, serialization/deserialization seems quite fast.

rudolfbyker commented 4 years ago

I had a look at this. It seems that you can do something like this:

<?php
require_once 'File/MARCXML.php';
$xml = simplexml_load_file("5518.xml");
$parser = new File_MARCXML($xml, File_MARCXML::SOURCE_SIMPLEXMLELEMENT);
$record = $parser->next();
$field = $record->getField('100');

Both simplexml_load_file and the constructor of File_MARCXML support the $ns and $is_prefix arguments, but I fiddled with them, and I can't figure out what to pass there to make it work.

It seems like you are quite confident that it won't work. Could you explain why not? I'm not so sure that I completely understand XML namespacing anyway...

danmichaelo commented 4 years ago

Sorry if I was unclear. Yes, when you initiate SimpleXML from a string, you can specify a default namespace prefix or URI, but I'm not aware that you can specify it once the object is initialized. So what I'm basically doing/suggesting is a three step process:

  1. Serialize the SimpleXML object into a string
  2. Detect the namespace (I just check for the two most common, plus the case of no namespace)
  3. Pass the string on to File_MARC with the namespace we detected in step 2

At first I thought I could skip the serialize/deserialize step by initiating File_MARCXML directly from a SimpleXML object, but that only works if you have a SimpleXML object you initiated with a default namespace (or one without namespaces). Since the aim of this package is to accept whatever you throw at it, that's not a solution for this package.

Anyways, the PR works as intended, so I think I will go ahead and merge it