yob / onix

A convenient mapping between ruby objects and the ONIX XML specification
MIT License

Won't Read ONIX Feed #5

Open acolchagoff opened 10 years ago

acolchagoff commented 10 years ago

I've got an ONIX feed that is sent to me as a zip file in an email. The zip file contains a 100+ MB XML file and a DTD file. The top of the XML file looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ONIXMessage SYSTEM
"ONIX_BookProduct_3.0_short.dtd">
<ONIXmessage release="3.0">
<header>
<sender>
<x298>Publisher</x298>
<x299>Vendor</x299>
<j272>vendor_feeds@place.com</j272>
</sender>
<x307>20140311</x307>
<m183>An Onix message file from Publisher</m183>
</header>

Although this file has well over 10,000 products in it, the gem won't read any of them.

reader.each do |product|
    puts product.inspect
end

The each loop does nothing. The block never fires; it's as if the XML file had zero products in it.

I've spent several days on this. Here's the entire algorithm for reference:

def self.parse_onix(publisher_id, onix_file)
    Zip::ZipFile.open(onix_file.tempfile.path) do |zip|
        xml_file = ""
        dir = "#{Rails.root.to_s}/tmp/onix/"

        zip.each do |entry|
            next if entry.name =~ /__MACOSX/ or \
             entry.name =~ /\.DS_Store/ or !entry.file?
            logger.debug "#{entry.name}"
            puts entry.name
            FileUtils::mkdir_p(dir)
            #this_file = FileUtils.touch(dir + entry.name)
            entry.extract(dir + entry.name)

            p '--->Thing:'+entry.name.last(3)
            if entry.name.last(3) == 'xml'
                xml_file = dir + entry.name
            end
        end

        Work.fix_dtd_path(dir, xml_file)

        reader = ONIX::Reader.new(xml_file)

        puts reader.inspect

        reader.each do |product|
            puts product.inspect
        end
    end
end

def self.fix_dtd_path(dir, xml_file)
    xml = File.read(xml_file)

    # fix the path in the DOCTYPE
    dtd_file = 'ONIX_BookProduct_3.0_short.dtd'
    xml = xml.gsub(dtd_file, dir + dtd_file)
    File.delete(xml_file)
    File.open(xml_file, 'w') do |file|
        file.write(xml)
    end
end

varunarang commented 10 years ago

I am not sure what the problem might be, but can you try converting the ONIX file to reference tags:

ONIX::Normaliser.process("oldfile.xml", "newfile.xml")

If this converts it to reference tags, you should be able to iterate over the products in the file.

acolchagoff commented 10 years ago

Unfortunately, normalizing doesn't seem to help, but I think I've figured out the issue. It doesn't appear that this gem supports ONIX 3.0 short tags, which is what my XML feed uses. Because the feed is in short format, all of my tag names are different (for example, 'Header' becomes 'header', 'PublisherIDType' becomes 'x447', etc.). The gem is looking for the standard reference tags and ignoring the short tags.

Would this explain the issues I'm having?
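
One way to check that guess before handing the file to the gem is to look at the root element name. The sketch below is not part of the onix gem; `onix_tag_style` is a hypothetical helper, and it assumes that reference-tag files use `<ONIXMessage>` as the root while short-tag files use `<ONIXmessage>`, as in the sample at the top of this issue.

```ruby
# Hypothetical helper, not part of the onix gem.
# Guesses the ONIX tag style from the first element in the file.
# Assumption: reference-tag files start with <ONIXMessage>,
# short-tag files start with <ONIXmessage>.
def onix_tag_style(head)
  # "<?xml" and "<!DOCTYPE" are skipped because the name must start with a letter.
  root = head.to_s[/<([A-Za-z][\w.-]*)[\s>\/]/, 1]
  case root
  when "ONIXMessage" then :reference
  when "ONIXmessage" then :short
  else :unknown
  end
end
```

Only the start of the file is needed, so a 100+ MB feed does not have to be loaded in full: `onix_tag_style(File.open(xml_file, "rb") { |f| f.read(4096) })`.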

acolchagoff commented 10 years ago

Making progress. I'm now getting this error when calling normalize:

/var/folders/nb/nc2b5f2s7rdch1nxfxyd2d200000gq/T/onix20140331-4641-16q41ea:3: warning: failed to load external entity "/var/folders/nb/nc2b5f2s7rdch1nxfxyd2d200000gq/T/ONIX_BookProduct_3.0_short.dtd"
"ONIX_BookProduct_3.0_short.dtd">

The normalizer appears to be looking in the temp directory for my DTD file, but nothing moved the file there. The DTD file is still back in the zip folder.
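
A possible workaround is to copy the DTD and its companion files into whatever directory the parser resolves the DOCTYPE against. This is only a sketch: `copy_dtd_files` is a hypothetical helper, the extension list (`dtd`, `elt`, `ent`) is an assumption about how the ONIX DTD bundle is split up, and the target directory has to be taken from the warning message.

```ruby
require "fileutils"

# Hypothetical helper, not part of the onix gem.
# Copies DTD-related files from source_dir into target_dir and returns
# the paths of the copies. The default extension list is an assumption.
def copy_dtd_files(source_dir, target_dir, extensions: %w[dtd elt ent])
  FileUtils.mkdir_p(target_dir)
  pattern = File.join(source_dir, "*.{#{extensions.join(',')}}")
  Dir.glob(pattern).map do |path|
    FileUtils.cp(path, target_dir)
    File.join(target_dir, File.basename(path))
  end
end
```

For example, `copy_dtd_files("#{Rails.root}/tmp/onix", Dir.tmpdir)` (with `require "tmpdir"`) would cover the case where the files need to sit in the system temp directory.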

acolchagoff commented 10 years ago

Okay, after manually copying 3 DTD files (interdependent DTDs?) I've fixed that error, but the XSLT conversion still seems to be failing. I think it's because the XSLT script distributed with the gem is for ONIX 2.1.
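
To confirm which ONIX release a feed declares before running a stylesheet over it, the `release` attribute on the root element can be read from the start of the file. This is a sketch with a hypothetical helper name, `onix_release`; it assumes the root element carries a `release` attribute as in the sample above, and it returns nil when the attribute is missing.

```ruby
# Hypothetical helper, not part of the onix gem.
# Returns the value of the release attribute on the root ONIX element,
# or nil if the root element or the attribute cannot be found.
def onix_release(head)
  # Case-insensitive so both <ONIXMessage> and <ONIXmessage> match.
  root_tag = head.to_s[/<ONIXmessage\b[^>]*>/i]
  return nil unless root_tag

  root_tag[/\brelease\s*=\s*["']([^"']+)["']/, 1]
end
```

Called as `onix_release(File.open(xml_file, "rb") { |f| f.read(4096) })`, this gives a value that can be compared against the release the stylesheet was written for.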