plutext / docx4j

JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files
https://www.docx4java.org/

Header1.xml.rels with Relationship with TARGET=NULL produces NPE #100

Open espenaf opened 10 years ago

espenaf commented 10 years ago

Using docx4j 3.0.0, I am trying to load a .docx file with WordprocessingMLPackage.load(file), but it produces the following NPE.

[DEBUG] For Relationship Id=rId19 Source is /word/document.xml, Target is header1.xml, type: http://schemas.openxmlformats.org/officeDocument/2006/relationships/header
[DEBUG] resolved uri: word/header1.xml
[DEBUG] Found content type 'application/vnd.openxmlformats-officedocument.wordprocessingml.header+xml' for /word/header1.xml
[DEBUG] Set contentType application/vnd.openxmlformats-officedocument.wordprocessingml.header+xml on part /word/header1.xml
[DEBUG] Set contentType application/vnd.openxmlformats-officedocument.wordprocessingml.header+xml on part /word/header1.xml
[DEBUG] ctm returned org.docx4j.openpackaging.parts.WordprocessingML.HeaderPart
[DEBUG] Loading part /word/header1.xml
[DEBUG] setPackage called for org.docx4j.openpackaging.parts.WordprocessingML.HeaderPart
[DEBUG] Set contentType application/vnd.openxmlformats-package.relationships+xml on part /word/_rels/header1.xml.rels
[DEBUG] unmarshalling org.docx4j.openpackaging.parts.relationships.RelationshipsPart
[DEBUG] nextId reset to : 3
[DEBUG] setPackage called for org.docx4j.openpackaging.parts.relationships.RelationshipsPart
[DEBUG] setProperty: com.sun.xml.bind.namespacePrefixMapper
[DEBUG] For Relationship Id=rId2 Source is /word/header1.xml, Target is NULL, type: http://schemas.openxmlformats.org/officeDocument/2006/relationships/image
[DEBUG] setProperty: com.sun.xml.bind.namespacePrefixMapper
[DEBUG] resolved uri: word/NULL
[DEBUG] Looking at extension '/word/null
[ERROR] No content type found for /word/NULL
java.lang.NullPointerException
    at org.docx4j.openpackaging.io3.Load3.getRawPart(Load3.java:419)
    at org.docx4j.openpackaging.io3.Load3.getPart(Load3.java:322)
    at org.docx4j.openpackaging.io3.Load3.addPartsFromRelationships(Load3.java:234)
    at org.docx4j.openpackaging.io3.Load3.getPart(Load3.java:346)
    at org.docx4j.openpackaging.io3.Load3.addPartsFromRelationships(Load3.java:234)
    at org.docx4j.openpackaging.io3.Load3.getPart(Load3.java:346)
    at org.docx4j.openpackaging.io3.Load3.addPartsFromRelationships(Load3.java:234)
    at org.docx4j.openpackaging.io3.Load3.get(Load3.java:172)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:353)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:293)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:243)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:226)
    at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:162)
    at org.my.ReleaseMojo.filterDocuments(ReleaseMojo.java:358)
    at org.my.ReleaseMojo.execute(ReleaseMojo.java:331)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:106)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:317)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:152)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:555)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:214)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:158)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)

I have traced this error to a Relationship tag in header1.xml.rels which contains TARGET=NULL. Here's my header1.xml.rels:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

I realize this might be some error in my .docx file, but I have tried to clean it up in obvious ways without any luck. Document was created with Microsoft Office Word 2007.
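For anyone wanting to check whether a given file is affected before handing it to docx4j, here is a minimal sketch that scans the package (which is an ordinary zip) for .rels parts containing a Relationship whose Target is the literal string NULL. The class name NullTargetScanner and the method findNullTargets are hypothetical helpers, not part of docx4j, and the sketch assumes the attribute value is literally "NULL" (matched case-insensitively), as the debug log suggests. It needs Java 9+ for readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of docx4j): lists relationships whose Target is "NULL". */
final class NullTargetScanner {

    /** Returns one "entryName#relationshipId" string per offending Relationship. */
    static List<String> findNullTargets(InputStream packageStream) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        List<String> hits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(packageStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().endsWith(".rels")) {
                    continue; // only relationship parts are of interest
                }
                // Buffer the entry first so the XML parser cannot close the zip stream.
                byte[] xml = zip.readAllBytes();
                Document doc = builder.parse(new ByteArrayInputStream(xml));
                NodeList rels = doc.getElementsByTagNameNS("*", "Relationship");
                for (int i = 0; i < rels.getLength(); i++) {
                    Element rel = (Element) rels.item(i);
                    // Assumption: the bad value is the literal text NULL.
                    if ("NULL".equalsIgnoreCase(rel.getAttribute("Target"))) {
                        hits.add(entry.getName() + "#" + rel.getAttribute("Id"));
                    }
                }
            }
        }
        return hits;
    }
}
```

Calling NullTargetScanner.findNullTargets(new FileInputStream(file)) on the document above would be expected to report the rId2 relationship in word/_rels/header1.xml.rels, if the file matches the log.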

k1ffx commented 10 years ago

I'm seeing the same problem on the Powerpoint side. I have .pptx files that have been created and edited only in the Powerpoint application on Windows; in other words, they are not the output of docx4j applications. When trying to open the file(s),

ppt = (PresentationMLPackage)PresentationMLPackage.load(new File(inputfilepath));

I get an NPE. The same thing, by the way, happens using docx4j's "PartsList" program.

The relevant part of the debug log is:

DEBUG RelationshipsPart - unmarshalling org.docx4j.openpackaging.parts.relationships.RelationshipsPart
DEBUG RelationshipsPart - nextId reset to : 2
DEBUG Part - setPackage called for org.docx4j.openpackaging.parts.relationships.RelationshipsPart
DEBUG Load3 - For Relationship Id=rId1 Source is /ppt/drawings/vmlDrawing14.vml, Target is NULL, type: http://schemas.openxmlformats.org/officeDocument/2006/relationships/image
DEBUG Load3 - resolved uri: ppt/drawings/NULL
DEBUG ContentTypeManager - Looking at extension '/ppt/drawings/null
ERROR ContentTypeManager - No content type found for /ppt/drawings/NULL

My interim workaround is to run a Perl script which unzips the PPTX, deletes the offending .rels files, and then re-zips. My docx4j application (a simple "split into single slides" app) then works fine.

Thanks for all!
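The unzip / delete / re-zip workaround described above can also be sketched in Java, which may be convenient if the cleanup should run in the same process as the docx4j application. This is only an illustration under stated assumptions: NullRelsStripper and stripNullTargetRels are hypothetical names, the .rels parts are assumed to be UTF-8, and an "offending" part is detected with a simple regular expression rather than a real XML parse. Like the Perl script, it drops the whole .rels part, so any other relationships in that part are lost too. It needs Java 9+ for readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper (not part of docx4j): copies a package, omitting .rels parts with Target="NULL". */
final class NullRelsStripper {

    // Matches Target="NULL" (either quote style, any case); "TargetMode=" does not match.
    private static final Pattern NULL_TARGET =
            Pattern.compile("Target\\s*=\\s*[\"']NULL[\"']", Pattern.CASE_INSENSITIVE);

    /** Copies the zip from in to out and returns how many .rels parts were dropped. Closes both streams. */
    static int stripNullTargetRels(InputStream in, OutputStream out) throws IOException {
        int dropped = 0;
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = zin.readAllBytes(); // content of the current entry only
                boolean offending = entry.getName().endsWith(".rels")
                        && NULL_TARGET.matcher(new String(data, StandardCharsets.UTF_8)).find();
                if (offending) {
                    dropped++;
                    continue; // skip the whole part, as the Perl workaround does
                }
                // Fresh ZipEntry so sizes and CRC are recomputed on write.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        }
        return dropped;
    }
}
```

The cleaned copy could then be passed to PresentationMLPackage.load as before. Removing only the single offending Relationship element instead of the whole part would be less destructive, but that goes beyond what the workaround above does.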

plutext commented 10 years ago

Hi Bruce, can you put an offending pptx somewhere, or steps (if they are short) to create one from scratch in Powerpoint? thanks .. Jason

k1ffx commented 10 years ago

Jason ... thanks for the response. Will try to scrounge up a helpful pptx. In the meantime, in looking over my comment, I noticed that I forgot to include an important fact. By "offending .rels files", I meant to indicate those .rels files in ppt/drawings/_rels that specify

target="NULL"

Thanks -

Bruce

plutext commented 10 years ago

It'd also be interesting to know how the pptx came to be. Was it originally a binary .ppt? What versions of Powerpoint (or other tools) has it been edited with?

k1ffx commented 10 years ago

Jason ... at long last ... a pptx I can show you .. sorry for the long delay. You can find it at http://www.rosenatthewheel.com/null_pointer_sample.pptx.

I really can't say how the pptx in its current state came to be. It's not unlikely that it's been around in one form or another for a number of years and has been edited by various versions of PowerPoint, but I can't say for sure.

I hope this pptx sample helps. As mentioned in an earlier post, PartsList will fail with an NPE, ultimately caused by a target=NULL.

Thanks -

Bruce