tforsberg / docbkx-tools

Automatically exported from code.google.com/p/docbkx-tools

WebHelp indexer fails in docbkx when using DocBook XSL 1.78.0 #97

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Generate WebHelp using docbkx-maven-plugin and the latest version of DocBook 
XSL (1.78.0).

Result: 

The content is generated, but the indexing step fails with the following 
error:

[INFO] java.lang.NullPointerException
[INFO]  at com.nexwave.nsidita.DirList.<init>(DirList.java:35)
[INFO]  at com.agilejava.docbkx.maven.AbstractWebhelpMojo.postProcessResult(AbstractWebhelpMojo.java:117)
[INFO]  at com.agilejava.docbkx.maven.AbstractTransformerMojo.execute(AbstractTransformerMojo.java:166)
[INFO]  at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
[INFO]  at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
[INFO]  at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[INFO]  at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[INFO]  at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
[INFO]  at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
[INFO]  at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
[INFO]  at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
[INFO]  at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:319)
[INFO]  at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
[INFO]  at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
[INFO]  at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
[INFO]  at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
[INFO]  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[INFO]  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[INFO]  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[INFO]  at java.lang.reflect.Method.invoke(Method.java:597)
[INFO]  at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
[INFO]  at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
[INFO]  at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
[INFO]  at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:

There have been a number of changes to the WebHelp DocBook XSL in this release 
which need to be reflected in the docbkx-maven-plugin:

- WebHelp XSL no longer writes generated HTML files to a content directory. 
Instead, HTML files are written to the root of the WebHelp output directory. 

- The indexer class (com.nexwave.nquindexer.IndexerMain) takes some additional 
system properties - doStem and indexerExcludedFiles 

- The WebHelp indexer is now invoked directly (see the readme.xml and generated 
output in the WebHelp folder of the DocBook XSL 1.78.0 distribution - I'm not 
clear on how important this is).

Original issue reported on code.google.com by davidpor...@gmail.com on 30 Jan 2013 at 1:32

GoogleCodeExporter commented 8 years ago
Issue 98 has been merged into this issue.

Original comment by MimilO...@gmail.com on 15 Feb 2013 at 1:53

GoogleCodeExporter commented 8 years ago
Hello,

I am fixing the issue but I have the following questions (for David Cramer):
- are you okay with adding new plugin parameters for doStem and 
indexerExcludedFiles, or do you plan to add them to the stylesheets?
- I had some issues generating the JavaScript parts for the search. I had to 
override the XSL parameter and set it to the string 'true' to have it generated. 
So '0' or '1' (or any other boolean variant) cannot work. Isn't there a way to 
handle the boolean variants properly in the XSLT stylesheets?
- the parsing of HTML files is really slow: it takes a few minutes on my side. 
Am I doing something wrong?

Regards,
Cedric,

Original comment by MimilO...@gmail.com on 15 Feb 2013 at 2:04

GoogleCodeExporter commented 8 years ago
Does someone have working code for this issue? I'd love to try it out in that 
case.

I made an attempt but didn't get it fully working.

Regarding slow parsing: I profiled the code I played with, and it looked 
like it was waiting for some kind of network operation which timed out. So in 
all, it spent 100s waiting and less than 1s doing actual work on my small 
test data.

/anders

Original comment by and...@nawroth.se on 20 May 2013 at 6:38

GoogleCodeExporter commented 8 years ago
If it's all network then that makes me think the indexer is now trying to load 
the xhtml DTDs. We should look at either modifying the indexer so it doesn't do 
that or using catalog files to fetch a local copy of the DTD. Having the 
indexer stop doing that is the better approach. There's no need for the DTDs 
when parsing. 
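
To illustrate the "stop loading the DTDs" option: a minimal Java sketch, assuming a plain JAXP SAX parser rather than the actual indexer code. The class and method names (NoDtdParse, countElements) are hypothetical. The entity resolver answers every external entity request with an empty stream, so the parser never opens a network connection for the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the WebHelp indexer.
final class NoDtdParse {

    /** Parses the given XML and returns the number of elements seen. */
    static int countElements(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            XMLReader reader = factory.newSAXParser().getXMLReader();

            // Answer every external entity (including the DTD) with an
            // empty document instead of fetching it from its URL.
            reader.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

            final int[] count = {0};
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return count[0];
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

One trade-off of this approach: entities that are only declared in the skipped DTD are no longer defined, so it fits best where the DTD content is not needed for parsing.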

Original comment by crame...@gmail.com on 23 May 2013 at 3:04

GoogleCodeExporter commented 8 years ago
It's org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity which tries to 
connect to something located at www.oasis-open.org (I simply tried an execution 
while offline). I know the DocBook DTD lives there: 
http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd

Original comment by and...@nawroth.se on 23 May 2013 at 4:07

GoogleCodeExporter commented 8 years ago
Weird that it would be trying to fetch the DocBook dtd from Oasis since the 
indexer parses the generated html files and not the DocBook source. Let me ask 
Kasun to take a look.

Original comment by crame...@gmail.com on 23 May 2013 at 8:40

GoogleCodeExporter commented 8 years ago
We use Maven + DocBook inside our organization, so we are interested in that 
feature.
I had the chance to look at this issue and can propose a patch that 
should fix all the issues introduced by the new version of docbook-xsl.
The patch is made against the current trunk (rev 261).

Most of the fixes are based on the contents of the docbook-xsl distribution: 
I looked at what is done in the Ant script and translated it into the 
WebHelp mojo.

Some dependencies have been added or modified in the plugin pom.xml:
* docbook-xsl-1.78.1.zip: taken from sourceforge, I just changed the root 
directory to be /docbook instead of /docbook-1.78.1
* docbook-xsl-webhelpindexer-1.78.1.jar: taken from that same zip. I think it's 
worth setting that artifact version the same as the docbook one
* lucene-analyzers and lucene-core (3.0.0), tagsoup 1.2.1: needed for the 
indexer, these jars are in fact specifically added to the classpath in the Ant 
script

Original comment by tdema...@gmail.com on 29 May 2013 at 2:24

GoogleCodeExporter commented 8 years ago
Hi,

Committed in r262.

Thank you, tdemande, for your patch. I applied 99% of it (I only removed some 
parameters in a method because I am not yet using the artifacts from 1.78.1; 
I will also make them available in the docbook project). Then I will apply the 
missing 1%.

I started on this issue some time ago and had similar modifications (not 
committed), but you did more than I did, so I guess yours is better =)

I also used a workaround in the sample to get the search engine working 
(webhelp.include.search.tab parameter): 

<profile>
  <id>docbkx.webhelp</id>
  <build>
    <plugins>
      <plugin>
        <groupId>com.agilejava.docbkx</groupId>
        <artifactId>docbkx-maven-plugin</artifactId>
        <version>${project.version}</version>
        <executions>
          <execution>
            <goals>
              <goal>generate-webhelp</goal>
            </goals>
            <phase>generate-sources</phase>
          </execution>
        </executions>
        <configuration>
          <includes>webhelpsite/readme.xml</includes>
          <webhelpIndexerLanguage>fr</webhelpIndexerLanguage>
          <webhelpIncludeSearchTab>1</webhelpIncludeSearchTab>
          <!-- temporary hack to get webhelp javascript properly generated -->
          <customizationParameters>
            <parameter>
              <name>webhelp.include.search.tab</name>
              <value>true</value>
            </parameter>
          </customizationParameters>
          <!--templateDirectory>path/to/your/webhelptemplate/</templateDirectory-->
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>

I will keep you posted once I have updated to 1.78.1.

I hope that changing the SAXParserFactory will not have side effects.

Thanks again for your patch,
Regards,
Cedric,

Original comment by MimilO...@gmail.com on 29 May 2013 at 8:11

GoogleCodeExporter commented 8 years ago
Cedric, if I'm not mistaken, the workaround you used for the 
webhelp.include.search.tab parameter can be removed once you update to 1.78.1.

Changing the SAXParserFactory to tagsoup is what the ZIP distribution does by 
default. We might consider adding a parameter to the plugin so that a different 
one can be used.
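
On the side-effects concern: a minimal Java sketch of one way to limit the impact, assuming the factory is selected through the standard JAXP system property javax.xml.parsers.SAXParserFactory. The helper (ScopedSaxFactory, withFactory) is hypothetical and not part of the plugin; the tagsoup factory class name in the usage note is the one shipped in the tagsoup jar. The property is set only for the duration of the work and restored afterwards.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of docbkx-maven-plugin.
final class ScopedSaxFactory {

    /** Standard JAXP system property used to select the SAX parser factory. */
    static final String KEY = "javax.xml.parsers.SAXParserFactory";

    /**
     * Runs the given work with the SAX parser factory property set to
     * factoryClassName, then restores the previous value (or clears the
     * property if it was not set before).
     */
    static <T> T withFactory(String factoryClassName, Supplier<T> work) {
        String previous = System.getProperty(KEY);
        System.setProperty(KEY, factoryClassName);
        try {
            return work.get();
        } finally {
            if (previous == null) {
                System.clearProperty(KEY);
            } else {
                System.setProperty(KEY, previous);
            }
        }
    }
}
```

A call could look like ScopedSaxFactory.withFactory("org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl", () -> runIndexer()), where runIndexer is a placeholder for the indexing step. System properties are global to the JVM, so this does not protect other threads that create parsers at the same time.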

Original comment by tdema...@gmail.com on 30 May 2013 at 9:43

GoogleCodeExporter commented 8 years ago
That is true, it is no longer needed. Fixed in r263.

Original comment by MimilO...@gmail.com on 31 May 2013 at 7:14

GoogleCodeExporter commented 8 years ago
Great work, so far everything seems to work now.

There's still the slowness issue though, at least for me.

When com.agilejava.docbkx.maven.AbstractTransformerMojo#execute runs, 
org.apache.xerces.impl.XMLEntityManager#setupCurrentEntity ends up requesting 
an input stream, and in the end this leads to 
sun.net.www.http.KeepAliveCache#run ending in timeouts (there's some amount of 
guessing in these statements).

And as I said earlier, it connects to the www.oasis-open.org host.

Original comment by and...@nawroth.se on 7 Jun 2013 at 2:37

GoogleCodeExporter commented 8 years ago
Hello,

On my side the slowness issue has disappeared. Do you use XInclude or a 
particular docbkx/DocBook configuration?

regards,
Cedric,

Original comment by MimilO...@gmail.com on 7 Jun 2013 at 3:14

GoogleCodeExporter commented 8 years ago
The slowness disappears when I remove the doctype declaration from the 
document. The docbook 4.5 doctype declaration was copy-pasted from docbook.org, 
so it should be correct.

Original comment by and...@nawroth.se on 7 Jun 2013 at 3:30

GoogleCodeExporter commented 8 years ago
The answer to the slowness I saw might be in the FAQ under "Generation is 
slow". The docs cover adding docbook-xml 4.4 or 5.0, but not 4.5. 
Adding it takes 20s off the build. It might be good to cover this case in the 
docs as well. I work with generated DocBook files, and they normally include 
the DTD reference unless I rip it out.
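
For files that must keep their DTD reference, the general idea of answering the DTD request from a local copy can be sketched in Java as follows. This is only an illustration under stated assumptions, not how docbkx-tools resolves the docbook-xml artifact: the class and method names (LocalDtdParse, textOf) are hypothetical, and the local DTDs are passed in as an in-memory map from system ID to DTD text.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of docbkx-tools.
final class LocalDtdParse {

    /**
     * Parses the XML and returns its character data. DTD requests whose
     * system ID is a key in localDtds are answered from the map.
     */
    static String textOf(String xml, Map<String, String> localDtds) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();

            reader.setEntityResolver((publicId, systemId) -> {
                String dtd = localDtds.get(systemId);
                // Returning null falls back to the parser's default
                // behavior, which may fetch the system ID over the network.
                return dtd == null ? null : new InputSource(new StringReader(dtd));
            });

            StringBuilder text = new StringBuilder();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Unlike removing the DOCTYPE, a local copy keeps any entities the DTD declares available to the document.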

Original comment by and...@nawroth.se on 10 Jun 2013 at 12:56

GoogleCodeExporter commented 8 years ago
Okay, good point. I committed a new sample in r264. As third-party repositories 
are not allowed on Sonatype OSS, all the information is given as comments in 
the sample.

Original comment by MimilO...@gmail.com on 10 Jun 2013 at 7:58

GoogleCodeExporter commented 8 years ago
#13 and...@nawroth.se
The slowness disappears when I remove the doctype declaration from the 
document. The docbook 4.5 doctype declaration was copy-pasted from docbook.org, 
so it should be correct.

The best way is to add the docbook-xml dependency to docbkx-maven-plugin; the 
point is that the version must be correct. 
E.g., my version is 4.5.
PS: for 4.5 the groupId is docbook, not org.docbook.
DocBook file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<book lang="zh_cn" xmlns:xi="http://www.w3.org/2001/XInclude">

pom file: 
<plugin>
  <groupId>com.agilejava.docbkx</groupId>
  <artifactId>docbkx-maven-plugin</artifactId>
  <dependencies>
    <!-- DocBook XML 4.5 is published under groupId "docbook", not "org.docbook" -->
    <dependency>
      <groupId>docbook</groupId>
      <artifactId>docbook-xml</artifactId>
      <version>4.5</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</plugin>

Original comment by liushimi...@gmail.com on 25 Jun 2014 at 11:19