oreillymedia / docbook2asciidoc

XSL for transforming DocBook to AsciiDoc
MIT License
61 stars 33 forks source link

Modifications based on what we use at JBoss, also general changes #25

Open LightGuard opened 11 years ago

LightGuard commented 11 years ago

I've made quite a few changes. A couple of them are probably JBoss specific, but there are some others which are helpful for people in general. I've also reformatted the XSLT to read easier and make it more maintainable. The version of saxon has been updated to the most current version as of now. The shell script also contains update to support the new saxon version. I decided not to squash in cas there were things you wanted to take but not others.

sandersk commented 11 years ago

Thanks, @LightGuard. We will definitely review at O'Reilly, and merge in as appropriate.

pmuir commented 11 years ago

@sandersk did you get a chance to review? We are increasingly choosing asciidoc @ JBoss, and need a way to convert our docbook over! - hopefully by collaborating on this project :-)

sandersk commented 11 years ago

Hi @LightGuard and @pmuir,

Thanks for the nudge to take a look :)

Overall, these are awesome changes. I also really appreciate all the cleanup you did to consistently use xsl:text throughout, which is my preference as well.

I did notice a few issues, though:

<!--<xsl:value-of select="util:carriage-returns(2)"/>-->

In my testing, that seems to be causing the following body text to run into the section title.

  <xsl:template match="para/text()">                                                                                                                                         
    <xsl:value-of select="normalize-space(.)"/>                                                                                                                              
    <xsl:if test="following-sibling::text()[1] != following-sibling::node()[1] or following-sibling::node()[1][position() = last()]">                                        
      <!-- Add a space if the next node is not a text node -->                                                                                                               
      <xsl:text> </xsl:text>                                                                                                                                                 
    </xsl:if>                                                                                                                                                                
  </xsl:template>

Was introducing erroneous whitespace before inline elements. For example, if the source XML looks like this:

All PHP statements must end with a semicolon (<literal>;</literal)

The the above handling will result in the following AsciiDoc output:

All PHP statements must end with a semicolon ( +;+).

Instead of the desired:

All PHP statements must end with a semicolon (+;+).

That extra space character will be considered significant when the AsciiDoc is converted to DocBook (as that text content will be placed in a ), so it's an error to include it in the AsciiDoc source.

  <xsl:template match="figure">                                                                                  
    <xsl:call-template name="process-id"/>                                                                       
    <xsl:text>.</xsl:text>                                                                                       
    <xsl:apply-templates select="title"/>                                                                        
    <xsl:value-of select="util:carriage-returns(1)"/>                                                            
    <xsl:text>image::</xsl:text>                                                                                 
    <xsl:if test="imageobject[@role = 'web']">                                                                   
      <xsl:apply-templates select="imageobject" />                                                               
    </xsl:if>                                                                                                    
    <xsl:value-of select="util:carriage-returns(1)"/>                                                            
  </xsl:template> 

Because s are not direct children of

elements; they must be children of a or . So the if statement should be rewritten as:

    <xsl:if test="descendant::imageobject[@role = 'web']">                                                                   
      <xsl:apply-templates select="descendant::imageobject" />                                                               
    </xsl:if>

I'll work on resolving the above issues and folding the changes into our toolchain, but probably won't be able to get to that for another week or so. Thanks again for all your feedback and changes, which are very helpful. I would love to continue collaborating with you on this project.

Best, Sanders

LightGuard commented 11 years ago

I'm going to attempt to do these inline, we'll see how it works out.

On Fri, Jan 18, 2013 at 12:02 PM, Sanders Kleinfeld < notifications@github.com> wrote:

Hi @LightGuard https://github.com/LightGuard and @pmuirhttps://github.com/pmuir ,

Thanks for the nudge to take a look :)

Overall, these are awesome changes. I also really appreciate all the cleanup you did to consistently use xsl:text throughout, which is my preference as well.

I did notice a few issues, though:

-

First, I see you changed the XSL to use an output method of "text" rather than XML. The reason I currently have it set to use an output method of XML is that there's logic in the stylesheets to use passthroughs in certain circumstances, and if you're in text mode and do an xsl:copy-of, then any XML markup will get dropped. There might also be problems if the content in the passthroughs contains reserved XML characters, as they won't get escaped properly. So given that, I set the output method to text and used disable-output-escaping attributes as appropriate to get the right text output.

As long as that doesn't mess up any of the output, putting things back to xml output should be fine. Off the top of my head I don't think of anything this would really mess up.

-

I also see that you've commented out some whitespace following section titles, e.g.:

In my testing, that seems to be causing the following body text to run into the section title.

I think we need some test documents to use to make sure we don't break anything, in my testing with some of the docbook files we have (include the CDI specification, a 117 page document) things worked fine. There may be cases you're running into where this isn't the case. If we have a document that covered everything it would be much easier to make changes and know they don't break things. Might I suggest a test docbook file and a controlled asciidoc file we could diff against?

  • Also, in some situations, I found that the modified logic for text nodes in paragraphs:

    xsl:text /xsl:text /xsl:if /xsl:template

Was introducing erroneous whitespace before inline elements. For example, if the source XML looks like this:

All PHP statements must end with a semicolon (;</literal)

The the above handling will result in the following AsciiDoc output:

All PHP statements must end with a semicolon ( +;+).

Instead of the desired:

All PHP statements must end with a semicolon (+;+).

That extra space character will be considered significant when the AsciiDoc is converted to DocBook (as that text content will be placed in a

), so it's an error to include it in the AsciiDoc source. Thought I had this correctly figured out, spent a lot of time on it, but may be missing a case. - The revised logic here for figure images is incorrect: xsl:text./xsl:text xsl:textimage::/xsl:text /xsl:if /xsl:template Because s are not direct children of
elements; they must be children of a or . So the if statement should be rewritten as: ``` ``` That's fine. The processing chain we have at JBoss is pretty lenient with things, I've found a few instances where the docbook created by hand doesn't actually validate to the schema. This could be another one of those instances. I'll work on resolving the above issues and folding the changes into our toolchain, but probably won't be able to get to that for another week or so. Thanks again for all your feedback and changes, which are very helpful. I would love to continue collaborating with you on this project. Best, Sanders — Reply to this email directly or view it on GitHubhttps://github.com/oreillymedia/docbook2asciidoc/pull/25#issuecomment-12436237.

Jason Porter http://en.gravatar.com/lightguardjp