juanmf / sfPlugins

Portable symfony Plugins and libs for solving common issues on importing and reporting.
GNU General Public License v3.0

Word2XSLTRenderingFO.xsl: Error on transforming Word XML with disc bullet list #1

Closed thicolares closed 11 years ago

thicolares commented 11 years ago

Hi!

First of all, I think your code is VERY elegant and well documented. Thanks a lot for that. I'm working on porting it to FLOW3.

When transforming a Word XML to fo.xml, I've noticed the error below appears if the .xml has a disc bullet list. The same error is NOT displayed with numbered lists.

fo-error (it's in Portuguese, since my Office is localized for my country)

Some improvement to Word2XSLTRenderingFO.xsl might resolve it, but I'm not familiar with it yet.

juanmf commented 11 years ago

Hi Thiago,

Thanks for using it, I'm very pleased. Please let me know if the docs (especially the PDF HowTo) can be improved. I'm planning to make an sf2 bundle; we already have it working with nice results in an sf2 project, and I want to upload it to Packagist.

Could you please send me the wordml that is giving you troubles?

Thanks! Juan

PS: I'm from Corrientes, Argentina, mais não fala Português muito bem (but I don't speak Portuguese very well) :).

Edit: same user as here but @ gmail.com

On Mon, Jan 7, 2013 at 11:35 AM, Thiago Colares notifications@github.com wrote:

Thanks a lot for that

thicolares commented 11 years ago

Hi Juan!

OK! I'll send you an example.

Regards!

P.S.: I'm from Salvador, Brasil, pero no hablo español (but I don't speak Spanish)! hehe

juanmf commented 11 years ago

Hey,

No need, I've just seen what you mean. The stylesheet works if you run the transformation with another XSLT processor; I tried the one in NetBeans.

To do so, open the just-saved WordML 2003 file in NetBeans and click the blue "transform" arrow; when the popup appears, select Word2XSLTRenderingFO.xsl.

Still, Word2FO makes some mistakes with the list items. I had to manually correct some FO markup (you need to indent the markup to understand it, but re-linearize it before processing with ApacheFOP; I use Notepad++ XML Tools for that):

Replace some unprefixed tags:

                <fo:list-block ...>
                    <list-item ...> <!-- @SEE here! no fo: prefix, don't know why Word2FO omits that-->
                        <fo:list-item-label start-indent="18pt" text-indent="0pt" font-family="Symbol">
                            <fo:block>•</fo:block>
                        </fo:list-item-label>
                        <!-- @SEE here remove 'start-indent="0pt" ', if not, the bullet appears over the text -->
                        <fo:list-item-body end-indent="inherit" start-indent="0pt" text-indent="0pt"> 
                            <block> <!-- @SEE here! no fo: prefix again -->
                                <fo:inline>
                                    <fo:leader leader-length="0pt"/>pepe</fo:inline>
                            </block><!-- @SEE here! no fo: prefix again -->
                        </fo:list-item-body>
                    </list-item><!-- @SEE here! no fo: prefix again -->
                </fo:list-block>

Those are Word2FO bugs. I'll see how to add code to Word2XSLTRenderingFO.xsl to work around them.
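The manual corrections listed above could also be scripted. Below is a minimal sketch in Python, assuming the unprefixed elements end up in no namespace at all and that the `fo` prefix is bound to the standard XSL-FO namespace. The function name `repair_fo` and the sample markup are made up for illustration; this is not the change that was later made to Word2XSLTRenderingFO.xsl.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
# Ask ElementTree to serialize this namespace with the usual "fo" prefix.
ET.register_namespace("fo", FO_NS)


def repair_fo(fo_markup: str) -> str:
    """Hypothetical helper: apply the two manual corrections described above."""
    root = ET.fromstring(fo_markup)
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments or processing instructions, if any
        # ElementTree stores namespaced tags as "{uri}local"; a bare tag
        # means the element lost its fo: prefix.
        if not elem.tag.startswith("{"):
            elem.tag = f"{{{FO_NS}}}{elem.tag}"
        # Drop start-indent="0pt" on the list-item body so the bullet
        # is not drawn over the text.
        if (elem.tag == f"{{{FO_NS}}}list-item-body"
                and elem.get("start-indent") == "0pt"):
            del elem.attrib["start-indent"]
    return ET.tostring(root, encoding="unicode")


# Reduced version of the broken markup shown above (assumed sample).
broken = (
    '<fo:list-block xmlns:fo="http://www.w3.org/1999/XSL/Format">'
    '<list-item>'
    '<fo:list-item-label start-indent="18pt"><fo:block>•</fo:block></fo:list-item-label>'
    '<fo:list-item-body start-indent="0pt" text-indent="0pt">'
    '<block><fo:inline>pepe</fo:inline></block>'
    '</fo:list-item-body>'
    '</list-item>'
    '</fo:list-block>'
)
fixed = repair_fo(broken)
print(fixed)
```

A post-processing step like this only hides the symptom; fixing the stylesheet so it emits correct markup in the first place is still the better long-term option.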

Thanks!

Note: the fop CLI command tells you a lot about FO markup bugs. Capture
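To collect those messages programmatically, the command can be wrapped in a script. This is a rough sketch in Python, assuming a `fop` launcher is on the PATH and using the `-fo` / `-pdf` options of the Apache FOP command line. The helper names and the file names `report.fo` and `report.pdf` are placeholders, not part of sfPlugins.

```python
import shutil
import subprocess


def build_fop_command(fo_path, pdf_path):
    """Build the argument list for rendering an FO file to PDF with fop."""
    return ["fop", "-fo", fo_path, "-pdf", pdf_path]


def run_fop(fo_path, pdf_path):
    """Run fop and return everything it printed, or None if fop is missing."""
    executable = shutil.which("fop")
    if executable is None:
        return None
    command = build_fop_command(fo_path, pdf_path)
    command[0] = executable  # use the resolved launcher path
    result = subprocess.run(command, capture_output=True, text=True)
    # Return both streams so no markup complaint is lost.
    return result.stdout + result.stderr


command = build_fop_command("report.fo", "report.pdf")
print(" ".join(command))
```

Calling `run_fop("report.fo", "report.pdf")` on a real file would return the processor's messages as one string, which can then be searched for the offending element names.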

thicolares commented 11 years ago

Mmm... I got it.

It would be great!! Unfortunately, changing the .fo manually is not an option for me, because non-programmer users are meant to upload FO templates, so it could sound like rocket science to them. For now, I'll suggest removing those lists hehe.

Regards!

EDIT: I can only now see the attachments. It's clearer now.

juanmf commented 11 years ago

Ok, I'll see what I can do to help. http://msdn.microsoft.com/en-us/library/office/aa537167(v=office.11).aspx#officewordwordmltoxsl-fo_examiningthe Interesting link.

thicolares commented 11 years ago

Is FO.XML a standard format for reports? I mean, I just want to get an idea of whether it's worth investing effort in this format, even considering that it's a W3C recommendation (is it still?).

thicolares commented 11 years ago

Mm, this link seems good!! I'll check it out as soon as I've finished my Step 3 version for FLOW3 :)

juanmf commented 11 years ago

I'm not sure why it is so difficult to find good docs on it. FO is a standard format and a W3C recommendation that motivated great projects like ApacheFOP and RenderX. I'm not an expert in XSLT, but I think it's a great tool as an intermediate language for use in Step3, and ApacheFOP keeps growing (slowly).

I researched reporting for some time, but found nothing good enough for PHP, so I thought it was worth giving it a shot. Of course, I have more work to do; anyone who would like to help is welcome...

I never found anything against FO (the best bet for printed media), just criticism of the XSLT learning curve, and a tweet about XSLT's unfulfilled promise that promoted a JS layout engine. That made me think it's not used much as a transformation language because JS/CSS do that job better.

For reporting, I see no better choice. Again, I think the problem is that people fear XSLT, but Word2FO makes it a lot easier. ApacheFOP respects the physical layout (margins, indents, etc.) perfectly. That alone is reason enough to use FO instead of HTML with a PDF renderer like mpdf. Then with XSLT you can get the other formats; FO2Html.xsl also needs more work.

juanmf commented 11 years ago

But if you want my advice on investing effort, try symfony 2! hehe http://symfony.com/symfony-in-five-minutes

thicolares commented 11 years ago

Indeed, working with standard W3C recommendations is always a good thing, I think.

I've tried Symfony once, just for fun. It seems to be a great framework indeed. I've worked with (and contributed to) CakePHP before, but now I'm working with FLOW, which is a really great framework. I really like object injection, namespaces, etc. In addition, FLOW uses (and keeps adopting) all the sweet new features from recent PHP releases.

I've noticed that you've coded SOAP and Console approaches to communicate with Apache FOP. A third way could be JavaBridge (they say it's 50 times faster than SOAP; SOAP itself is very slow). I've tried it a little, but only got results for the .rptdesign file type, not for FO.

Finally, I'm interested in helping to improve the code. But my priority is to code it for FLOW; maybe a standalone codebase could fit all frameworks as a Vendor / Lib / API package. What do you think?

P.S.: This conversation no longer has anything to do with the issue, but I don't know of another place for it on GitHub. At least it's public :)

juanmf commented 11 years ago

https://github.com/juanmf/sfPlugins/issues/2 :)

juanmf commented 11 years ago

I found the reason! Word applies some security restrictions that prevent bullet symbols from being translated to Unicode. Commenting out lines 476 and 494 of auxiliary.xsl shows it, since the stylesheet then works with Word too. I don't know yet how to enable it in Word. The culprit is the document('') XSL function:

<xsl:apply-templates select="document('')//my:recoding-table[@font=$font-family]/my:char[@value=$symbol or @altvalue=$symbol]"/>
...
<xsl:value-of select="substring(document('')//my:recoding-table[@font=$font-family]/my:char[substring(@code, 1, 1)=substring($string, 1, 1)]/@entity, 1, 1)"/>

Edit: This happens since MSXML 6.0; maybe with an older version of Word it will do the job.
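For readers unfamiliar with the pattern: `document('')` lets a stylesheet read its own source, so a lookup table can be embedded in the stylesheet and queried with XPath. The sketch below imitates that lookup in Python to show what is lost when the call is blocked. The namespace URI, the wrapper element and the single table row are invented for illustration and do not reproduce the real table in auxiliary.xsl.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and table contents; the real stylesheet binds
# its own URI to the "my" prefix and has many more rows.
MY_NS = "urn:example:recoding"
TABLE = """
<my:tables xmlns:my="urn:example:recoding">
  <my:recoding-table font="Symbol">
    <my:char value="F0B7" altvalue="B7" entity="&#x2022;"/>
  </my:recoding-table>
</my:tables>
"""


def convert_symbol(root, font, symbol):
    """Return the Unicode replacement for a symbol-font code, or None.

    Mirrors the predicate
    my:recoding-table[@font=$font-family]/my:char[@value=$symbol or @altvalue=$symbol]
    """
    ns = {"my": MY_NS}
    for table in root.findall(".//my:recoding-table", ns):
        if table.get("font") != font:
            continue
        for char in table.findall("my:char", ns):
            if symbol in (char.get("value"), char.get("altvalue")):
                return char.get("entity")
    return None


root = ET.fromstring(TABLE)
bullet = convert_symbol(root, "Symbol", "F0B7")
print(bullet)
```

When the processor refuses to load `document('')`, the equivalent of `root` is empty, every lookup comes back with nothing, and the bullet character is never recoded.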

thicolares commented 11 years ago

Oops, closed accidentally

juanmf commented 11 years ago

I asked for help here and someone gave me the clue: http://w3schools.invisionzone.com/index.php?showtopic=46319

It is working, but only with a modified auxiliary.xsl, which is wrong as I shouldn't touch it. I'll make the proper changes so I don't need to alter the Word2FO libs.

I must prevent the templates that contain document() from being called; I'm leaving them here for the record:

juanmf@juanmf-PC /c/Word2FO
$ grep -Rn "ConvertString" ./stylesheets/
./stylesheets/auxiliary.xsl:486:  <xsl:template name="ConvertString">
./stylesheets/auxiliary.xsl:498:      <xsl:call-template name="ConvertString">
./stylesheets/elementStructure.xsl:1300:                <xsl:call-template name="ConvertString">
./stylesheets/elementStructure.xsl:1350:                <!--<xsl:call-template name="ConvertString">

juanmf@juanmf-PC /c/Word2FO
$ grep -Rn "ConvertSymbol" ./stylesheets/
./stylesheets/auxiliary.xsl:474:  <xsl:template name="ConvertSymbol">
./stylesheets/elementStructure.xsl:601:        <xsl:call-template name="ConvertSymbol">

juanmf@juanmf-PC /c/Word2FO
$ grep -Rn "ConvertChars" ./stylesheets/
./stylesheets/auxiliary.xsl:510:  <xsl:template name="ConvertChars">

juanmf commented 11 years ago

Ok,

Just committed some changes to Word2XSLTRenderingFO.xsl. Bullets should work now. You'll need to play around with margins, though, as start-indent works fine for the bullet but not so well for the text. If it causes issues, let's open another bug in the neatReports repo.

Thanks!

juanmf commented 11 years ago

list-block

This is the way to handle margins for list items. I'll add this to the HowTo soon... I also made some changes, not committed yet, so that on the first level of items you can avoid that; for deeper levels it fails, and you must set the text tabulations.