Open GoogleCodeExporter opened 9 years ago
Hi hrisnew,
To be honest with you, we have never studied memory consumption (and I have
never used tools like JMeter myself).
It seems that memory consumption grows with conversion. It would be interesting
to see whether the problem is the same when the docx is converted directly to pdf (see
https://code.google.com/p/xdocreport/wiki/XWPFConverterPDFViaIText).
When a report is converted to pdf by using the converter, the process is:
1) generate the docx report and store it in a byte array
2) pass the byte array to the converter.
At this step memory can grow because we store the generated docx report in a
byte array.
It seems that you use the XWPF converter. The process for that one is:
1) load the docx stream into an XWPFDocument with POI
2) loop over each Apache POI structure (XWPFParagraph, etc.) to generate the iText
structure.
So it would be interesting to know whether it is 1) or 2) which consumes the memory.
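One rough way to tell which of the two steps is responsible is to sample the used heap before and after each one. The sketch below uses only the JDK; `HeapProbe` and `measure` are hypothetical names, and the two lambdas are placeholders for the real POI loading and iText generation steps. The figures reported by `Runtime` are approximate, because the garbage collector may run at any moment, so a profiler such as VisualVM remains the more reliable tool.

```java
import java.util.function.Supplier;

public class HeapProbe {

    /** Approximate number of bytes currently in use on the heap. */
    public static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs a step, prints its approximate heap growth, and returns the step's result. */
    public static <T> T measure(String label, Supplier<T> step) {
        System.gc(); // only a hint to the JVM, not a guarantee
        long before = usedHeap();
        T result = step.get();
        long after = usedHeap();
        System.out.printf("%s: ~%d KB%n", label, (after - before) / 1024);
        return result;
    }

    public static void main(String[] args) {
        // Placeholder for step 1 (loading the docx into an in-memory model).
        byte[] model = measure("load", () -> new byte[8 * 1024 * 1024]);
        // Placeholder for step 2 (walking the model to build the output).
        int size = measure("convert", () -> model.length / 2);
        System.out.println("converted size: " + size);
    }
}
```

Wrapping the real "load" and "convert" calls in `measure` should at least show which of the two dominates.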
If you could help us with the "XDocReport memory consumption" topic, it would
be very cool.
Many thanks,
Regards Angelo
Original comment by angelo.z...@gmail.com
on 28 Oct 2013 at 4:39
Hello Angelo,
I assume you are a French speaker, so I am taking the liberty of posting this
message in French for clarity.
Indeed, the way we initially proceeded is as follows (as illustrated in the
code excerpt of the opening comment):
1. We load the target template into the Registry and get an IXDocReport back
(as an optimisation we also try not to reload the same template, but rather
check whether the report for the path we are interested in has already been
cached).
2. We set the necessary options as well as all the data of the Java model on
the context (this is obviously unavoidable, given that we use XDocReport to
generate reports and not to convert existing documents);
3. We convert the model to PDF and write the result to a file on disk;
Following your suggestion, I adapted the sequence as follows: instead of
converting at step 3, I first store the report as .docx, and only afterwards
perform a direct conversion of the .docx to .pdf as suggested.
The upshot: processing to .docx takes roughly half as much memory (the bump in
the VisualVM monitoring now spans only about 25 MB, which is already better but
still fairly high). The conversion that follows, however, makes memory spike
again: + 25/30 MB.
At first glance, splitting the process in two therefore seems to improve the
situation, but the underlying problem remains: for files of a fairly
substantial size (a 220 KB pdf in our case), the conversion takes far too much
memory.
Is there no way to convert the xml (.xdoc) to pdf without keeping everything
that is needed in memory?
Original comment by hris...@gmail.com
on 29 Oct 2013 at 2:05
Hi,
We are French, but we prefer speaking English so that many people can
follow topics about XDocReport.
If I understand correctly, you have improved memory usage by generating the docx
report in a temporary file (instead of in a byte array) and then converting it to pdf.
Perhaps it would be interesting to add this strategy to the converter Options?
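The temporary-file strategy could look roughly like the sketch below. It is JDK-only and does not use the XDocReport API: `TempFileSpool`, `Generator` and `Converter` are hypothetical names standing in for "generate the docx report" and "convert docx to pdf". The point is only that the intermediate docx lives on disk rather than in a byte array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileSpool {

    /** Hypothetical stand-in for "write the generated docx report to this stream". */
    public interface Generator {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Hypothetical stand-in for "convert this docx stream into that pdf stream". */
    public interface Converter {
        void convert(InputStream in, OutputStream out) throws IOException;
    }

    /** Spools the generated report to a temp file, then streams that file into the converter. */
    public static void generateThenConvert(Generator generator, Converter converter, OutputStream target) {
        try {
            Path tmp = Files.createTempFile("report-", ".docx");
            try {
                try (OutputStream out = Files.newOutputStream(tmp)) {
                    generator.writeTo(out);
                }
                try (InputStream in = Files.newInputStream(tmp)) {
                    converter.convert(in, target);
                }
            } finally {
                Files.deleteIfExists(tmp); // never leave the intermediate file behind
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Trivial "converter" used for the demo: copies the bytes through unchanged. */
    public static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        generateThenConvert(
                out -> out.write("fake docx bytes".getBytes(StandardCharsets.UTF_8)),
                TempFileSpool::copy,
                result);
        System.out.println("bytes passed through the temp file: " + result.size());
    }
}
```

This trades heap usage for disk I/O, so it only helps if the byte array itself is a significant part of the memory bump.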
Beyond that, it seems that you find that our docx->pdf converter based on POI+iText
uses too much memory.
Writing a docx->pdf converter is a very hard task. I have written an article about
docx->pdf conversion at
http://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/
which also covers other Java docx->pdf converters (docx4j and JODConverter).
You ask me whether it is possible to convert the xml entries directly to pdf. I
think the best performance would come from using a SAX parser to generate the pdf. But
ooxml is a very complex format, so we decided to use a DOM-like model to load the
docx (we use Apache POI). We could do the same thing with docx4j, but we have no time
to develop that.
An interesting test would be to find where the memory is used (does just loading the
docx with Apache POI XWPFDocument already take the memory?)
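To illustrate what the SAX approach means in practice, here is a minimal JDK-only sketch that streams the `word/document.xml` entry of a docx and counts table rows (`w:tr`) without building any tree in memory. It is not a converter and not how XDocReport works; `DocxSaxScan` is a hypothetical name, and `tinyDocx` builds a stripped-down archive containing only that one entry, purely for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DocxSaxScan {

    /** WordprocessingML main namespace (the "w:" prefix in document.xml). */
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Counts w:tr elements in word/document.xml by streaming; returns 0 if the entry is missing. */
    public static int countTableRows(InputStream docx) {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue; // skip every entry we do not need
                }
                final int[] rows = {0};
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true);
                factory.newSAXParser().parse(zip, new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName, String qName, Attributes atts) {
                        if (W_NS.equals(uri) && "tr".equals(localName)) {
                            rows[0]++;
                        }
                    }
                });
                return rows[0];
            }
            return 0;
        } catch (Exception e) {
            throw new IllegalStateException("could not scan docx", e);
        }
    }

    /** Builds a demo archive holding only word/document.xml with the given number of rows. */
    public static byte[] tinyDocx(int rowCount) {
        StringBuilder xml = new StringBuilder("<w:document xmlns:w=\"" + W_NS + "\"><w:body><w:tbl>");
        for (int i = 0; i < rowCount; i++) {
            xml.append("<w:tr/>");
        }
        xml.append("</w:tbl></w:body></w:document>");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        int rows = countTableRows(new ByteArrayInputStream(tinyDocx(300)));
        System.out.println("table rows seen: " + rows);
    }
}
```

A real converter would have to track styles, numbering, sections and so on while streaming, which is where the complexity of ooxml comes in.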
Original comment by angelo.z...@gmail.com
on 29 Oct 2013 at 10:37
Unfortunately we cannot give you any further hints as to where the observed
memory burst comes from, other than once again pointing at the convert() method.
The only additional piece of information at my disposal is that the kind of
document generated also seems to matter. We, for instance, face this problem
for reports containing tables with a list of hundreds (not thousands) of rows.
The resulting pdfs aren't very large by the way, only a few hundred KB.
I do not have a clear idea of what converting docx to pdf represents in terms
of programming, but now that you mention DOM parsers, it confirms my suspicion
that the document to convert takes way more space in memory than is reasonable.
How hard is it to port your implementation to a SAX/StAX-based implementation
anyway?
Original comment by hris...@gmail.com
on 30 Oct 2013 at 3:09
When I say "DOM-like", it's not a real W3C DOM Document; it's the POI XWPFDocument
that we use. Developing a docx->pdf converter is very, very hard and it has taken us
a lot of time.
I don't have the courage to restart our converter from scratch with SAX, but if
you wish to do that, I will be happy to help you.
Regards Angelo
Original comment by angelo.z...@gmail.com
on 30 Oct 2013 at 3:17
Hi,
For your information, I have started a new docx->pdf converter which uses only
the OpenXMLFormats structures and not the XWPF POI structures (which themselves
use the OpenXMLFormats structures).
I think memory usage will improve because I can use the generated report directly
(without creating a byte array) and because I no longer use the XWPF structures,
which load all the XML entries of the docx. In my case I load just the XML
entries that are needed.
As I have only just started it, the converter loses a lot of information when the
document is converted to pdf. I still have to handle tables; after that, it would
be cool if you could check whether it improves memory usage.
To test it, you must use 1.0.4-SNAPSHOT and pass
ConverterTypeVia.OpenXMLFormats instead of ConverterTypeVia.XWPF to
report.convert.
Original comment by angelo.z...@gmail.com
on 1 Nov 2013 at 9:00
Hi hrisnew,
Have you tested with the ConverterTypeVia.OpenXMLFormats converter option?
I have also seen that you do this every time:
------------------------------------------------------------------------
IXDocReport report = XDocReportRegistry.getRegistry().loadReport(in,
TemplateEngineKind.Velocity);
------------------------------------------------------------------------
You must load the report only once and afterwards retrieve it from the
registry. See
https://code.google.com/p/xdocreport/wiki/DocxReportingJavaMain#5._Test_Performance
for a sample.
If you use the XDocReport servlet support, it manages that for you.
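The "load once, then reuse" idea can be sketched independently of XDocReport as follows. This is a JDK-only illustration, not the registry's actual implementation: `LoadOnceCache` is a hypothetical name, and the loader lambda stands in for the expensive template load. The wiki sample linked above shows the real registry calls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class LoadOnceCache<T> {

    private final Map<String, T> byId = new ConcurrentHashMap<>();
    private final Function<String, T> loader;

    public LoadOnceCache(Function<String, T> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for the id, invoking the loader only on the first request. */
    public T get(String id) {
        return byId.computeIfAbsent(id, loader);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        LoadOnceCache<String> cache = new LoadOnceCache<>(id -> {
            loads.incrementAndGet(); // stands in for parsing the template
            return "parsed template " + id;
        });
        for (int i = 0; i < 1000; i++) {
            cache.get("invoice.docx"); // hypothetical template id
        }
        System.out.println("loads performed: " + loads.get());
    }
}
```

With this shape, generating many reports from the same template pays the loading cost a single time.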
Regards Angelo
Original comment by angelo.z...@gmail.com
on 7 Nov 2013 at 1:15
Original issue reported on code.google.com by
hris...@gmail.com
on 28 Oct 2013 at 9:09