Closed dinosaure closed 3 years ago
I patched trunk with this and got:
File "pdfflate.ml", line 8, characters 40-47:
Error: The function applied to this argument has type
?wbits:int ->
proof:'weak5 Decompress.B.t ->
('weak6, 'weak5) Decompress.Zlib_deflate.t
This argument cannot be applied with label ~witness
This is with 'decompress' from OPAM.
Ah yes, I use the last version of decompress
which will be available soon. I will ping this PR when it will be available on OPAM.
@dinosaure I think it is landed in opam-repository already.
@johnwhitington Can you please reevaluate this PR? Should be fine now.
Can you give an indication of your code's speed when compared to miniz (or zlib?). If it's much slower, I'm unlikely to merge this.
(HIstorical note: all of camlpdf's encryption used to be written in OCaml, many years ago, but I could never get it even close to C, so I substituted the C code you see there now)
I can run tests on my regression suite to make sure your code can cope with all the streams in my example PDFs anyway. Perhaps that might help you by finding any remaining bugs in your code.
Slight off-topic, but from your knowledge of zip, do you have any thoughts on these two comments:
https://github.com/johnwhitington/camlpdf/issues/19#issuecomment-509206682
https://github.com/johnwhitington/camlpdf/issues/19#issuecomment-511936038
?
< decompress.1.0.0
is ten times slower than zlib
or miniz
and decompress.1.0.0
is three times slower than zlib
or miniz
(I talked about only decompression). You should be interested by this article about how we optimized decompress
.
However, it's difficult (impossible?) to do something faster than C.
In other side, decompress
take the opportunity of runtime/the bound checking and avoid a segmentation fault (at worst, an exception appear).
This PR should be updated because of the API changes of decompress library. Though from my experience it's way faster now.
Updated with the last version of decompress.1.1.0
.
I will have a look at benchmarking this with cpdf's test suite for speed and correctness in the next couple of weeks: let's get some numbers...
I have successfully built camlpdf from your sources. However, when I try to build cpdf to run my tests, I get:
ocamlfind ocamlopt \
-package camlpdf,decompress.zl -linkpkg \
cpdftime.o -g -o cpdf \
tjutil.cmx tjutf16.cmx tjllist.cmx tjparserMonad.cmx tjjson.cmx xmlm.cmx cpdfwriteJSON.cmx cpdfstrftime.cmx cpdfcoord.cmx cpdf.cmx cpdfcommand.cmx cpdfcommandrun.cmx
File "_none_", line 1:
Error: No implementations provided for the following modules:
Zl referenced from /Users/john/.opam/default/lib/camlpdf/camlpdf.cmxa(Pdfflate)
De referenced from /Users/john/.opam/default/lib/camlpdf/camlpdf.cmxa(Pdfflate)
make[1]: *** [cpdf] Error 2
make: *** [native-code] Error 2
The only makefile change I made was to add decompress.zl
to the Makefile:
# Build the cpdf command line tools and top level
MODS = tjutil tjutf16 tjllist tjparserMonad tjjson \
xmlm \
cpdfwriteJSON cpdfstrftime cpdfcoord cpdf cpdfcommand
SOURCES = cpdftime.c $(foreach x,$(MODS),$(x).ml $(x).mli) cpdfcommandrun.ml
RESULT = cpdf
ANNOTATE = true
PACKS = camlpdf decompress.zl
OCAMLFLAGS = -bin-annot
OCAMLNCFLAGS = -g -safe-string -w -3
OCAMLBCFLAGS = -g -safe-string -w -3
OCAMLLDFLAGS = -g
all : native-code native-code-library byte-code-library top htdoc
clean ::
rm -rf doc foo foo2 out.pdf out2.pdf *.cmt *.cmti
LIBINSTALL_FILES = cpdf.a cpdf.cma cpdf.cmxa \
$(foreach x,$(MODS),$x.mli) $(foreach x,$(MODS),$x.cmi) \
$(foreach x,$(MODS),$x.cmx) $(foreach x,$(MODS),$x.cmti)
install : libinstall
-include OCamlMakefile
I'm not familiar with OMakefile
but I think the error is about the META
file on the source where
requires: "bigarray decompress"
should be replaced by:
requires: "bigarray decompress.zl"
However, if you have a repository where I can reproduce your error, it will help me to understand the issue. I updated the PR according this update.
When I do that, camlpdf won't compile:
ocamlfind ocamlc -package bigarray,decompress -c -bin-annot -g -safe-string -w -3 pdfflate.ml
File "pdfflate.ml", line 14, characters 10-24:
14 | let w = De.make_window ~bits:15 in
^^^^^^^^^^^^^^
Error: Unbound module De
Cpdf seems to build fine until the link step.
Here is the repository for cpdf:
https://github.com/johnwhitington/cpdf-source
There's one line referring to Pdfcode.flate_level in cpdf.ml which you should comment out.
Currently, with my last patch (1bf0880) camlpdf
compiles. Then, I use opam
to pin to this specific version opam pin add camlpdf .
and with a minor changes of cpdf-source
(https://paste.isomorphis.me/R8N), all compile.
I think you mis-understoud my command where only the META
file provided by camlpdf
must be change (to require decompres.zl
instead decompress
).
Thanks. CPDF test suite runs in 94 minutes 30 seconds with OCaml decompress, compared with 43 minutes 28 seconds with camlzip/miniz.c.
There are some differences between which malformed or difficult streams your code and camlzip can cope with. I will analyse that soon, and update this ticket. It may be that your code is useful to CamlPDF as a fallback mechanism if Camlzip cannot decode a stream.
Thanks. CPDF test suite runs in 94 minutes 30 seconds with OCaml decompress, compared with 43 minutes 28 seconds with camlzip/miniz.c.
It's better than:
< decompress.1.0.0 is ten times slower than zlib or miniz and decompress.1.0.0 is three times slower than zlib or miniz
Thanks for your feedbacks :+1: !
Here are some failures with decompress -- things which work with camlzip / miniz:
================================================================================
Testing file PDFTests/test.pdf
================================================================================
cpdf -control control.txt -recrypt "PDFTests/test.pdf" -o "PDFResults/control/test.pdf"
cpdf -control control.txt -recrypt "PDFTests/test.pdf" -o "PDFResults/control/test.pdf"
inline image reading failed: End_of_file
Failed to understand content on page
cpdf encountered an error. Technical details follow:
Malformed page content
================================================================================
Testing file PDFTests/badnameddestination.pdf
================================================================================
cpdf -scale-to-fit a4portrait -scale-to-fit-scale 0.9 -recrypt "PDFTests/badnameddestination.pdf" -o "PDFResults/scaletofit/badnameddestination.pdf"
cpdf -scale-to-fit a4portrait -scale-to-fit-scale 0.9 -recrypt "PDFTests/badnameddestination.pdf" -o "PDFResults/scaletofit/badnameddestination.pdf"
cpdf encountered an unexpected error. Technical Details follow:
De.Queue.Full
================================================================================
Testing file PDFTests/main128fail.pdf
================================================================================
cpdf -list-bookmarks -utf8 -recrypt "PDFTests/main128fail.pdf" -o "PDFResults/listbookmarks/main128fail.pdf"
cpdf -list-bookmarks -utf8 -recrypt "PDFTests/main128fail.pdf" -o "PDFResults/listbookmarks/main128fail.pdf"
cpdf encountered an unexpected error. Technical Details follow:
Invalid_argument("bits MUST be between 8 and 15 (17)")
The first one is more worrying, since it means that decompression appears to succeed, but the result is wrong.
The second and third things look like they should be returned as proper decompress errors...
Thanks, I don't have time yet to look into it but it's very helpful for me to fix correctly some details about decompress
:+1:.
cc @Drup