Closed CetinSert closed 3 years ago
Another one where it is even more noticeable: https://github.com/mozilla/pdf.js/files/5722039/pdfjsError.pdf
[Screenshots: expected rendering vs. pdfium.wasm rendering]
Hi,
I'm trying to solve it too.
I will update the library to check whether it has been fixed.
I opened a discussion about it here: https://groups.google.com/g/pdfium/c/xqMSoBa6ZVU
Thanks.
Hey there, are you still working on this?
Hi,
Yes, but without a solution yet :(
Well, I would love to join, but currently that's not possible as I don't have enough time. I hope and think that this is going to change towards Q3/Q4 this year.
In the meantime, I thought that this might help: https://github.com/coolwanglu/PDFium.js/
That's another project (quite old though) which ported PDFium to WASM, and it seems that the troubling images are working there. So maybe you can spot something that gives you a hint about what might be missing...
P.S. Building his project with the current sources will not work.
Nice, I'll check it.
Hi,
After reading that repository and comparing it with mine: it uses very old source code, so we can see what has changed, but the owner did not answer my tweet asking to get in contact.
I have updated the "pdfium-lib" repository; it now has the latest version of pdfium with all patches updated, and the template now includes the pdfium branch and commit.
Check here: https://pdfviewer.github.io/
But it still has the image problem.
Any help is welcome!
@paulo-coutinho does this affect only the WASM build?
Hi,
Apparently yes. I ran the pdfium_test executable, as the Google developer suggested, and it worked as expected, as you can see here: https://groups.google.com/g/pdfium/c/xqMSoBa6ZVU
I only need to understand why it happens, and then I will create the patches.
If anyone can help me, we can solve this.
Thanks.
I'll give it a try tonight...
I'm currently testing around with your pre-built binaries, as I cannot seem to make things work compiling from scratch: .../test/pdfium-lib/build/linux/x64/release/lib/libpdfium.a: Unknown format, not a static library!
That basically happens when I try to build the test and generate the final WASM (steps 8 and 9 in the WASM compile guide). Any idea why that might be the case?
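One way to narrow down an "Unknown format, not a static library!" message is to look at the first bytes of the file it complains about. A regular ar archive starts with `!<arch>\n`, while a GNU thin archive starts with `!<thin>\n`. The sketch below is only a diagnostic helper: `archive_kind` is a hypothetical name, and whether the archive format is actually the cause here is an assumption that the thread does not confirm.

```python
# Magic strings found at the very start of ar-format files.
AR_MAGIC = b"!<arch>\n"    # regular static archive
THIN_MAGIC = b"!<thin>\n"  # GNU "thin" archive (references objects instead of embedding them)


def archive_kind(path):
    """Classify a file as 'archive', 'thin-archive', or 'unknown' from its first 8 bytes."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if header == AR_MAGIC:
        return "archive"
    if header == THIN_MAGIC:
        return "thin-archive"
    return "unknown"
```

Calling `archive_kind(...)` on the `libpdfium.a` produced by the build would show whether the file is a regular archive at all before digging into linker options.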
Which branch are you testing?
Use 4466 branch.
Currently the master branch. I will switch then... Can you explain why you are referring to build options like 'USE_JPEG=1' whereas I cannot seem to find these emsdk options in the repo I pointed out above?
Hi,
The flag USE_JPEG=1 means using the emsdk JPEG port library instead of the library bundled with pdfium.
If you disable it, you will get this error:
emcc -MMD -MF obj/third_party/nasm/nasm/realpath.o.d -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 -DUSE_OZONE=1 -DUSE_X11=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_GNU_SOURCE -DCR_CLANG_REVISION=\"llvmorg-13-init-4720-g7bafe336-1\" -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DHAVE_CONFIG_H -I../.. -Igen -I../../third_party/nasm -I../../third_party/nasm/asm -I../../third_party/nasm/disasm -I../../third_party/nasm/include -I../../third_party/nasm/output -I../../third_party/nasm/x86 -fno-delete-null-pointer-checks -fno-ident -fno-strict-aliasing --param=ssp-buffer-size=4 -fno-stack-protector -funwind-tables -fPIC -fcolor-diagnostics -fmerge-all-constants -fcrash-diagnostics-dir=../../tools/clang/crashreports -mllvm -instcombine-lower-dbg-declare=0 -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -Xclang -fdebug-compilation-dir -Xclang . -no-canonical-prefixes -O2 -fdata-sections -ffunction-sections -fno-omit-frame-pointer -g0 -ftrivial-auto-var-init=pattern -fvisibility=hidden -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -Werror -Wall -Wno-unused-variable -Wno-misleading-indentation -Wno-missing-field-initializers -Wno-unused-parameter -Wno-c++11-narrowing -Wno-unneeded-internal-declaration -Wno-undefined-var-template -Wno-psabi -Wno-deprecated-register -Wno-implicit-int-float-conversion -Wno-final-dtor-non-final-class -Wno-builtin-assume-aligned-alignment -Wno-deprecated-copy -Wno-non-c-typedef-for-linkage -Wmax-tokens -Wno-unused-function -Wno-string-conversion -Wno-macro-redefined -Wno-sign-compare -Wno-nonnull -Wno-uninitialized -std=c11 -Wno-implicit-fallthrough -c ../../third_party/nasm/nasmlib/realpath.c -o obj/third_party/nasm/nasm/realpath.o
../../third_party/nasm/nasmlib/realpath.c:58:16: error: implicit declaration of function 'canonicalize_file_name' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
char *rp = canonicalize_file_name(rel_path);
^
../../third_party/nasm/nasmlib/realpath.c:58:11: error: incompatible integer to pointer conversion initializing 'char *' with an expression of type 'int' [-Werror,-Wint-conversion]
char *rp = canonicalize_file_name(rel_path);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.
Thanks.
Yes, I know, I just wondered how the repo above manages to compile without that flag.
Well, never mind, I will need some time to investigate, but it seems that handling embedded images in general poses a problem with the current build.
@inzanez hello, have you been able to make any progress on this?
@cetinsert I'm afraid not. I managed to build everything and tried different build options, but it won't change anything so far. I guess there's no way around digging deeper in the PDFium codebase. I found something that might help in finding the culprit: the WASM build is not handling images at all. It might be that this is somehow related to binding to libJpeg etc. in WASM. I managed to crash the online PDF Viewer and that might lead to the issue:
(pdfcpu: pdfcpu import Test.pdf my_image.png)
I will definitely continue looking into this once I have more time, but that will take until around August, I'm afraid.
Hi,
Can you attach the PDF that crashes here?
Maybe it will help to test and/or find a solution.
Thanks.
@paulo-coutinho After some more testing it seems that it's very browser-specific, so it doesn't seem to be the wasm that fails, but the browser with the empty output of the wasm module.
I can confirm that I don't get any errors from the wasm module.
Hi,
But do you know how to solve it?
I tried a lot of things, but without success.
Thanks.
@paulo-coutinho Are codecs for embedded images not part of PDFium proper? Do these get built from other projects and linked to PDFium?
You can see better here: https://github.com/lukas-w/pdfium
It uses "libjpeg_turbo": https://github.com/lukas-w/pdfium/blob/f8d930b68fdd3e9d20434c5a1788205e9ba0e695/DEPS#L88
@paulo-coutinho If I turn on debugging for WASM in Chromium and instruct it to pause on exceptions, I register several 'longjmp' exceptions after FPDF_RenderPageBitmap is called. I just tried to rebuild the project so that I could include debug information (https://emscripten.org/docs/porting/Debugging.html), but it seems that I cannot build the project anymore. I don't really know where and why it fails, I just get a lot of error messages. Maybe you can put a build online with debug information included? I hope that we can find the function failing that way...
Hi,
Sure, I will do it.
Thanks.
Hi,
Everything is updated, and you can download the WASM debug and release builds here: https://github.com/paulo-coutinho/pdfium-lib/releases/tag/4505
Note: only the WASM build has a debug version, for your tests.
It is also published here: https://pdfviewer.github.io/
Thanks for any help to solve it.
@paulo-coutinho Many thanks for that build, that's quite interesting. Apologies for the delay, I still have too much work to attend to. I ran the debug build in the browser today, and as I thought, it seems to be related to the JPEG decoding:
I haven't had the time to dig into that yet but wanted to share this, maybe it helps someone else in the meantime. I will still continue to work on this whenever I have time.
@paulo-coutinho Dear Paulo, I finally managed. Please don't ask me why it is that way, as I cannot answer that. I just followed a feeling I had... I can't put the whole picture together, but I do have a running version that seems to render the 'faulty' PDFs just fine, without producing errors.
I checked all the distros I used to build the thing so far, and it seems all of them are using libjpeg-turbo as a default. So I created a simple docker container based on debian:latest, running docker run -it debian:latest /bin/bash, and installed all the things required to build PDFium. Then I downloaded libjpeg, built it with --prefix=/usr and installed it. I then had to copy some include files from /usr/include to /usr/include/x86_... so that the patch script worked, built the thing, and it worked.
I just cross-checked with another build with the same emsdk version and your latest repo but with libjpeg-turbo as the system jpeg library, and I can confirm it fails again.
So it seems that you should not build it on a system using libjpeg-turbo as the default jpeg library. I would have thought that, as they are interchangeable, this would not matter at all, since emscripten brings its own jpeg library, so I am still a bit confused.
Maybe you can try and verify; alternatively, I could put together a Dockerfile for the build if you want me to.
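To check which JPEG library a build machine exposes, one option is to inspect the headers the build picks up. libjpeg-turbo's `jconfig.h` defines `LIBJPEG_TURBO_VERSION`, which IJG libjpeg does not, and both families define `JPEG_LIB_VERSION` (in `jconfig.h` or `jpeglib.h`). The sketch below works on header text only; `jpeg_flavor` is a hypothetical helper, and it assumes that these macros are enough to tell the two apart on your system.

```python
import re

# libjpeg-turbo's jconfig.h defines LIBJPEG_TURBO_VERSION; IJG libjpeg does not.
TURBO_RE = re.compile(r"^\s*#\s*define\s+LIBJPEG_TURBO_VERSION\b", re.MULTILINE)
# API level macro, e.g. 62 or 80 for libjpeg-turbo builds, 90 for IJG libjpeg v9.
LIB_VERSION_RE = re.compile(r"^\s*#\s*define\s+JPEG_LIB_VERSION\s+(\d+)", re.MULTILINE)


def jpeg_flavor(header_text):
    """Return (flavor, api_version) guessed from the text of jconfig.h / jpeglib.h."""
    match = LIB_VERSION_RE.search(header_text)
    api_version = int(match.group(1)) if match else None
    if TURBO_RE.search(header_text):
        flavor = "libjpeg-turbo"
    else:
        flavor = "libjpeg (or unknown)"
    return flavor, api_version
```

Pass it the concatenated contents of whichever `jconfig.h` and `jpeglib.h` the patch script copies. A result of `"libjpeg-turbo"` would match the failing setup described above, though the thread does not explain why the system headers influence the WASM output.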
Hi,
Sure, I'm trying to fix it with your tips. I'm making a docker image for wasm with everything included and will test it and upload it here.
Thanks.
I tried your steps and I get this error message when I try to compile:
COMMAND:
root@1cff8106b674:/app/build/linux/x64/gen/utils# em++ -g -o /app/build/linux/x64/gen/out/pdfium.html -s EXPORTED_FUNCTIONS="$(node function-names ../xml/index.xml)" -s EXPORTED_RUNTIME_METHODS='["ccall", "cwrap"]' custom.cpp @pdfium.rsp -std=c++11 -Wall --no-entry
ERROR
em++: error: '/emsdk/upstream/bin/wasm-emscripten-finalize --detect-features --minimize-wasm-changes -g --dyncalls-i64 --dwarf /app/build/linux/x64/gen/out/pdfium.wasm -o /app/build/linux/x64/gen/out/pdfium.wasm' failed (-9)
My new Dockerfile:
FROM ubuntu:18.04
# general
ARG DEBIAN_FRONTEND=noninteractive
ENV PROJ_TARGET="wasm"
ENV JAVA_VERSION="8"
ENV JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/"
# packages
RUN apt-get -y update
RUN apt-get install -y build-essential sudo file git wget curl cmake ninja-build zip unzip tar python3 python3-pip openjdk-${JAVA_VERSION}-jdk nano lsb-release libglib2.0-dev tzdata doxygen --no-install-recommends && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
# define timezone
RUN echo "America/Sao_Paulo" > /etc/timezone
RUN dpkg-reconfigure -f noninteractive tzdata
RUN /bin/echo -e "LANG=\"en_US.UTF-8\"" > /etc/default/locale
# java
ENV PATH=${PATH}:${JAVA_HOME}/bin
RUN echo ${JAVA_HOME}
RUN java -version
# google depot tools
RUN git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git /opt/depot-tools
ENV PATH=${PATH}:/opt/depot-tools
# pdfium - dependencies
RUN mkdir /build
WORKDIR /build
RUN gclient config --unmanaged https://pdfium.googlesource.com/pdfium.git
RUN gclient sync
WORKDIR /build/pdfium
RUN git checkout 72fd656fee19235d9445796edee1e2c0c1e5e395
RUN ln -s /usr/bin/python3 /usr/bin/python
RUN ln -s /usr/bin/pip3 /usr/bin/pip
RUN apt-get install -o APT::Immediate-Configure=false -f apt \
&& apt-get -f install \
&& dpkg --configure -a \
&& apt-get -y dist-upgrade \
&& echo n | ./build/install-build-deps.sh \
&& rm -rf /build
# ninja
RUN ln -nsf /opt/depot-tools/ninja-linux64 /usr/bin/ninja
# dependencies
RUN pip3 install --upgrade pip
RUN pip3 install setuptools docopt python-slugify tqdm
# libjpeg
RUN mkdir /opt/libjpeg
WORKDIR /opt/libjpeg
RUN curl https://ijg.org/files/jpegsrc.v9c.tar.gz -o jpegsrc.v9c.tar.gz
RUN tar -xvf jpegsrc.v9c.tar.gz
WORKDIR /opt/libjpeg/jpeg-9c
RUN ./configure --prefix=/usr
RUN make && make install
# emsdk
RUN mkdir /emsdk
WORKDIR /emsdk
RUN git clone https://github.com/emscripten-core/emsdk.git .
RUN ./emsdk install 2.0.20
RUN ./emsdk activate 2.0.20
ENV PATH="${PATH}:/emsdk:/emsdk/upstream/emscripten"
# cache system libraries
RUN bash -c 'echo "int main() { return 0; }" > /tmp/main.cc'
RUN bash -c 'source /emsdk/emsdk_env.sh && em++ -s USE_ZLIB=1 -s USE_LIBJPEG=1 -s USE_PTHREADS=1 -s ASSERTIONS=1 -o /tmp/main.html /tmp/main.cc'
# nodejs and npm
RUN curl -sL https://deb.nodesource.com/setup_14.x | sudo -E bash -
RUN apt-get install -y nodejs
RUN npm install -g npm@latest
# working dir
WORKDIR /app
Did you do something different?
Let me put together a Dockerfile over the weekend. I will make a PR.
Hi,
That would be nice.
I made a PR with what I have done so far: https://github.com/paulo-coutinho/pdfium-lib/pull/30/files
I tried libjpeg manually and libjpeg-turbo manually, but nothing changed. Same error:
COMMAND:
docker run -v ${PWD}:/app -it pdfium-wasm python3 make.py run generate-wasm
ERROR:
> Compiling with emscripten...
em++: error: '/emsdk/upstream/bin/wasm-emscripten-finalize --minimize-wasm-changes -g --dyncalls-i64 --dwarf /app/build/linux/x64/gen/out/pdfium.wasm -o /app/build/linux/x64/gen/out/pdfium.wasm --detect-features' failed (-9)
And I'm using version "v9c" because the libjpeg in emscripten is version 9c too, just to make it "more compatible".
Thanks.
Verbose log (updated without rsp):
em++ -v -g -o /app/build/linux/x64/gen/out/pdfium.html -s EXPORTED_FUNCTIONS="$(node function-names ../xml/index.xml)" -s EXPORTED_RUNTIME_METHODS='["ccall", "cwrap"]' custom.cpp /app/build/linux/x64/debug/lib/libpdfium.a -I/app/build/linux/x64/debug/include -s DEMANGLE_SUPPORT=1 -s USE_ZLIB=1 -s USE_LIBJPEG=1 -s WASM=1 -s ASSERTIONS=1 -s ALLOW_MEMORY_GROWTH=1 -std=c++14 -Wall --no-entry
"/emsdk/upstream/bin/clang++" -target wasm32-unknown-emscripten -DEMSCRIPTEN -fignore-exceptions -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -D__EMSCRIPTEN_major__=2 -D__EMSCRIPTEN_minor__=0 -D__EMSCRIPTEN_tiny__=20 -D_LIBCPP_ABI_VERSION=2 -Dunix -D__unix -D__unix__ -Werror=implicit-function-declaration -Xclang -iwithsysroot/include/SDL --sysroot=/emsdk/upstream/emscripten/cache/sysroot -Xclang -iwithsysroot/include/compat -v -g -I/app/build/linux/x64/debug/include -std=c++14 -Wall custom.cpp -c -o /tmp/emscripten_temp_t5gzousm/custom_0.o
clang version 13.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 642df18f1437b1fffea2343fa471aebfff128c6e)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /emsdk/upstream/bin
(in-process)
"/emsdk/upstream/bin/clang-13" -cc1 -triple wasm32-unknown-emscripten -emit-obj -mrelax-all --mrelax-relocations -disable-free -main-file-name custom.cpp -mrelocation-model static -mframe-pointer=none -fno-rounding-math -mconstructor-aliases -target-cpu generic -fvisibility hidden -debug-info-kind=limited -dwarf-version=4 -debugger-tuning=gdb -v -fcoverage-compilation-dir=/app/build/linux/x64/gen/utils -resource-dir /emsdk/upstream/lib/clang/13.0.0 -D EMSCRIPTEN -D __EMSCRIPTEN_major__=2 -D __EMSCRIPTEN_minor__=0 -D __EMSCRIPTEN_tiny__=20 -D _LIBCPP_ABI_VERSION=2 -D unix -D __unix -D __unix__ -I /app/build/linux/x64/debug/include -isysroot /emsdk/upstream/emscripten/cache/sysroot -internal-isystem /emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten/c++/v1 -internal-isystem /emsdk/upstream/emscripten/cache/sysroot/include/c++/v1 -internal-isystem /emsdk/upstream/lib/clang/13.0.0/include -internal-isystem /emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten -internal-isystem /emsdk/upstream/emscripten/cache/sysroot/include -Werror=implicit-function-declaration -Wall -std=c++11 -fdeprecated-macro -fdebug-compilation-dir=/app/build/linux/x64/gen/utils -ferror-limit 19 -fgnuc-version=4.2.1 -fcxx-exceptions -fignore-exceptions -fexceptions -fcolor-diagnostics -iwithsysroot/include/SDL -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o /tmp/emscripten_temp_t5gzousm/custom_0.o -x c++ custom.cpp
clang -cc1 version 13.0.0 based upon LLVM 13.0.0git default target x86_64-unknown-linux-gnu
ignoring nonexistent directory "/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten/c++/v1"
ignoring nonexistent directory "/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten"
#include "..." search starts here:
#include <...> search starts here:
/app/build/linux/x64/debug/include
/emsdk/upstream/emscripten/cache/sysroot/include/SDL
/emsdk/upstream/emscripten/cache/sysroot/include/compat
/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1
/emsdk/upstream/lib/clang/13.0.0/include
/emsdk/upstream/emscripten/cache/sysroot/include
End of search list.
"/emsdk/upstream/bin/wasm-ld" @/tmp/emscripten_4c6uswj1.rsp
"/emsdk/upstream/bin/wasm-emscripten-finalize" --minimize-wasm-changes -g --dyncalls-i64 --dwarf /app/build/linux/x64/gen/out/pdfium.wasm -o /app/build/linux/x64/gen/out/pdfium.wasm --detect-features
em++: error: '/emsdk/upstream/bin/wasm-emscripten-finalize --minimize-wasm-changes -g --dyncalls-i64 --dwarf /app/build/linux/x64/gen/out/pdfium.wasm -o /app/build/linux/x64/gen/out/pdfium.wasm --detect-features' failed (-9)
I removed the RSP file and passed everything directly as parameters; the RSP file is not necessary.
The strangest part is that the test command compiles and works:
docker run -v ${PWD}:/app -it pdfium-wasm python3 make.py run test-wasm
python -m http.server --directory sample-wasm/build
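The `failed (-9)` in the em++ output above is a negative exit status. emcc is a Python program, and Python's `subprocess` module reports a negative return code when a child process is terminated by a signal, so -9 most likely means `wasm-emscripten-finalize` received signal 9 (SIGKILL). A frequent reason for that inside a container is the kernel's out-of-memory killer or a memory limit, but that is an assumption; the thread does not state the cause. A minimal sketch for decoding such a status (`describe_status` is a hypothetical helper name):

```python
import signal


def describe_status(returncode):
    """Explain a subprocess-style return code; negative values mean 'killed by signal N'."""
    if returncode == 0:
        return "success"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"terminated by signal {-returncode} ({name})"
    return f"exited with status {returncode}"
```

On Linux, `describe_status(-9)` reports SIGKILL, which points at the environment (memory, container limits) rather than at a compile error in the sources.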
Hi,
I finally made it work!!!!!
The changes are on master and published as release 4505.
Can you check the debug version there to see if it throws any errors?
Great!!! I was just starting to work on a Dockerfile. So I can skip that now. So it really was libjpeg vs. turbo, right?
And no, there's no error anymore in the build above!
Yes. I installed the same libjpeg version that emsdk uses and modified wasm.py to copy only the required files.
Hi @cetinsert,
It was fixed. Can you test and check your PDFs?
Thanks.
@paulo-coutinho - testing now!!
Awesome!
It just works!
Thank you everyone!
Very nice. Finally closed.
Everyone, please consider donating to help the project.
Thanks, guys, for all the help.
@cetinsert thanks a lot, man!
I am not able to open the PDF files below in https://pdfviewer.github.io/
File 1: WCP1 - 12-05-21 Permit Set - Architecture - T1.pdf
File 2: A2B1.pdf
I tested all the PDFs with my latest version and they are OK; you can test them in my web app to check.
The only PDF that I can't download from GitHub is the one from @KameshRajendran, maybe a GitHub bug, because the download doesn't start.
The others I tested are OK.
If everything is OK, can you close the issue please?
Thanks.
Describe the bug: pdfium.wasm fails to render some images.
To Reproduce: https://d15k2d11r6t6rl.cloudfront.net/public/users/Integrators/a0a42ab5-3cb9-4912-84b7-3c6e47330d5c/smart-pr-805/Consumentenfolder_A5_zonder_paskruizen%20_2.pdf