Closed ViliusSutkus89 closed 1 month ago
I'm gonna go with the conclusion that conan with msvc-1939 is broken:
freetype/2.13.2: Meson configure cmd: meson setup --native-file "C:\Users\runneradmin\.conan2\p\b\freet6a5c8a94fbf5c\b\build-release\conan\conan_meson_native.ini" "C:\Users\runneradmin\.conan2\p\b\freet6a5c8a94fbf5c\b\build-release" "C:\Users\runneradmin\.conan2\p\b\freet6a5c8a94fbf5c\b\src" --prefix=/
freetype/2.13.2: RUN: meson setup --native-file "C:\Users\runneradmin\.conan2\p\b\freet6a5c8a94fbf5c\b\build-release\conan\conan_meson_native.ini" "C:\Users\runneradmin\.conan2\p\b\freet6a5c8a94fbf5c\b\build-release" "C:\Users\runneradmin\.conan2\p\b\freet6a5c8a94fbf5c\b\src" --prefix=/
conanvcvars.bat: Activating environment Visual Studio 17 - amd64 - winsdk_version=None - vcvars_ver=14.3
[ERROR:vcvars.bat] Toolset directory for version '14.3' was not found.
[ERROR:VsDevCmd.bat] *** VsDevCmd.bat encountered errors. Environment may be incomplete and/or incorrect. ***
[ERROR:VsDevCmd.bat] In an uninitialized command prompt, please 'set VSCMD_DEBUG=[value]' and then re-run
[ERROR:VsDevCmd.bat] vsdevcmd.bat [args] for additional details.
[ERROR:VsDevCmd.bat] Where [value] is:
[ERROR:VsDevCmd.bat] 1 : basic debug logging
[ERROR:VsDevCmd.bat] 2 : detailed debug logging
[ERROR:VsDevCmd.bat] 3 : trace level logging. Redirection of output to a file when using this level is recommended.
[ERROR:VsDevCmd.bat] Example: set VSCMD_DEBUG=3
[ERROR:VsDevCmd.bat] vsdevcmd.bat > vsdevcmd.trace.txt 2>&1
Do we need msvc-1939 support for odr.core? Freetype is not our recipe, so this may eventually be solved without our intervention. We could set up a nice test case and report the issue to conan though
Fontforge fails to compile with msvc-19.40. I've checked Fontforge and they do actually ship Windows binaries, although I'm unsure if they compile with msvc. Not sure if this is worth pursuing, conan-odr-index should be the first place where we add extra compilers or windows support. Disabling msvc compiler here.
We don't really use the msvc build but I like to keep it because it picks on stuff the other compilers don't. Generally I think pdf2htmlex
should be a feature we can toggle at compile time so I would just turn it off for windows.
Brought back the msvc compilers, 19.39 and 19.40, just for good measure.
Anyway, I would need some help integrating on the c++ side. With the current code on this PR, #include <pdf2htmlEX.h>
works.
I assume the integration should happen somewhere in src/odr/html.cpp:
Html html::translate(const PdfFile &pdf_file, const std::string &output_path,
const HtmlConfig &config) {
fs::create_directories(output_path);
#if WITH_PDF2HTMLEX
return pdf2htmlex_wrapper_which_returns_html_object(pdf_file, output_path, config);
#else
return internal::html::translate_pdf_file(pdf_file, output_path, config);
#endif
}
Any other ideas?
I think this makes a lot of sense yes. I don't think we need this runtime switchable. Even if we can change this in the future.
Resource files are ugly, I gather them in odr.droid/app/CMakeLists.txt and then tell gradle that asset merging has a dependency on CMake - https://github.com/opendocument-app/OpenDocument.droid/blob/d26811deef04dbba6d70b97c01967ec72557d641/app/build.gradle#L120 . I doubt it's possible to make the gathering somehow nicer.
Runtime side could be nicer though. Currently we extract them right before calling pdf2htmlEX. Would be possible to move that to app init, but it would make app init extra slow, so I don't really want to go that route.
Proper implementation would be to read assets in C++ directly from APK, using AssetManager, but I can't recall right now what was wrong with that route. Maybe it requires a high min API level. It definitely requires having access to the JNI Env object and AssetManager, obtained from Android's Context. These objects are bound to the specific Java thread, which does JNI, pdf2htmlEX doesn't do threading, I'm unsure about poppler. Anyway, this implementation requires some code.
This is more or less usable. APK available at https://github.com/opendocument-app/OpenDocument.droid/actions/runs/10419751442
We still need to wire encrypted pdf's, no idea how to do that. Will need @andiwand to look at it.
Btw, it would be possible to keep both pdf converters, pdf2htmlEX and the one at odr.core, but with different entry points in open_document_reader.hpp
. This way would we could have fallback mode, if pdf2htmlEX fails
Keeping both would be nice. I think the odr.core internal one is not very usable yet but having a runtime switch would be nice. I don't think we have a mechanism for that yet because it would be the same file format and two different open and translate mechanisms. I need to think a bit on how to best model that.
Encrypted files are also supported by the core. We currently handle that in the DecodedFile
class.
@ViliusSutkus89 there seem to be some compile errors in the index repository https://github.com/opendocument-app/conan-odr-index/actions/runs/10423033599
That previous compile error was because of some borked package that wasn't rebuilt properly. Current compile error was because I've messed it up when switching profiles RelWithDebInfo->Release. Sorry for merging straight to main, I thought it would just work the first time, lol :D
We have a working version with both pdf2htmlEX and wvWare compiled as conan deps into libodr-core.so. Latest build available on this actions run in odr.droid.
Check out the space savings from de-duplication. With conan deps:
$ du -hs lib/*
23M lib/arm64-v8a
18M lib/armeabi-v7a
23M lib/x86
24M lib/x86_64
With regular gradle deps:
$ du -hs lib/*
33M lib/arm64-v8a
26M lib/armeabi-v7a
32M lib/x86
34M lib/x86_64
odr.droid#350 mentioned size more than doubling. Release apk of odr.droid-3.19 was 56M, odr.droid-3.26 is 125M. Current release apk is 98M.
I'm not yet completely sure why it's so big. In odr.droid-3.19 extracted lib folder was 109M, assets were 21M, Resources were 7.3M+1.1M, classes were 2M. In current conan build extracted lib folder is 87M, assets are 21M, Resources are 2.7M+1.1M, classes are 8.2M.
The total size was supposed to go down even smaller than that of 3.19, unless my numbers are wrong, because I'm comparing f-droid release to my build, when f-droid release could have been passed through some better compressor. Will try to investigate further. Possibly the end result will be less than 3.19's 56M.
@andiwand , I need your opinion on the current interface between DocLoader/PdfLoader and C++.
Correction, properly working APK is in this actions run - https://github.com/opendocument-app/OpenDocument.droid/actions/runs/10549275544
src/wvWare.c could be placed somewhere else
Hmmm... these tests are passing on my machine. I wonder why CI is throwing up
2024-08-28T23:40:26.5479540Z [ RUN ] all_test_files/pdf2htmlEXWrapperTests.html/odr_private_pdf_onepage_pdf
2024-08-28T23:40:26.5481694Z /home/runner/work/OpenDocument.core/OpenDocument.core/test/data/input/odr-private/pdf/onepage.pdf to output/odr-private/output/pdf2htmlEX/pdf/onepage.pdf
2024-08-28T23:40:26.5506970Z Preprocessing: 0/1
2024-08-28T23:40:26.5507446Z Preprocessing: 1/1
2024-08-28T23:40:27.2307481Z Working: 0/1
2024-08-28T23:40:27.2307952Z Working: 1/1
2024-08-28T23:40:27.2318653Z unknown file: Failure
2024-08-28T23:40:27.2319323Z Unknown C++ exception thrown in the test body.
2024-08-28T23:40:27.2319822Z
2024-08-28T23:40:27.2320824Z [ FAILED ] all_test_files/pdf2htmlEXWrapperTests.html/odr_private_pdf_onepage_pdf, where GetParam() = "odr-private/pdf/onepage.pdf" (684 ms)
I've finally figured out why the tests were failing, turns out I've forgot to feed test VMs with ~/.conan2/p, which contains assets. Reference outputs are now generated and committed into appropriate repos. Naturally, I've had to run into another problem.
Why is actions/checkout getting old submodule revisions? Do I need to manually update git revisions somewhere?
Checkout with submodules should just work, right?
- uses: actions/checkout@v4
with:
submodules: true
.gitmodules does not list any revisions:
[submodule "test/data/input/odr-public"]
path = test/data/input/odr-public
url = https://github.com/opendocument-app/OpenDocument.test.git
[submodule "test/data/reference-output/odr-public"]
path = test/data/reference-output/odr-public
url = https://github.com/opendocument-app/OpenDocument.test.output.git
[submodule "test/data/input/odr-private"]
path = test/data/input/odr-private
url = git@github.com:opendocument-app/OpenDocument.test-private.git
[submodule "test/data/reference-output/odr-private"]
path = test/data/reference-output/odr-private
url = git@github.com:opendocument-app/OpenDocument.test-private.output.git
From https://github.com/opendocument-app/OpenDocument.test-private
* branch 508.... -> FETCH_HEAD
Submodule path 'test/data/input/odr-private': checked out '508...'
508.... is the third most recent commit in that repo. Commited on Dec 30, 2021. There are two more recents commits, made in summer of 2022.
opendocument-app/OpenDocument.test checks the correct revision, c2cc81ba91b6145ff51801644169f4f01878556b
is the HEAD
Submodule path 'test/data/input/odr-public': checked out 'c2cc81ba91b6145ff51801644169f4f01878556b'
From https://github.com/opendocument-app/OpenDocument.test-private.output
* branch 1b5... -> FETCH_HEAD
Submodule path 'test/data/reference-output/odr-private': checked out '1b5...'
1b5...
is second most recent commit, checkout skipped my recently added commit.
From https://github.com/opendocument-app/OpenDocument.test.output
* branch 6138deea822cc17e940181fe3d99b6b6aef64551 -> FETCH_HEAD
Submodule path 'test/data/reference-output/odr-public': checked out '6138deea822cc17e940181fe3d99b6b6aef64551'
6138deea822cc17e940181fe3d99b6b6aef64551
is third most recent commit.
I think for now it would be fine to smoke test the two libraries. Just picking one input file for each of them and translate it to a random directly checking that it does not fail. For the ref comparison I would try to integrate it into the existing translation mechanism first
We are way past the smoke test now. Ref testing is already implemented. It's just that a fresh git clone with recursive submodules clones wrong reference outputs and that's why the CI fails.
Just checked and realized that a local git clone also has this behavior, so it's not GitHub actions issue, more of a git issue
Turns out I have to update git submodules in the root project too
Another issue:
├── pdf2htmlEX
│ ├── pdf
│ │ ├── empty.pdf
│ │ │ ├── document.html ✓
│ │ ├── style-various-1.pdf
Traceback (most recent call last):
File "/home/runner/work/OpenDocument.core/OpenDocument.core/test/scripts/compare_output.py", line 233, in <module>
sys.exit(main())
^^^^^^
File "/home/runner/work/OpenDocument.core/OpenDocument.core/test/scripts/compare_output.py", line 223, in main
result = print_results(results, args.a, args.b)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/OpenDocument.core/OpenDocument.core/test/scripts/compare_output.py", line 184, in print_results
sub_result = print_results(sub_results,
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/OpenDocument.core/OpenDocument.core/test/scripts/compare_output.py", line 184, in print_results
sub_result = print_results(sub_results,
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/OpenDocument.core/OpenDocument.core/test/scripts/compare_output.py", line 184, in print_results
sub_result = print_results(sub_results,
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/OpenDocument.core/OpenDocument.core/test/scripts/compare_output.py", line 160, in print_results
cmp = future.result()
^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.12.5/x64/lib/python3.12/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.12.5/x64/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/opt/hostedtoolcache/Python/3.12.5/x64/lib/python3.12/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/OpenDocument.core/OpenDocument.core/test/scripts/compare_output.py", line 85, in compare
return compare_files(path_a,
^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/OpenDocument.core/OpenDocument.core/test/scripts/compare_output.py", line 49, in compare_files
return compare_html(a, b, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/OpenDocument.core/OpenDocument.core/test/scripts/compare_output.py", line 33, in compare_html
diff, (image_a, image_b) = html_render_diff(a, b, browser=browser)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/OpenDocument.core/OpenDocument.core/test/scripts/html_render_diff.py", line 43, in html_render_diff
image_a = screenshot(browser, to_url(a))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/OpenDocument.core/OpenDocument.core/test/scripts/html_render_diff.py", line 23, in screenshot
png = body.screenshot_as_png
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.12.5/x64/lib/python3.12/site-packages/selenium/webdriver/remote/webelement.py", line 325, in screenshot_as_png
return b64decode(self.screenshot_as_base64.encode("ascii"))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.12.5/x64/lib/python3.12/site-packages/selenium/webdriver/remote/webelement.py", line 314, in screenshot_as_base64
return self._execute(Command.ELEMENT_SCREENSHOT)["value"]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.12.5/x64/lib/python3.12/site-packages/selenium/webdriver/remote/webelement.py", line 394, in _execute
return self._parent.execute(command, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.12.5/x64/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 347, in execute
self.error_handler.check_response(response)
File "/opt/hostedtoolcache/Python/3.12.5/x64/lib/python3.12/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: [Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://remote/content/shared/Capture.sys.mjs :: capture.canvas :: line 137" data: no]
Stacktrace:
capture.canvas@chrome://remote/content/shared/Capture.sys.mjs:137:62
takeScreenshot@chrome://remote/content/marionette/actors/MarionetteCommandsParent.sys.mjs:292:37
Still no idea what that means
Not sure if mixing the test file output between integrated and not integrated parts is a good idea. I see that adding the new API is the easiest solution for now but the new tests couple quite tightly to the new api and the existing test repos.
I could put pdf2htmlEX/wvWare reference outputs in
test/data/reference-output/odr-public/output-pdf2htmlEX
instead of
test/data/reference-output/odr-public/output/pdf2htmlEX
and test outputs in
build/test/output/odr-public/output-pdf2htmlEX
instead of
build/test/output/odr-public/output/pdf2htmlEX
.
Let's get this in! I will try to follow up on some of my suggestions here https://github.com/opendocument-app/OpenDocument.core/pull/387 and will revisit the screenshotting there as well
WIP. Currently just linking. Need to add actual usage