Open vadimkantorov opened 4 years ago
Okay, I managed to compile xetex (by simply skipping the packagedata step of icu)
It still complained about popen - is popen used only for running dvipdfmx? Or for many other things as well?
Now the question is how to test it. Should I use some wasm virtual machine first?
Please feel free to take a look at the "working" version: https://github.com/vadimkantorov/xetex2020.js/
@lyze Could you please explain the assumed format of texlive.lst? I'm confused by ": > $@" in the Makefile. Does it mean it wants a colon as the first line?
My texlive root directory looks like this:
vadimkantorov@DESKTOP-4UF8FID:/mnt/c/Users/user/xetex2020.js$ ls texlive
LICENSE.CTAN README profile.input texmf-config texmf-local texmf.cnf
LICENSE.TL README.usergroups release-texlive.txt texmf-dist texmf-var texmfcnf.lua
What should the lines in texlive.lst look like: texmf-dist/... or texlive/texmf-dist/...?
: > $@ is a really abstruse way to always truncate a file. The : is a bash builtin that always returns true, and $@ is the target.
texlive.lst contains a list of all files in the texlive distribution, and it is only used in the example with some glue code to use emscripten FS lazy loading.
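As an illustration of how such a manifest could feed lazy loading, here is a minimal sketch. It assumes one file path per line, relative to the TeX Live root, which this thread does not confirm. The helper name manifestToLazyEntries, the /texlive mount point and the example URL are made up for the example.

```javascript
// Sketch only: turn manifest text into descriptors for lazy files.
// Assumption: one path per line, relative to the TeX Live root directory.
function manifestToLazyEntries(manifestText, virtualRoot, baseUrl) {
  return manifestText
    .split('\n')
    .map((line) => line.trim())
    // Skip blank lines and directory entries (paths ending in '/').
    .filter((line) => line.length > 0 && !line.endsWith('/'))
    .map((relativePath) => {
      const slash = relativePath.lastIndexOf('/');
      const dir = slash === -1 ? '' : relativePath.slice(0, slash);
      return {
        parent: dir ? `${virtualRoot}/${dir}` : virtualRoot,
        name: relativePath.slice(slash + 1),
        url: `${baseUrl}/${relativePath}`,
      };
    });
}

// With an Emscripten module loaded, each entry could then be registered with:
//   FS.mkdirTree(entry.parent);
//   FS.createLazyFile(entry.parent, entry.name, entry.url, true, false);
```

Keeping the parsing separate from the FS calls makes it possible to check the path handling in plain node before anything is compiled to wasm.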
Thanks! Didn't know this bash builtin!
So for a first localhost http test, I could try to directly preload the 130 MB basic texlive installation into an emscripten binary (e.g. with xetex)?
Do I understand correctly that the cwd thing is not a problem for the browser environment? As far as I can see, the lazy files go into /texlive and not into /cwd in https://github.com/lyze/xetex-js/blob/master/workercontroller.js?dfdf/qwewe#L322
Right, you could certainly try to preload the entire installation in the binary if you wanted to. The wrapper code around the manifest file exists as an optimization to not have to download the entire distribution to load a webpage.
That's the right understanding too: the cwd thing is solely to make things work when running xetex with node.
I made an update to my scripts repo. Now it's much better since I introduced a ccskip.py helper script, which removes the need for running make multiple times.
I've got a question about your final linking command:
xetex_web2c_dir = $(XETEX_BUILD_DIR)texk/web2c/
web2c_objs = $(addprefix $(xetex_web2c_dir), xetexdir/xetex-xetexextra.o synctexdir/xetex-synctex.o xetex-xetexini.o xetex-xetex0.o xetex-xetex-pool.o)
xetex_libs_dir = $(XETEX_BUILD_DIR)libs/
xetex_libs = $(addprefix $(xetex_libs_dir), harfbuzz/libharfbuzz.a graphite2/libgraphite2.a icu/icu-build/lib/libicuuc.a icu/icu-build/lib/libicudata.a teckit/libTECkit.a poppler/libpoppler.a libpng/libpng.a)
xetex_link = $(web2c_objs) $(LIB_FONTCONFIG) $(xetex_web2c_dir)libxetex.a $(xetex_libs) $(LIB_EXPAT) $(xetex_libs_dir)freetype2/libfreetype.a $(xetex_libs_dir)zlib/libz.a $(xetex_web2c_dir)lib/lib.a $(XETEX_BUILD_DIR)texk/kpathsea/.libs/libkpathsea.a -nodefaultlibs -Wl,-Bstatic -lstdc++ -Wl,-Bdynamic -lm -lgcc_eh -lgcc -lc -lgcc_eh -lgcc
em++ $(EM_LINK_FLAGS) $(EM_LINK_OPT_WORKAROUND_FLAGS) --pre-js xetex.pre.js -o $@ $(xetex_link) -s TOTAL_MEMORY=536870912 -s EXPORTED_RUNTIME_METHODS=[] -s ERROR_ON_UNDEFINED_SYMBOLS=0 -s WASM=0
It's a lot of objects and flags. When I dumped what make is trying to do I got this:
em++ -g -O2 -o xetex xetexdir/xetex-xetexextra.o synctexdir/xetex-synctex.o xetex-xetexini.o xetex-xetex0.o xetex-xetex-pool.o libxetex.a $TEXLIVE_BUILD_DIR/libs/harfbuzz/libharfbuzz.a $TEXLIVE_BUILD_DIR/libs/graphite2/libgraphite2.a $TEXLIVE_BUILD_DIR/libs/teckit/libTECkit.a $TEXLIVE_BUILD_DIR/libs/libpng/libpng.a $TEXLIVE_BUILD_DIR/libs/freetype2/libfreetype.a $TEXLIVE_BUILD_DIR/libs/pplib/libpplib.a $TEXLIVE_BUILD_DIR/libs/zlib/libz.a libmd5.a lib/lib.a $TEXLIVE_BUILD_DIR/texk/kpathsea/.libs/libkpathsea.a -s ERROR_ON_UNDEFINED_SYMBOLS=0 $PREFIX/lib/libfontconfig.a $PREFIX/lib/libexpat.a $TEXLIVE_BUILD_DIR/libs/icu/icu-build/lib/libicuuc.a
Could you expand on the meaning of the flags (-nodefaultlibs -Wl,-Bstatic -lstdc++ -Wl,-Bdynamic -lm -lgcc_eh -lgcc -lc -lgcc_eh -lgcc) and the extra objects/libraries passed in your command? Are they just in case?
After removing common items:
Your command uses: icu/icu-build/lib/libicudata.a poppler/libpoppler.a -nodefaultlibs -Wl,-Bstatic -lstdc++ -Wl,-Bdynamic -lm -lgcc_eh -lgcc -lc -lgcc_eh -lgcc
My command uses $TEXLIVE_BUILD_DIR/libs/pplib/libpplib.a
It seems that newer TeX Live moved from poppler to pplib.
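The "after removing common items" comparison can be reproduced mechanically. The sketch below is only one possible way to do it: it compares whitespace-separated tokens by their final path component, so the same library under two different build prefixes counts as common. The function names are made up for illustration.

```javascript
// Key a token by its last path component, so that
// libs/zlib/libz.a and $TEXLIVE_BUILD_DIR/libs/zlib/libz.a compare equal.
function tokenKey(token) {
  return token.slice(token.lastIndexOf('/') + 1);
}

// Return the tokens of commandA whose key does not occur in commandB,
// dropping repeats such as a second -lgcc.
function uniqueTo(commandA, commandB) {
  const split = (command) => command.split(/\s+/).filter(Boolean);
  const keysB = new Set(split(commandB).map(tokenKey));
  const seen = new Set();
  return split(commandA).filter((token) => {
    const key = tokenKey(token);
    if (keysB.has(key) || seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Calling uniqueTo(theirs, mine) and then uniqueTo(mine, theirs) on the two fully expanded link lines gives the two leftover lists. Make variables have to be expanded first (for example by dumping what make actually runs), since the comparison works on plain tokens.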
Would you have insight about libicudata.a? Does linking a wasm build against a natively built stubdata/libicudata.a work? What is it needed for? Some precomputed unicode info database for libicu?
Would you have comments about explicitly linking (statically) to libstdc++ and dynamically (?) to libm, gcc variants, libc etc? For me it seems to build without these additions.
This is how I build ICU:
pushd libs/icu/icu-build
mkdir -p bin stubdata lib
cp --preserve=mode $TEXLIVE_SOURCE_DIR/texlive-build-native/libs/icu/icu-build/bin/icupkg $TEXLIVE_SOURCE_DIR/texlive-build-native/libs/icu/icu-build/bin/pkgdata bin/
cp $TEXLIVE_SOURCE_DIR/texlive-build-native/libs/icu/icu-build/stubdata/libicudata.a stubdata/
pushd common
$EMMAKE make $MAKEFLAGS #CXX="em++ -s ERROR_ON_UNDEFINED_SYMBOLS=0"
popd
pushd i18n
$EMMAKE make $MAKEFLAGS #CXX="em++ -s ERROR_ON_UNDEFINED_SYMBOLS=0"
popd
And it doesn't fail with any unknown symbols, so it seems to work even without explicit linking to stubdata, which is strange.
I also could not build the full icu because of issues with asm.
There seems to be BUILD_DATA_WITHOUT_ASSEMBLY support in https://github.com/unicode-org/icu/blob/3b0772fff9c880b1c048878e9a11bf2d1278c69f/icu4c/source/tools/pkgdata/pkgdata.cpp#L761
I'll try building faithful icu data for wasm.
texmf.cnf is created in https://github.com/lyze/xetex-js/blob/master/workercontroller.js and is mentioned wrt kpathsea setup. However, there is no explicit kpathsea setup.
How does xetex discover /texmf.cnf? Does it check it by default?
In the node version you set up environment variables explicitly:
ENV['TEXMFDIST'] = 'cwd/texlive-{basic,small,full}/texmf-dist:';
ENV['TEXMFCNF'] = 'cwd:cwd/texlive-{basic,small,full}:cwd/texlive-{basic,small,full}/texmf-dist/web2c:';
ENV['TEXINPUTS'] = 'cwd:';
ENV['TEXFORMATS'] = 'cwd:';
I.e. I don't understand how xetex will discover /texmf.cnf with content:
return `TEXMFDIST = /${this.virtualTexLiveRootDir}/texmf-dist\n` +
`TEXMFLOCAL = /${this.virtualTexLiveRootDir}/texmf-local\n` +
`TEXMFCONFIG = /${this.virtualTexLiveRootDir}/texmf-config\n` +
'TEXMF = {!!$TEXMFDIST,!!$TEXMFLOCAL,!!$TEXMFCONFIG}\n';
Yep, I saw it. My question is how the xetex binary would discover the newly created /texmf.cnf in root.
kpathsea applies some heuristics, so usually TEXINPUTS is part of the search path. Does this answer your question? I'm not sure I fully understand what you would like to know.
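For anyone who would rather not depend on the heuristics, the variables from the node snippet in this thread can also be set explicitly. This is a minimal sketch: the variable names (TEXMFCNF, TEXMFDIST, TEXINPUTS, TEXFORMATS) are the ones already used above, while the helper name kpathseaEnv and the example paths are assumptions.

```javascript
// Hypothetical helper: build the environment variables kpathsea would read.
function kpathseaEnv(texliveRoot, workDir) {
  return {
    // Colon-separated directories in which kpathsea looks for texmf.cnf.
    TEXMFCNF: [workDir, texliveRoot, `${texliveRoot}/texmf-dist/web2c`].join(':'),
    TEXMFDIST: `${texliveRoot}/texmf-dist`,
    // Trailing colon, as in the node snippet: keep the default path as well.
    TEXINPUTS: `${workDir}:`,
    TEXFORMATS: `${workDir}:`,
  };
}

// In an Emscripten pre-js file this could be applied before main() runs, e.g.
//   Module.preRun = [() => Object.assign(ENV, kpathseaEnv('/texlive', '/'))];
// (assumes ENV is in scope there, which depends on how the module is built)
```

With TEXMFCNF pointing at the directory that holds the generated texmf.cnf, discovery no longer relies on where the binary thinks it is installed.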
I think it does! Thank you! I'll also let you know when I do a similar test!
Coming back to my earlier question about libicudata.a and the explicit -lstdc++/-lm/-lgcc/-lc linking: do you have any thoughts about this? How can emscripten link to a native stub data static library?
When I add libicudata.a (produced by the native icu build) to my final xetex linking line, I get wasm-ld: error: unknown file type: stubdata.ao
When I link with the emscripten-produced version of libicudata.a, the build succeeds! (disabling assembly seemed to work)
Running xetex in browser:
["tmp", "home", "dev", "proc", "texlive", "xelatex.fmt", "xelatex", "source.tex", "texmf.cnf"]
(index):102 /xelatex: option is ambiguous:
(index):96 This is XeTeX, Version 3.14159265-2.6-0.999992 (TeX Live 2021/dev_EM) (preloaded format=xelatex)
(index):102 kpathsea: Running mktexfmt xelatex.fmt
(index):102 kpathsea: fork(): Resource temporarily unavailable
(index):96 I can't find the format file `xelatex.fmt'!
FS.writeFile(xelatex, '', {encoding: 'utf-8'});
FS.writeFile(sourcePath, '\\documentclass{article}\\begin{document}Hello, world!\\end{document}', {encoding: 'utf-8'});
const texMfCnfContent = `TEXMFDIST = ${texliveRoot}/texmf-dist
TEXMFLOCAL = ${texliveRoot}/texmf-local
TEXMFCONFIG = ${texliveRoot}/texmf-config
TEXMF = {!!$TEXMFDIST,!!$TEXMFLOCAL,!!$TEXMFCONFIG}`;
FS.writeFile(texliveCnf, texMfCnfContent, {encoding: 'utf-8'});
ls('/');
Module.callMain(['-interaction=nonstopmode', '-no-pdf', '--', sourcePath]);
Not sure why it tries to search for xelatex.fmt when it is preloaded and texmf.cnf seems to be discovered OK.
After specifying the format file explicitly, I started to get:
(index):90 /xelatex: option is ambiguous:
This is XeTeX, Version 3.14159265-2.6-0.999992 (TeX Live 2021/dev_EM) (preloaded format=/xelatex.fmt)
(index):84 entering extended mode
(index):84 Ĉͻ^^@I^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^
(index):84 ^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^
(index):84 @^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@
....
@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@Iͼ
(index):84 (Press Enter to retry, or Control-D to exit;Ά')
(index):84 ͿΈ
(index):84 Ĉğ
(index):84 ε
So apparently there is some error, and no xdv is produced, but the output is scrambled :(
I was running it with Module.callMain(['--interaction=nonstopmode', '--no-pdf', '--fmt=' + xelatexFmtPath, '--output-directory=/home/web_user', '--', sourcePath])
When I remove -- then "option is ambiguous:" disappears. This is quite strange since the natively compiled one does not have a problem with --. Probably this is some divergence of the emscripten cstdlib.
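A small helper for assembling the argument list without the -- separator might look like the sketch below. The flags are the ones from the callMain call above; the function name buildXetexArgs and its option names are made up for illustration.

```javascript
// Sketch: build the argv passed to callMain, using only flags from this thread.
function buildXetexArgs({ source, fmt, outputDirectory }) {
  const args = ['--interaction=nonstopmode', '--no-pdf'];
  if (fmt) args.push(`--fmt=${fmt}`);
  if (outputDirectory) args.push(`--output-directory=${outputDirectory}`);
  // No '--' separator: in the wasm build described here it triggered
  // "option is ambiguous", so the source path simply goes last.
  args.push(source);
  return args;
}

// Usage with a loaded Emscripten module:
//   Module.callMain(buildXetexArgs({
//     source: '/home/web_user/source.tex',
//     fmt: '/xelatex.fmt',
//     outputDirectory: '/home/web_user',
//   }));
```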
I think somehow it cannot create/open log file (judging from the stack trace), but not sure why it can't open it.
This is very strange, because file reading clearly works. And writing files from JavaScript clearly works as well.
After adding TEXMFVAR = /home/web_user into /texmf.cnf, I still get the same error, but now xelatex.fmt42.fls gets created next to /home/web_user/source.tex. So it's really some paths issue, but it seems that logs should be created in the current directory (/home/web_user), so it's still strange.
Maybe it's a conflict between /texmf.cnf, /texlive/texmf-dist/web2c/texmf.cnf and /texlive/texmf.cnf. I removed /texlive/texmf.cnf, but the error still persists. Very mysterious.
I confirmed, openlogfile fails for some reason
With very tedious printf debugging, I found out that for some reason xetex does not set the log file name (in nameoffile + 1), so it tries to open the directory for writing and that obviously fails. Now the question is where xetex normally sets the log filename.
It almost looks like constant string buffer got corrupted somehow
Okay, I think I have found the reason, and it's really strange:
zpackfilename (L13891 in attached xetex0.c) assumes that passed in str_numbers are >= 65536. But this is not the case for str_numbers passed in from L14283 (".log" seems to be encoded as 902).
Here is an excerpt from the native xetex0.c:
void
openlogfile ( void )
{
openlogfile_regmem
unsigned char oldsetting ;
integer k ;
integer l ;
constcstring months ;
oldsetting = selector ;
if ( jobname == 0 )
jobname = getjobname ( 66180L ) ;
packjobname ( 66181L ) ;
recorderchangefilename ( stringcast ( nameoffile + 1 ) ) ;
packjobname ( 66182L ) ;
while ( ! aopenout ( logfile ) ) {
selector = 17 ;
promptfilename ( 66184L , 66182L ) ;
}
And here's an excerpt from the wasm one:
void
openlogfile ( void )
{
openlogfile_regmem
unsigned char oldsetting ;
integer k ;
integer l ;
constcstring months ;
oldsetting = selector ;
if ( jobname == 0 )
jobname = getjobname ( 900 ) ;
packjobname ( 901 ) ;
recorderchangefilename ( stringcast ( nameoffile + 1 ) ) ;
packjobname ( 902 ) ;
while ( ! aopenout ( logfile ) ) {
selector = 17 ;
promptfilename ( 904 , 902 ) ;
}
How are these offsets determined? Maybe I'm missing some compiler options?
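To spot this kind of divergence without printf debugging, the generated C file can be scanned for string-number literals below the 65536 threshold. The sketch below is only a heuristic: it looks at the three functions that appear in the excerpts (getjobname, packjobname, promptfilename), and the helper name findSmallStringNumbers is made up. It says nothing about why the numbers differ.

```javascript
// Threshold taken from the observation in this thread about zpackfilename.
const FIRST_POOL_STRING = 65536;

// Report integer literals below the threshold that are passed to the
// given functions in generated C source such as xetex0.c.
function findSmallStringNumbers(
  cSource,
  functions = ['getjobname', 'packjobname', 'promptfilename']
) {
  const call = new RegExp(`\\b(${functions.join('|')})\\s*\\(([^)]*)\\)`, 'g');
  const findings = [];
  for (const match of cSource.matchAll(call)) {
    // match[2] holds the argument text, e.g. " 66184L , 66182L ".
    for (const literal of match[2].match(/\d+/g) || []) {
      const value = Number(literal);
      if (value < FIRST_POOL_STRING) findings.push({ fn: match[1], value });
    }
  }
  return findings;
}
```

On the native excerpt this returns an empty list. On the wasm excerpt it reports 900, 901, 902 and 904 for the calls shown, which is a quick way to tell which of the two variants a given xetex0.c is.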
After copying xetex0.c from native (though it would still be good to figure this out), things moved on to "article.cls not found", again some cnf/texmfdist problem. I've got article.cls in ./texlive/texmf-dist/tex/latex/base/article.cls
After using texmf.cnf from prefix-native/share/texmf-dist/web2c/texmf.cnf, I've started to get this error in JavaScript: "Fontconfig error: Cannot load default config file"
How does fontconfig in your case discover fonts? Do you bundle the fonts somehow?
Do you configure it somehow to look into texmf-dist/fonts/?
FONTCONFIG_PATH and FONTCONFIG_FILE helped!
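For reference, a minimal sketch of what that setup could look like: generate a small fonts.conf that points at the TeX Live font directory and expose it through the two variables. The helper names (fontsConf, fontconfigEnv) and all paths are assumptions for the example, not the exact configuration used here.

```javascript
// Sketch: produce a minimal fontconfig configuration file.
function fontsConf(fontDirs, cacheDir) {
  const dirs = fontDirs.map((dir) => `  <dir>${dir}</dir>`).join('\n');
  return [
    '<?xml version="1.0"?>',
    '<!DOCTYPE fontconfig SYSTEM "fonts.dtd">',
    '<fontconfig>',
    dirs,
    `  <cachedir>${cacheDir}</cachedir>`,
    '</fontconfig>',
    '',
  ].join('\n');
}

// Environment variables telling fontconfig where its config directory and file are.
function fontconfigEnv(configDir, configFile = 'fonts.conf') {
  return {
    FONTCONFIG_PATH: configDir,
    FONTCONFIG_FILE: `${configDir}/${configFile}`,
  };
}

// With an Emscripten module (hypothetical paths):
//   FS.mkdirTree('/etc/fonts');
//   FS.writeFile('/etc/fonts/fonts.conf',
//                fontsConf(['/texlive/texmf-dist/fonts'], '/tmp/fontconfig'));
//   Object.assign(ENV, fontconfigEnv('/etc/fonts'));  // before main() runs
```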
When I did this some time ago, I recall setting TEXMFDIST "just"(tm) worked. I think in this case you might have run into issues with using the texmf.cnf in the web2c directory, but I'm not certain.
Everything mostly worked! I managed to run XeTeX in the browser (for now in synchronous mode) and even to link xetex and dvipdfmx in one executable by renaming some symbols
nice!
After renaming a few symbols and creating some dummy files,
#include <string.h>
extern int busymain_xetex(int argc, char* argv[]);
extern int busymain_dvipdfmx(int argc, char* argv[]);
int main(int argc, char* argv[])
{
    /* argv[1] selects the tool; bail out if it is missing */
    if(argc < 2)
        return 1;
    if(strcmp("xetex", argv[1]) == 0 || strcmp("xelatex", argv[1]) == 0)
        return busymain_xetex(argc - 1, argv + 1);
    else if(strcmp("dvipdfmx", argv[1]) == 0)
        return busymain_dvipdfmx(argc - 1, argv + 1);
    return 1; /* unknown tool name */
}
calling this multiple times via invokeMain works well! https://github.com/vadimkantorov/busytex/
I'm thinking of making some latexmk surrogate in C to simplify JavaScript interaction + this static "all-in-one" scheme would be useful for native builds as well
Would you have any ideas about the zpackfilename issue from my earlier comment (it assumes str_numbers >= 65536, while ".log" seems to be encoded as 902 in my build)?
A problem with implementing latexmk in C code seems to be that xetex uses exit(0) or exit(1) to stop execution. Any ideas on how to patch out exit to allow for sequential calls:
busymain_xetex(argc - 1, argv + 1);
...
busymain_dvipdfmx(argc - 1, argv + 1);
busymain_xetex(argc - 1, argv + 1);
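One way to sidestep the question, at least in the browser, is to keep the sequencing on the JavaScript side and run each tool as a separate invocation. The sketch below assumes a callback runTool that takes an argv array and returns the tool's exit code; that callback is hypothetical, and how exit codes surface from the wasm module depends on the Emscripten version and build flags, so it is left out here.

```javascript
// Run tools one after another, stopping at the first non-zero exit code.
// runTool is a hypothetical callback: (argv: string[]) => exit code.
function runSequence(steps, runTool) {
  const results = [];
  for (const argv of steps) {
    const exitCode = runTool(argv);
    results.push({ argv, exitCode });
    if (exitCode !== 0) break; // later steps would only operate on stale output
  }
  return results;
}

// Example plan mirroring the sequence in question (file names are illustrative):
const latexmkLikeSteps = [
  ['xetex', '--no-pdf', 'source.tex'],
  ['xetex', '--no-pdf', 'source.tex'],
  ['dvipdfmx', 'source.xdv'],
];
```

This does not patch out exit in the C code. It only moves the latexmk-style loop to the caller, so each exit ends one invocation instead of the whole sequence.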
I don't know what the state of support for system calls is today with emscripten, but perhaps it's worth a shot trying a wrapper with fork and exec.
I managed to get bibtex8 working as well. Now it's a single executable combining xetex/bibtex8/dvipdfmx
This seems to remove the need for copying apinames from native build and maybe makes the native build less necessary