lyze / xetex-js

Javascript port of XeTeX

Pass CCexe=gcc to freetype make command #4

Open vadimkantorov opened 4 years ago

vadimkantorov commented 4 years ago

This seems to remove the need for copying apinames from native build and maybe makes the native build less necessary

vadimkantorov commented 4 years ago

Okay, I managed to compile xetex (by simply skipping the packagedata step of icu)

It still complained about popen. Is popen used only for running dvipdfmx, or for many other things as well?

Now the question is how to test it. Should I use some wasm virtual machine first?

vadimkantorov commented 4 years ago

Please feel free to take a look at the "working" version: https://github.com/vadimkantorov/xetex2020.js/

vadimkantorov commented 4 years ago

@lyze Could you please explain what the assumed format of texlive.lst is? I'm confused by ": > $@" in the Makefile. Does it mean it wants a colon as the first line?

My texlive root directory looks like this:

vadimkantorov@DESKTOP-4UF8FID:/mnt/c/Users/user/xetex2020.js$ ls texlive
LICENSE.CTAN  README             profile.input        texmf-config  texmf-local  texmf.cnf
LICENSE.TL    README.usergroups  release-texlive.txt  texmf-dist    texmf-var    texmfcnf.lua

What should lines in texlive.lst look like?

texmf-dist/...

? or

texlive/texmf-dist/...

lyze commented 4 years ago

: > $@ is a really abstruse way to always truncate a file. The : is a bash builtin that always returns true, and $@ is the target.

texlive.lst contains a list of all files in the texlive distribution, and it is only used in the example with some glue code to use emscripten FS lazy loading.

vadimkantorov commented 4 years ago

Thanks! Didn't know this bash builtin!

So for a first localhost HTTP test, I could try to directly preload the 130 MB basic TeX Live installation into an Emscripten binary (e.g. with xetex)?

vadimkantorov commented 4 years ago

Do I understand correctly that cwd thing is not a problem for the browser environment? As far as I can see, the lazy files go into /texlive and not into /cwd in https://github.com/lyze/xetex-js/blob/master/workercontroller.js?dfdf/qwewe#L322

lyze commented 4 years ago

Right, you could certainly try preloading the entire installation in the binary if you wanted to. The wrapper code around the manifest file exists as an optimization, so that loading a webpage does not require downloading the entire distribution.

That's the right understanding too: the cwd thing is solely to make things work when running xetex with node.

vadimkantorov commented 4 years ago

I made an update to my scripts repo. It's much better now that I've introduced a ccskip.py helper script, which removes the need to run make multiple times.

I've got a question about your final linking command:

xetex_web2c_dir = $(XETEX_BUILD_DIR)texk/web2c/
web2c_objs = $(addprefix $(xetex_web2c_dir), xetexdir/xetex-xetexextra.o synctexdir/xetex-synctex.o xetex-xetexini.o xetex-xetex0.o xetex-xetex-pool.o)
xetex_libs_dir = $(XETEX_BUILD_DIR)libs/
xetex_libs = $(addprefix $(xetex_libs_dir), harfbuzz/libharfbuzz.a graphite2/libgraphite2.a icu/icu-build/lib/libicuuc.a icu/icu-build/lib/libicudata.a teckit/libTECkit.a poppler/libpoppler.a libpng/libpng.a)
xetex_link = $(web2c_objs) $(LIB_FONTCONFIG) $(xetex_web2c_dir)libxetex.a $(xetex_libs) $(LIB_EXPAT) $(xetex_libs_dir)freetype2/libfreetype.a $(xetex_libs_dir)zlib/libz.a $(xetex_web2c_dir)lib/lib.a $(XETEX_BUILD_DIR)texk/kpathsea/.libs/libkpathsea.a -nodefaultlibs -Wl,-Bstatic -lstdc++ -Wl,-Bdynamic -lm -lgcc_eh -lgcc -lc -lgcc_eh -lgcc
em++ $(EM_LINK_FLAGS) $(EM_LINK_OPT_WORKAROUND_FLAGS) --pre-js xetex.pre.js -o $@ $(xetex_link) -s TOTAL_MEMORY=536870912 -s EXPORTED_RUNTIME_METHODS=[] -s ERROR_ON_UNDEFINED_SYMBOLS=0 -s WASM=0

It's a lot of objects and flags. When I dumped what make is trying to do I got this:

em++ -g -O2 -o xetex xetexdir/xetex-xetexextra.o synctexdir/xetex-synctex.o xetex-xetexini.o xetex-xetex0.o xetex-xetex-pool.o  libxetex.a $TEXLIVE_BUILD_DIR/libs/harfbuzz/libharfbuzz.a $TEXLIVE_BUILD_DIR/libs/graphite2/libgraphite2.a $TEXLIVE_BUILD_DIR/libs/teckit/libTECkit.a $TEXLIVE_BUILD_DIR/libs/libpng/libpng.a $TEXLIVE_BUILD_DIR/libs/freetype2/libfreetype.a $TEXLIVE_BUILD_DIR/libs/pplib/libpplib.a $TEXLIVE_BUILD_DIR/libs/zlib/libz.a libmd5.a lib/lib.a $TEXLIVE_BUILD_DIR/texk/kpathsea/.libs/libkpathsea.a -s ERROR_ON_UNDEFINED_SYMBOLS=0 $PREFIX/lib/libfontconfig.a $PREFIX/lib/libexpat.a $TEXLIVE_BUILD_DIR/libs/icu/icu-build/lib/libicuuc.a

Could you expand on the meaning of the flags (-nodefaultlibs -Wl,-Bstatic -lstdc++ -Wl,-Bdynamic -lm -lgcc_eh -lgcc -lc -lgcc_eh -lgcc) and the extra objects/libraries passed in your command? Are they there just in case?

vadimkantorov commented 4 years ago

After removing common items:

Your command uses: icu/icu-build/lib/libicudata.a poppler/libpoppler.a -nodefaultlibs -Wl,-Bstatic -lstdc++ -Wl,-Bdynamic -lm -lgcc_eh -lgcc -lc -lgcc_eh -lgcc

My command uses: $TEXLIVE_BUILD_DIR/libs/pplib/libpplib.a

It seems that newer TeX Live moved from poppler to pplib.

Would you have insight about libicudata.a? Does linking a natively built stubdata/libicudata.a into a wasm binary work? What is it needed for? Is it some precomputed Unicode database for libicu?

Would you have comments about explicitly linking (statically) to libstdc++ and dynamically (?) to libm, gcc variants, libc etc? For me it seems to build without these additions.

vadimkantorov commented 4 years ago

This is how I build ICU:

pushd libs/icu/icu-build
mkdir -p bin stubdata lib
cp --preserve=mode $TEXLIVE_SOURCE_DIR/texlive-build-native/libs/icu/icu-build/bin/icupkg $TEXLIVE_SOURCE_DIR/texlive-build-native/libs/icu/icu-build/bin/pkgdata bin/
cp $TEXLIVE_SOURCE_DIR/texlive-build-native/libs/icu/icu-build/stubdata/libicudata.a stubdata/
pushd common
$EMMAKE make $MAKEFLAGS #CXX="em++ -s ERROR_ON_UNDEFINED_SYMBOLS=0"
popd
pushd i18n
$EMMAKE make $MAKEFLAGS #CXX="em++ -s ERROR_ON_UNDEFINED_SYMBOLS=0"
popd

vadimkantorov commented 4 years ago

And it doesn't fail with any unknown symbols, so it seems to work even without explicitly linking to stubdata, which is strange

vadimkantorov commented 4 years ago

I also could not build the full icu because of issues with asm

vadimkantorov commented 4 years ago

There seems to be a BUILD_DATA_WITHOUT_ASSEMBLY support in https://github.com/unicode-org/icu/blob/3b0772fff9c880b1c048878e9a11bf2d1278c69f/icu4c/source/tools/pkgdata/pkgdata.cpp#L761

I'll try building a faithful icu data for wasm

vadimkantorov commented 4 years ago

texmf.cnf is created in https://github.com/lyze/xetex-js/blob/master/workercontroller.js and is mentioned wrt kpathsea setup. However, there is no explicit kpathsea setup.

How does xetex discover /texmf.cnf? Does it check it by default?

In the node version you set up environment variables explicitly:

ENV['TEXMFDIST'] = 'cwd/texlive-{basic,small,full}/texmf-dist:';
ENV['TEXMFCNF'] = 'cwd:cwd/texlive-{basic,small,full}:cwd/texlive-{basic,small,full}/texmf-dist/web2c:';
ENV['TEXINPUTS'] = 'cwd:';
ENV['TEXFORMATS'] = 'cwd:';

I.e. I don't understand how xetex will discover /texmf.cnf with this content:

return `TEXMFDIST = /${this.virtualTexLiveRootDir}/texmf-dist\n` +
      `TEXMFLOCAL = /${this.virtualTexLiveRootDir}/texmf-local\n` +
      `TEXMFCONFIG = /${this.virtualTexLiveRootDir}/texmf-config\n` +
      'TEXMF = {!!$TEXMFDIST,!!$TEXMFLOCAL,!!$TEXMFCONFIG}\n';

lyze commented 4 years ago

We set up texmf with synthetic data in the emscripten FS here.

vadimkantorov commented 4 years ago

Yep, I saw it. My question is how xetex binary would discover the newly created /texmf.cnf in root

lyze commented 4 years ago

There are some heuristics that kpathsea takes, so usually TEXINPUTS is part of the search path. Does this answer your question? I'm not sure I fully understand what you would like to know.

https://tug.org/texinfohtml/kpathsea.html#Path-sources

vadimkantorov commented 4 years ago

I think it does! Thank you! I'll also let you know when I do a similar test!

vadimkantorov commented 4 years ago

> After removing common items:
>
> Your command uses: icu/icu-build/lib/libicudata.a poppler/libpoppler.a -nodefaultlibs -Wl,-Bstatic -lstdc++ -Wl,-Bdynamic -lm -lgcc_eh -lgcc -lc -lgcc_eh -lgcc
>
> My command uses: $TEXLIVE_BUILD_DIR/libs/pplib/libpplib.a
>
> It seems that newer TeX Live moved from poppler to pplib.
>
> Would you have insight about libicudata.a? Does linking a natively built stubdata/libicudata.a into a wasm binary work? What is it needed for? Is it some precomputed Unicode database for libicu?
>
> Would you have comments about explicitly linking (statically) to libstdc++ and dynamically (?) to libm, gcc variants, libc etc? For me it seems to build without these additions.

Do you have any thoughts about this? How can emscripten link to a native stub data static library?

vadimkantorov commented 4 years ago

When I add libicudata.a (produced by native icu build) to my final xetex linking line, I get wasm-ld: error: unknown file type: stubdata.ao

vadimkantorov commented 4 years ago

When I link with emscripten-produced version of libicudata.a, the build succeeds! (disabling assembly seemed to work)
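
The two results above fit what the linker accepts: wasm-ld links WebAssembly objects, while a natively built archive on Linux contains ELF objects. The following is a minimal sketch, not taken from either repository, that classifies an object file by its leading magic bytes. The function and enum names are made up for illustration, and it assumes the archive member has already been extracted (for example with ar x).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
enum object_kind { OBJECT_UNKNOWN, OBJECT_ELF, OBJECT_WASM };

/* Classify an object file by its first four bytes. */
enum object_kind classify_object(const unsigned char *bytes, size_t length)
{
    static const unsigned char elf_magic[4]  = { 0x7f, 'E', 'L', 'F' };
    static const unsigned char wasm_magic[4] = { 0x00, 'a', 's', 'm' };

    assert(bytes != NULL || length == 0);
    if (length < 4)
        return OBJECT_UNKNOWN;          /* too short to carry a magic number */
    if (memcmp(bytes, elf_magic, 4) == 0)
        return OBJECT_ELF;              /* native object, rejected by wasm-ld */
    if (memcmp(bytes, wasm_magic, 4) == 0)
        return OBJECT_WASM;             /* object produced by the Emscripten toolchain */
    return OBJECT_UNKNOWN;
}
```

A member such as stubdata.ao from the native build would be expected to come back as OBJECT_ELF, and the one from the Emscripten build as OBJECT_WASM.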

vadimkantorov commented 4 years ago

Running xetex in browser:

["tmp", "home", "dev", "proc", "texlive", "xelatex.fmt", "xelatex", "source.tex", "texmf.cnf"]
(index):102 /xelatex: option is ambiguous: 
(index):96 This is XeTeX, Version 3.14159265-2.6-0.999992 (TeX Live 2021/dev_EM) (preloaded format=xelatex)
(index):102 kpathsea: Running mktexfmt xelatex.fmt
(index):102 kpathsea: fork(): Resource temporarily unavailable
(index):96 I can't find the format file `xelatex.fmt'!

    FS.writeFile(xelatex, '', {encoding: 'utf-8'});

    FS.writeFile(sourcePath, '\\documentclass{article}\\begin{document}Hello, world!\\end{document}', {encoding: 'utf-8'});

    const texMfCnfContent = `TEXMFDIST = ${texliveRoot}/texmf-dist
      TEXMFLOCAL = ${texliveRoot}/texmf-local
      TEXMFCONFIG = ${texliveRoot}/texmf-config
      TEXMF = {!!$TEXMFDIST,!!$TEXMFLOCAL,!!$TEXMFCONFIG}`;

    FS.writeFile(texliveCnf, texMfCnfContent, {encoding: 'utf-8'});

    ls('/');

    Module.callMain(['-interaction=nonstopmode', '-no-pdf', '--', sourcePath]);

vadimkantorov commented 4 years ago

Not sure why it tries to search for xelatex.fmt when it is preloaded and texmf.cnf seems discovered ok

vadimkantorov commented 4 years ago

After specifying the format file explicitly, I started to get:

(index):90 /xelatex: option is ambiguous: 
This is XeTeX, Version 3.14159265-2.6-0.999992 (TeX Live 2021/dev_EM) (preloaded format=/xelatex.fmt)
(index):84 entering extended mode
(index):84 Ĉͻ^^@I^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^
(index):84 ^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^
(index):84 @^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@
....
@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@Iͼ
(index):84 (Press Enter to retry, or Control-D to exit;Ά')
(index):84 ͿΈ
(index):84 Ĉğ
(index):84 ε

So apparently there is some error, and no xdv is produced, but the output is scrambled :(

vadimkantorov commented 4 years ago

I was running it with Module.callMain(['--interaction=nonstopmode', '--no-pdf', '--fmt=' + xelatexFmtPath, '--output-directory=/home/web_user', '--', sourcePath])

When I remove --, the "option is ambiguous:" message disappears. This is quite strange, since the natively compiled binary has no problem with --. It is probably some divergence in Emscripten's C standard library.

vadimkantorov commented 4 years ago

I think somehow it cannot create/open log file (judging from the stack trace), but not sure why it can't open it.

vadimkantorov commented 4 years ago

This is very strange, because file reading clearly works. And writing files from JavaScript clearly works as well.

vadimkantorov commented 4 years ago

After adding TEXMFVAR = /home/web_user into /texmf.cnf, I still get the same error, but now I got xelatex.fmt42.fls created next to /home/web_user/source.tex. So it's really some paths issue

vadimkantorov commented 4 years ago

but it seems that logs should be created in the current directory (/home/web_user), so it's still strange

vadimkantorov commented 4 years ago

Maybe it's a conflict between /texmf.cnf and /texlive/texmf-dist/web2c/texmf.cnf and /texlive/texmf.cnf

vadimkantorov commented 4 years ago

I removed /texlive/texmf.cnf, but error still persists. Very mysterious.

vadimkantorov commented 4 years ago

I confirmed, openlogfile fails for some reason

vadimkantorov commented 4 years ago

With very tedious printf debugging, I found out that for some reason xetex does not set the log file name (in nameoffile + 1), so it tries to open the directory for writing and that obviously fails. The question now is where xetex normally sets the log filename

vadimkantorov commented 4 years ago

It almost looks like constant string buffer got corrupted somehow

vadimkantorov commented 4 years ago

Okay, I think I have found the reason, and it's really strange:

zpackfilename (L13891 in attached xetex0.c) assumes that passed in str_numbers are >= 65536. But this is not the case for str_numbers passed in from L14283 (".log" seems to be encoded as 902).

xetex0.zip

Here is excerpt from native xetex0.c:

void
openlogfile ( void )
{
  openlogfile_regmem
  unsigned char oldsetting  ;
  integer k  ;
  integer l  ;
  constcstring months  ;
  oldsetting = selector ;
  if ( jobname == 0 )
  jobname = getjobname ( 66180L ) ;
  packjobname ( 66181L ) ;
  recorderchangefilename ( stringcast ( nameoffile + 1 ) ) ;
  packjobname ( 66182L ) ;
  while ( ! aopenout ( logfile ) ) {

    selector = 17 ;
    promptfilename ( 66184L , 66182L ) ;
  }

And here's excerpt from wasm one:

void
openlogfile ( void )
{
  openlogfile_regmem
  unsigned char oldsetting  ;
  integer k  ;
  integer l  ;
  constcstring months  ;
  oldsetting = selector ;
  if ( jobname == 0 )
  jobname = getjobname ( 900 ) ;
  packjobname ( 901 ) ;
  recorderchangefilename ( stringcast ( nameoffile + 1 ) ) ;
  packjobname ( 902 ) ;

  while ( ! aopenout ( logfile ) ) {

    selector = 17 ;
    promptfilename ( 904 , 902 ) ;
  }

How are these offsets determined? Maybe I'm missing some compiler options?
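
The constants in the two excerpts differ by exactly 65280 in every call (66180 - 900, 66181 - 901, 66182 - 902, 66184 - 904), and 65280 = 65536 - 256. That pattern would fit a guess that the wasm xetex0.c was generated with string-pool entries numbered from 256 (one slot per 8-bit character) while the rest of the XeTeX code expects them to start at 65536. The sketch below only illustrates that arithmetic. The helper and constant names are hypothetical, not web2c identifiers.

```c
#include <assert.h>

/* Assumed bases: numbers below the base denote single characters,
   numbers at or above it denote entries in the string pool. */
enum { BASE_16BIT_CHARS = 65536, BASE_8BIT_CHARS = 256 };

/* Hypothetical helper: pool index for a string number, or -1 when the
   number falls in the single-character range for the given base. */
long pool_index(long str_number, long first_pool_string)
{
    assert(str_number >= 0);
    if (str_number < first_pool_string)
        return -1;
    return str_number - first_pool_string;
}
```

Under these assumptions, 66182 with base 65536 and 902 with base 256 both map to pool index 646, whereas 902 with base 65536 is treated as a single character. That would match the symptom that no ".log" extension ends up in nameoffile.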

vadimkantorov commented 4 years ago

After copying xetex0.c from native (though still would be good to figure it out), things moved to "article.cls not found" - again some cnf/texmfdist problem. I've got article.cls in ./texlive/texmf-dist/tex/latex/base/article.cls

vadimkantorov commented 4 years ago

After using texmf.cnf from prefix-native/share/texmf-dist/web2c/texmf.cnf, I've started to get this error in JavaScript: "Fontconfig error: Cannot load default config file"

vadimkantorov commented 4 years ago

How does fontconfig in your case discover fonts? Do you bundle the fonts somehow?

vadimkantorov commented 4 years ago

Do you configure it somehow to look into texmf-dist/fonts/?

vadimkantorov commented 4 years ago

FONTCONFIG_PATH and FONTCONFIG_FILE helped!
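
For readers who want to set the same two variables from a C wrapper rather than from JavaScript, here is a minimal sketch. FONTCONFIG_PATH (the configuration directory) and FONTCONFIG_FILE (the configuration file) are the variables named above. The function name and the directory passed in are hypothetical, and the thread does not say which values were actually used.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: point fontconfig at config_dir and at
   config_dir/fonts.conf. Returns 0 on success, -1 on failure. */
int configure_fontconfig(const char *config_dir)
{
    char config_file[512];
    int written;

    assert(config_dir != NULL);
    written = snprintf(config_file, sizeof config_file, "%s/fonts.conf", config_dir);
    if (written < 0 || (size_t)written >= sizeof config_file)
        return -1;                      /* path did not fit in the buffer */
    if (setenv("FONTCONFIG_PATH", config_dir, 1) != 0)
        return -1;
    return setenv("FONTCONFIG_FILE", config_file, 1);
}
```

Such a call would have to run before the first fontconfig initialization, for example at the top of main.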

lyze commented 4 years ago

When I did this some time ago, I recall setting TEXMFDIST "just"(tm) worked. I think in this case you might have run into issues with using the texmf.cnf in the web2c directory, but I'm not certain.

vadimkantorov commented 4 years ago

Everything mostly worked! I managed to run XeTeX in browser (for now in synchronous mode) and even to link xetex and dvipdfmx in one executable by renaming some symbols

lyze commented 4 years ago

nice!

vadimkantorov commented 4 years ago

image

vadimkantorov commented 4 years ago

After renaming a few symbols and creating some dummy files,

#include <string.h>

extern int busymain_xetex(int argc, char* argv[]);
extern int busymain_dvipdfmx(int argc, char* argv[]);

int main(int argc, char* argv[])
{
    /* argv[1] selects the applet; bail out if it is missing */
    if(argc < 2)
        return 1;
    if(strcmp("xetex", argv[1]) == 0 || strcmp("xelatex", argv[1]) == 0)
        return busymain_xetex(argc - 1, argv + 1);
    else if(strcmp("dvipdfmx", argv[1]) == 0)
        return busymain_dvipdfmx(argc - 1, argv + 1);
    /* unknown applet name */
    return 1;
}

calling this multiple times via invokeMain works well! https://github.com/vadimkantorov/busytex/

I'm thinking of making some latexmk surrogate in C to simplify JavaScript interaction. This static "all-in-one" scheme would be useful for native builds as well

vadimkantorov commented 4 years ago

> zpackfilename (L13891 in attached xetex0.c) assumes that passed in str_numbers are >= 65536. But this is not the case for str_numbers passed in from L14283 (".log" seems to be encoded as 902).

Would you have any ideas about this?

vadimkantorov commented 4 years ago

A problem with implementing latexmk in C code seems to be that xetex uses exit(0) or exit(1) to stop execution. Any ideas on how to patch out exit to allow for sequential calls:

busymain_xetex(argc - 1, argv + 1);
...
busymain_dvipdfmx(argc - 1, argv + 1);
busymain_xetex(argc - 1, argv + 1);
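
One possible approach to the question above, sketched under stated assumptions: redirect the tools' exit calls to a replacement that jumps back to the caller with setjmp/longjmp. The names busy_exit, run_tool, and the two demo tools are hypothetical. How exit gets redirected (for example a macro applied when compiling the tool sources) is left open, and this is not something the thread confirms was done.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf exit_jump;        /* where busy_exit jumps back to */
static int exit_jump_armed = 0;  /* nonzero while a tool is running */
static int exit_status = 0;      /* status passed to busy_exit */

/* Hypothetical stand-in for exit(): unwinds to run_tool instead of
   terminating the process. */
void busy_exit(int status)
{
    assert(exit_jump_armed);
    exit_status = status;
    longjmp(exit_jump, 1);
}

typedef int (*tool_main)(int argc, char *argv[]);

/* Run one tool entry point and return its status, whether it returned
   normally or called busy_exit. Nested calls are not supported. */
int run_tool(tool_main entry, int argc, char *argv[])
{
    int status;

    exit_jump_armed = 1;
    if (setjmp(exit_jump) == 0)
        status = entry(argc, argv);   /* normal return */
    else
        status = exit_status;         /* arrived here via longjmp */
    exit_jump_armed = 0;
    return status;
}

/* Demo stand-ins for busymain_xetex and busymain_dvipdfmx. */
int demo_tool_exits(int argc, char *argv[])
{
    (void)argc; (void)argv;
    busy_exit(3);
    return 0; /* not reached */
}

int demo_tool_returns(int argc, char *argv[])
{
    (void)argc; (void)argv;
    return 0;
}
```

Note the limits of this sketch: jumping out of a tool does not run atexit handlers, flush or close its open files, or reset its global variables, so a second call may start from stale state unless the tool re-initializes everything it uses.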

lyze commented 4 years ago

I don't know what the state of support for system calls is today with emscripten, but perhaps it's worth a shot trying a wrapper with fork and exec.

vadimkantorov commented 4 years ago

I managed to get bibtex8 working as well. Now it's a single executable combining xetex/bibtex8/dvipdfmx

vadimkantorov commented 4 years ago

image