crlf0710 opened this issue 5 years ago
One thing to note here is that (I believe) some of the C code being converted is actually generated via web2c, and rather than jumping from WEB -> C -> Rust, it'd eventually be nice to just go directly from WEB -> Rust.
As such, perhaps this can help guide us in deciding which sources should receive more attention than others. I don't know enough about the underlying engine yet to say which, but a web2rust is something I would fancy working on eventually (though not in the near term).
@ratmice Yup, I believe the WEB-converted code lives in xetex_xetex0.rs and xetex_ini.rs. pkgw has added nice prefixes to each filename so it's easy to tell these files apart.
Another thing that some people may care about is the mixed state of the code's licensing. I think the xetex, bibtex, and synctex parts of the code are MIT-compatible, but the dpx part (dvipdfmx, for PDF generation) and teckit (written in C++) are GPL-ed. If they could be replaced by MIT-compatible code, it would allow Tectonic to be embedded and used in more scenarios, including many commercial solutions.
EDIT: BibTeX is written in WEB too.
When it comes to Tectonic, yes, the bulk of the original code that we compile is C code that was generated through web2c. I have oriented my work around a very explicit decision that we are only going to build on the C code going forward. I will consult the WEB code to understand what's going on, and I'll look at XeTeX patches in WEB-space, but that's it. I've done a lot of work to tidy up the C code since, for me, it is the most important expression of the original engine code upon which Tectonic is based.
The basic motivation for this approach is that I suspect a web2rust would be super hard to pull off. AFAICT, much of the TeX code (in its native WEB) depends on aliasing tricks that would be unsafe Rust at best, and I wouldn't be surprised if some of the WEB source isn't even expressible as unsafe Rust. One of the ways that I try to evolve the C code of Tectonic is to reduce the dirty tricks that it pulls, hopefully leading to a more natural Rust-ification of the C code someday. I'm very surprised that c2rust can handle what we've currently got!
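To make the aliasing point concrete, here is a minimal sketch of the kind of variant-record reinterpretation in question, written as a Rust union; the field names and layout are illustrative assumptions on my part, not Tectonic's actual data structures.

```rust
// Illustrative only: a memory_word-style variant record, where the same storage
// is read back through different variants. In Rust this is a union, and every
// field read is `unsafe` -- which is why a mechanical web2rust would land deep
// in unsafe territory. The field names and layout here are made up.
#[repr(C)]
union MemoryWord {
    int_val: i64,     // "integer" view of the word
    glue_set: f64,    // "real" view of the same storage
    halves: [u16; 4], // halfword/quarterword view (e.g. link/info fields)
}

fn main() {
    let w = MemoryWord { int_val: -1 };
    // Reading a different variant than was written is exactly the sort of
    // aliasing trick in question; Rust only allows it inside `unsafe`.
    let halves = unsafe { w.halves };
    println!("halfword view: {:?}", halves);
}
```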
@pkgw Thanks for clarifying your position on this, and the word of caution. It is very helpful to know when my initial inclination is contrary to explicit decisions that have been made. Reflecting upon that, if rustweb or web2rust ever become things, they will certainly need to take into account everything learned from the manual conversion of the web2c output. As such, my comment on where to focus attention does seem misguided.
@ratmice Sorry, I didn't mean to come across as being that negative! The way that I envision Tectonic going right now, I don't think we'd use a web2rust tool for the core engine ... but, for instance, if there are other support tools that we'd like to integrate that are WEB-based, I'd much rather go straight to Rust than convert to C! And of course, Tectonic is not the whole universe here.
And also, to be clear, the idea of Rustifying the C code is super interesting! Again, I totally didn't think it would even be possible given how many tricks the implementation pulls. My day job is keeping me ultra-busy these days, but I hope to find a chance to see how c2rust makes everything work!
@pkgw I didn't really take it negatively at all; hopefully some context will shine a light on where I'm coming from.
a) My interest in TeX largely stems from it being a language implementation that does no dynamic allocation yet is portable to modern systems, and I want to investigate that property in modern systems languages. I imagine much of the platform-specific, non-generated C code was written without this style in mind, whereas in the generated sources this aspect should still be relatively intact, to the extent that one might imagine compiling the rustified generated code in a no_std environment.
b) Being overall new to the engines of both XeTeX and Tectonic itself, I find the staging/update process a bit overwhelming when it comes to merging the modified upstream with the modified Tectonic downstream code base.
The fear is that with less reliance on the WEB portion, we remove the limitations of its original environment, the dividing line becomes much less clear, and the tricks in the implementation gain context. Perhaps the appropriate way to preserve a) is to try compiling the rustified C code in its own no_std crate; if this is an aspect Tectonic wishes to preserve, I will give that a go.
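For point a), here is a minimal sketch of what a no_std-capable crate for the translated core could look like; the crate layout, the `std` feature name, and the example routine are my own assumptions, not anything that exists in the tree today.

```rust
// Hypothetical lib.rs for an engine-core crate that builds without std by
// default; the `std` feature gate and the example routine are illustrative.
#![cfg_attr(not(feature = "std"), no_std)]

/// TeX-style scaled (fixed-point) values are plain 32-bit integers, so the
/// arithmetic core needs neither an allocator nor OS services.
pub type Scaled = i32;

/// Illustrative leaf routine in that style: multiply a scaled value by an
/// integer ratio using a 64-bit intermediate. (Not TeX's exact xn_over_d
/// algorithm, just the flavor of code that should compile under no_std.)
pub fn scale_by_ratio(x: Scaled, n: i32, d: i32) -> Scaled {
    ((x as i64 * n as i64) / d as i64) as Scaled
}
```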
Here is the error I get when trying to compile this on macOS:
error: failed to run custom build command for `tectonic_engine v0.0.1-dev (/private/tmp/tectonic/engine)`
Caused by:
process didn't exit successfully: `/private/tmp/tectonic/target/debug/build/tectonic_engine-afd9796eb3d3d62f/build-script-build` (exit code: 101)
--- stdout
cargo:rerun-if-env-changed=TECTONIC_DEP_BACKEND
--- stderr
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Failure { command: "\"pkg-config\" \"--libs\" \"--cflags\" \"harfbuzz >= 1.4 harfbuzz-icu icu-uc freetype2 graphite2 libpng zlib\"", output: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "Package icu-uc was not found in the pkg-config search path.\nPerhaps you should add the directory containing `icu-uc.pc\'\nto the PKG_CONFIG_PATH environment variable\nPackage \'icu-uc\', required by \'harfbuzz-icu\', not found\n" } }', src/libcore/result.rs:999:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
EDIT: I installed ICU with
brew install icu4c
brew link icu4c
export PKG_CONFIG_PATH="/usr/local/opt/icu4c/lib/pkgconfig"
and now the error is
CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2,sse3,ssse3")
running: "cc" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "." "-I" "/usr/local/Cellar/icu4c/64.2/include" "-I" "/usr/local/Cellar/harfbuzz/2.6.1/include/harfbuzz" "-I" "/usr/local/Cellar/glib/2.60.6/include/glib-2.0" "-I" "/usr/local/Cellar/glib/2.60.6/lib/glib-2.0/include" "-I" "/usr/local/opt/gettext/include" "-I" "/usr/local/Cellar/pcre/8.43/include" "-I" "/usr/local/opt/freetype/include/freetype2" "-I" "/usr/local/Cellar/graphite2/1.3.13/include" "-I" "/usr/local/Cellar/libpng/1.6.37/include/libpng16" "-Wall" "-Wextra" "-Wall" "-Wcast-qual" "-Wdate-time" "-Wendif-labels" "-Wextra" "-Wextra-semi" "-Wformat=2" "-Winit-self" "-Wmissing-declarations" "-Wmissing-include-dirs" "-Wmissing-prototypes" "-Wmissing-variable-declarations" "-Wnested-externs" "-Wold-style-definition" "-Wpointer-arith" "-Wredundant-decls" "-Wstrict-prototypes" "-Wswitch-bool" "-Wundef" "-Wwrite-strings" "-Wno-unused-parameter" "-Wno-implicit-fallthrough" "-Wno-sign-compare" "-std=gnu11" "-DHAVE_ZLIB=1" "-DHAVE_ZLIB_COMPRESS2=1" "-DZLIB_CONST=1" "-DXETEX_MAC=1" "-o" "/private/tmp/tectonic/target/debug/build/tectonic_engine-f115cc9379b11c54/out/tectonic/xetex-macos.o" "-c" "tectonic/xetex-macos.c"
cargo:warning=clang: error: no such file or directory: 'tectonic/xetex-macos.c'
cargo:warning=clang: error: no input files
exit code: 1
--- stderr
error occurred: Command "cc" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "." "-I" "/usr/local/Cellar/icu4c/64.2/include" "-I" "/usr/local/Cellar/harfbuzz/2.6.1/include/harfbuzz" "-I" "/usr/local/Cellar/glib/2.60.6/include/glib-2.0" "-I" "/usr/local/Cellar/glib/2.60.6/lib/glib-2.0/include" "-I" "/usr/local/opt/gettext/include" "-I" "/usr/local/Cellar/pcre/8.43/include" "-I" "/usr/local/opt/freetype/include/freetype2" "-I" "/usr/local/Cellar/graphite2/1.3.13/include" "-I" "/usr/local/Cellar/libpng/1.6.37/include/libpng16" "-Wall" "-Wextra" "-Wall" "-Wcast-qual" "-Wdate-time" "-Wendif-labels" "-Wextra" "-Wextra-semi" "-Wformat=2" "-Winit-self" "-Wmissing-declarations" "-Wmissing-include-dirs" "-Wmissing-prototypes" "-Wmissing-variable-declarations" "-Wnested-externs" "-Wold-style-definition" "-Wpointer-arith" "-Wredundant-decls" "-Wstrict-prototypes" "-Wswitch-bool" "-Wundef" "-Wwrite-strings" "-Wno-unused-parameter" "-Wno-implicit-fallthrough" "-Wno-sign-compare" "-std=gnu11" "-DHAVE_ZLIB=1" "-DHAVE_ZLIB_COMPRESS2=1" "-DZLIB_CONST=1" "-DXETEX_MAC=1" "-o" "/private/tmp/tectonic/target/debug/build/tectonic_engine-f115cc9379b11c54/out/tectonic/xetex-macos.o" "-c" "tectonic/xetex-macos.c" with args "cc" did not execute successfully (status code exit code: 1).
Should I open an issue?
@crlf0710, is @lovasoa's problem potentially an oversight in the build scripts for your oxidize branch?
It looks like xetex-macos.c is included conditionally, so it was not translated to Rust by c2rust.
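For anyone following along, here is a rough sketch (not the actual Tectonic build script) of how such conditional inclusion typically looks in a build.rs; the file names and library name are placeholders.

```rust
// Hypothetical build.rs excerpt: compile a platform-specific C file only when
// targeting macOS, by checking the target that cargo passes to the script.
fn main() {
    let target_os = std::env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();

    let mut build = cc::Build::new();
    build.file("tectonic/xetex-ini.c"); // platform-independent sources (placeholder)

    if target_os == "macos" {
        // Only compiled on macOS, so a translation pass run on another
        // platform never sees this file at all.
        build.file("tectonic/xetex-macos.c");
    }

    build.compile("tectonic_c");
}
```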
@lovasoa I updated the branch and added the file. I believe there'll still be linking issues. For oxidization issues, I think you can open issues and/or PRs under https://github.com/crlf0710/tectonic/issues.
I've invited @pkgw as a collaborator in the fork repo; I think when the oxidized code base is in a good enough state (which might take a while), we can upstream the oxidized version back here.
This sounds interesting; how can people best contribute?
I've opened a few issues under https://github.com/crlf0710/tectonic/issues. Leave a comment there if you want to try one of those~
Status update: since then we've successfully converted all the remaining XeTeX C++/Objective-C code into Rust by hand, and the oxidized version of Tectonic now runs successfully on Windows/Linux/macOS. Unfortunately, there are some nightly feature usages introduced by c2rust itself; we're still working on removing them so that the crate can build on stable again.
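As an illustration of what that cleanup can look like (which specific features are the blockers here is my assumption), c2rust output of this era often leans on constructs like label_break_value; a mechanical stable rewrite is a loop that runs exactly once:

```rust
// Illustrative only: a labeled-block break as c2rust output tends to emit it,
// and a stable-Rust rewrite. This is not code from the oxidize branch.

fn do_work() {}

// Nightly-era form (required #![feature(label_break_value)] at the time):
//
//     'abort: {
//         if had_error { break 'abort; }
//         do_work();
//     }

// Stable rewrite: a one-shot loop with the same control flow.
fn translated(had_error: bool) {
    'abort: loop {
        if had_error {
            break 'abort;
        }
        do_work();
        break 'abort;
    }
}
```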
We're also working on using existing crates to replace the manual dependency specification. Any help is appreciated here!
By the way, I'm also spending a little time investigating the original WEB tooling that TeX itself uses. Hopefully I can find something useful soon.
@crlf0710 Thanks for the update! I'm afraid that I still haven't looked over to see what the translated code looks like, but I'm very impressed that it's all working!
It would be great to add a notice about this work to the master repo README and/or the site, along with an invitation to participate.
@burrbull Yes, I am happy to do so, but have been overwhelmed with my day job lately and haven't had the bandwidth to take the initiative on matters like this. Pull requests are more than welcome, as they say.
I have made progress on oxidizing: https://github.com/crlf0710/tectonic/pull/273. But I need help with regression testing of the changes. @Mrmaxmeier are you still interested in the project? Your site https://tt.ente.ninja/ is shut down.
Thanks for the update @burrbull. I have not been following the effort due to personal time constraints, but I would love to make sure that this repo and the oxidation effort stay well-aligned and that we have a vision for how to merge the work back into the mainline.
BTW, I am working towards updating Tectonic for TeXLive 2020.0 and it looks like I will need to update the C/C++ code for the first time in several years. There's always more to do ...
> I have made progress on oxidizing: crlf0710#273. But I need help with regression testing of the changes. @Mrmaxmeier are you still interested in the project? Your site https://tt.ente.ninja/ is shut down.
Sorry about that. The setup is a bit awkward and needs a bunch of storage and compute. It previously lived on my NAS, but I'll resurrect it somewhere else. I might reduce the number of samples, though, as each run currently takes ~10h of CPU time.
> [..] but I would love to make sure that this repo and the oxidation effort stay well-aligned and that we have a vision for how to merge the work back into the mainline.
This concerns me as well. I guess the regression tests are nice for ensuring correct refactorings, but it doesn't seem like there's a straightforward path to merging things back into Tectonic.
I can help with the server for that for a while. Send me an email.
> I can help with the server for that for a while. Send me an email.
Not sure what you mean. I am thinking of something like hosting a self-hosted GitHub CI worker and probably a syncing script for that arXiv S3 bucket.
Need help in porting #666 to oxidize.
Talking about donations, it makes sense to move this branch under the https://github.com/tectonic-typesetting/ organization (or create a new one?) and add this option to https://tectonic-typesetting.github.io/en-US/contribute.html. cc @pkgw
https://github.com/crlf0710/tectonic/issues/257#issuecomment-734280060
Now that the Rust Foundation has been announced, maybe it makes sense to cooperate with them for donations, etc.? For example, registering GitHub Sponsors with them.
Out of interest, is there a plan for when/how the oxidised fork gets merged back into master? Separately from this fork, I've been having a go at reimplementing parts of the XeTeX engine in Rust using cbindgen, referencing documentation generated from the XeTeX WEB source rather than just the generated C code.
@ralismark Sorry for the extremely late reply here. The honest answer is that there isn't a plan per se — I'd like to see it happen but my personal priority is to get the infrastructure for real HTML output in place (and I'm so busy with real life these days that even that project is going nowhere fast). My hope is that the split into crates will make it easier to incrementally migrate oxidized C code back into the codebase, but unfortunately it's true that the split introduced a bunch of changes that I'm sure are difficult to mirror into the fork.
Just to throw it out there, though, the bibtex engine is a single C file, so it would probably be by far the easiest place to start an incremental oxidization effort. It would also be great to migrate the xetex_layout crate, which is OS-dependent font code that deserves a more modern, flexible Rust incarnation. And that code is actual human-written C++, not hard-to-understand web2c output.
I really like the work from oxidize, but I have started work on a slightly different approach: in #1032, I've begun a bottom-up hand conversion. The idea is to start with the code that depends on the least other code (or with statics that are common but relatively easy to replace with an API) and convert the things above it as their requirements are converted. The result is both amenable to human review commit-by-commit (since every individual refactor results in a functional midpoint) and allows code to be converted into safe Rust as soon as possible, since each function's dependents are the next code to be converted.
I'm not sure how far I'll get with this strategy, but I'm hoping if I can convert bibtex it will act as a proof-of-concept for the quality of the resulting code.
It's worth noting that my conversion makes no effort to preserve the properties of the original code - if anything it sprints in the opposite direction, attempting to use common Rust practices and the std as much as possible.
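To illustrate the shape of that bottom-up pattern, here is a minimal sketch with entirely made-up names: a leaf utility that used to be C with global state becomes a safe Rust type, and a thin extern "C" shim keeps not-yet-converted C callers compiling until they are ported in turn.

```rust
// Hypothetical example of the bottom-up strategy; none of these names come
// from the real bibtex/xetex sources.

/// Safe replacement for what was previously a set of global arrays.
pub struct StringPool {
    data: Vec<u8>,
    starts: Vec<usize>,
}

impl StringPool {
    pub fn new() -> Self {
        StringPool { data: Vec::new(), starts: vec![0] }
    }

    /// Intern a byte string and return its index.
    pub fn intern(&mut self, s: &[u8]) -> usize {
        self.data.extend_from_slice(s);
        self.starts.push(self.data.len());
        self.starts.len() - 2
    }
}

/// Temporary shim so C code that has not been converted yet can keep calling
/// through a plain function; it disappears once the last C caller is ported.
#[no_mangle]
pub unsafe extern "C" fn string_pool_intern(
    pool: *mut StringPool,
    ptr: *const u8,
    len: usize,
) -> usize {
    let bytes = std::slice::from_raw_parts(ptr, len);
    (*pool).intern(bytes)
}
```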
I'm glad to hear someone else started similar work.
@CraftSpider I saw it was merged. Awesome work! Good to see the effort continues.
So, I have a branch that converts all the C code (C++ and Objective-C code not included, yet) to Rust using the c2rust tool: https://github.com/crlf0710/tectonic/tree/oxidize. However, the generated code is not fully portable, and all preprocessor macros are lost.
Current progress is here:
- The HAVE_ZLIB section of code (mostly in dpx-pdfobj.c) is not there yet.
- vsprintf, __ctype_b_loc, etc. are not immediately usable in Rust on Windows.
I'll continue work on this, though it might be slow. If anyone finds the branch useful or wants to give a hand, just go ahead.
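On the lost preprocessor macros: one plausible way to recover something like the HAVE_ZLIB conditionals on the Rust side is a Cargo feature plus cfg attributes. This is only a sketch; the `zlib` feature name and the use of the flate2 crate are assumptions, not what the oxidize branch actually does.

```rust
// Illustrative mapping of an `#ifdef HAVE_ZLIB` block onto a Cargo feature.

#[cfg(feature = "zlib")]
fn deflate_stream(buf: &[u8]) -> Vec<u8> {
    // Only compiled when the hypothetical `zlib` feature is enabled;
    // `flate2` stands in here for whatever zlib binding gets chosen.
    use flate2::{write::ZlibEncoder, Compression};
    use std::io::Write;

    let mut enc = ZlibEncoder::new(Vec::new(), Compression::default());
    enc.write_all(buf).expect("writing to an in-memory encoder");
    enc.finish().expect("finishing zlib stream")
}

#[cfg(not(feature = "zlib"))]
fn deflate_stream(buf: &[u8]) -> Vec<u8> {
    // Fallback path: emit the stream uncompressed.
    buf.to_vec()
}
```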