petersalomonsen / wasm-git

GIT for nodejs and the browser using https://libgit2.org compiled to WebAssembly with https://emscripten.org

Advice for a Rust/Wasm Git project #17

Closed happybeing closed 3 years ago

happybeing commented 3 years ago

I'm building a GitHub like static web app, a p2p Git Portal which will run from static hosting or peer-to-peer storage (so no server side code) and hoping you can give me some advice on whether I can do what I need with wasm-git.

I have a proof-of-concept in Go/Wasm but need to replace Go with a better wasm source and long term will aim for Rust/Wasm. I'll need git features (so wasm-git) and plan to add support for issues, PRs using git objects (the approach used by the Go library git-bug as in my demo).

I'm a novice in all these areas, so I'm unsure how to do this and am wondering if this is a good approach and how to go about:

It all looks possible but I'm not sure how hard it will be to get there, and if I have the time and skill to attempt it. I could carry on with the Go/Wasm, but Go isn't good for security or reliability, is too slow and has a 12MB runtime. You can see the first demo of the portal which has basic git functionality plus create/view issues (no UI for creation yet) here: http://gitch.happybeing.com It's not a million miles away from the features of your own demo, but with added issues!

Any tips or warnings etc. would be very much appreciated. I've had a talk with the author of gitoxide, a native Rust implementation of git primitives, but there's not much functionality there yet, so I'm looking for an alternative way of obtaining the core git functionality so I can focus on the additional features.

petersalomonsen commented 3 years ago

Your project is looking very promising. Great to see a p2p git portal.

libgit2, which is what wasm-git is based on, has pretty much all the git core functionality there is. Not all of it is exposed in wasm-git, but with a little knowledge of C and emscripten it is not much work to expose more features if needed.

Regarding using wasm-git from Rust rather than JS: It is possible to instantiate a rust wasm binary with imports referencing wasm-git exports.
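
As a rough illustration of that idea, the sketch below builds an imports object from a set of exported functions and passes it to `WebAssembly.instantiate`. The helper names, the `env` namespace and the notion of a separate Rust-built binary are assumptions made for the example; the names a real Rust module would import depend on how it is compiled.

```javascript
// Sketch only: `gitExports` stands in for whatever functions the wasm-git
// side exposes; the `env` namespace is an assumption, not a wasm-git convention.

// Collect only the callable members so they can be passed as wasm imports.
function buildImportObject(gitExports, namespace = 'env') {
  const functions = {};
  for (const [name, value] of Object.entries(gitExports)) {
    if (typeof value === 'function') {
      functions[name] = value;
    }
  }
  return { [namespace]: functions };
}

// Instantiate a (hypothetical) Rust-built module against those imports.
async function instantiateRustModule(wasmBytes, gitExports) {
  const imports = buildImportObject(gitExports);
  const { instance } = await WebAssembly.instantiate(wasmBytes, imports);
  return instance.exports;
}
```

Memory sharing and string passing between the two modules are not covered here and would need extra glue code.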

Regarding replacing the file system: libgit2 uses the file system, and syscalls like mmap, which are implemented in Javascript. Also in wasm-git I've implemented http transports in Javascript. Replacing these with your own implementations would for sure require a larger effort. For wasm-git I spent quite some time fixing issues in emscripten related to mmap.

Using Rust for implementing the business logic should be fine. Javascript would probably be quicker and less work, but Rust could most probably provide cleaner code and a smaller bundle size.

Looking forward to seeing the further progress of your git portal!

happybeing commented 3 years ago

Thanks Peter, I appreciate your time and helpful insights. There doesn't seem to be much around on Emscripten + Rust and what I've found is discouraging, so I'm not sure if they work well together. I will dig further and maybe build a small test to see what's involved.

It's good to know that exposing any missing git features should be fairly easy. I was hoping the fs would be too, so that's something I'll need to investigate. My fallback is to use JS for git + fs initially, and Rust just for the new parts, though I've not looked in detail at that yet.

happybeing commented 3 years ago

Hi Peter, I'm doing some experiments to understand how I could use wasm-git in a Svelte app and trying to import a function (just a random thing I found exported from the node_modules/wasm-git/lg2.wasm). So in my App.svelte I have:

import clock_gettime from "wasm-git/lg2.wasm"

test();

async function test() {
    console.log("clock_gettime()");
    console.dir(await clock_gettime());
}

My aim is to build a static web app (no server side code) in Svelte with wasm-git and Rust/Wasm and in the browser console I'm seeing this:

hello App.svelte:2:8
clock_gettime() App.svelte:10:9
XHR GET http://localhost:5000/e93399eed71517e6.wasm

I'd like everything on my own server not via a third party server as seems to be the assumption looking at the examples which reference wasm git on unpkg. Am I correct here, and is there a way (or any examples) showing how I can have all the code locally?

Thanks.

petersalomonsen commented 3 years ago

You can have everything locally. You just need the files lg2.js and lg2.wasm that should already be present in node_modules/wasm-git.

However I see you import lg2.wasm and not lg2.js which I do ( like in this example: https://github.com/petersalomonsen/wasm-git/blob/master/examples/example_webworker.js )

By importing lg2.js, lg2.wasm will be automatically loaded and instantiated, and you can call the git functions like in the example file.

Also you should always use wasm-git in a webworker and not on the main thread ( as it would affect the UI performance ). Another thing is that the main thread does not support synchronous xhr, and so interaction with a git http server has to be done from a worker.
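
A minimal sketch of a worker script that loads both files from the app's own origin instead of unpkg, modelled on the linked example_webworker.js. The `/wasm-git/` path and the clone URL are placeholders, and `createModuleConfig` is a helper invented for this example; `locateFile` is the Emscripten hook that decides where lg2.wasm is fetched from.

```javascript
// Assumption: lg2.js and lg2.wasm were copied from node_modules/wasm-git
// to /wasm-git/ on the same server that hosts the app.
function createModuleConfig(baseUrl) {
  return {
    // Emscripten calls locateFile to resolve lg2.wasm and similar files.
    locateFile: (fileName) => baseUrl + fileName,
  };
}

// Only runs inside a worker, where importScripts is defined.
if (typeof importScripts === 'function') {
  var Module = createModuleConfig('/wasm-git/');
  importScripts('/wasm-git/lg2.js');

  Module.onRuntimeInitialized = () => {
    // Work in an in-memory directory, as the upstream example does.
    FS.mkdir('/working');
    FS.mount(MEMFS, {}, '/working');
    FS.chdir('/working');
    // Placeholder repository URL.
    Module.callMain(['clone', 'http://localhost:5000/test.git', 'testrepo']);
  };
}
```

The main thread would start this with `new Worker('worker.js')`, where the file name is again just an example.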

happybeing commented 3 years ago

Thanks Peter, I'm poking around in the dark atm so your pointers are very helpful. I began with lg2.js but avoided using a webworker because the environment I'm targeting (a browser built using Electron) doesn't support them, although I hope this will be added. Interaction with a git http server is not needed in the target environment (even though I want it for testing) so this may be less of an issue for me.

So having had trouble with importing lg2.js directly I switched to see if I could do better with lg2.wasm, loading it with the @rollup/plugin-wasm plugin. Do you believe that is NOT sufficient, for example to "instantiate" lg2.wasm as you mention? I'm still wondering what the XHR request is trying to load there - seems an odd URI.

I'll try using a webworker for now and once I have that working can explore from there.

petersalomonsen commented 3 years ago

No, it is not sufficient to just do lg2wasmbytes = await fetch('lg2.wasm').then(r => r.arrayBuffer()); WebAssembly.instantiate(lg2wasmbytes, {}) which would work for a webassembly module without external dependencies.

lg2.wasm expects to get various properties in the imports parameter object such as syscalls and file system support. All of this is provided by lg2.js.
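
To make the point about imports concrete, here is a small self-contained sketch. It uses a tiny hand-encoded module with a single import (`env.f`) as a stand-in for lg2.wasm, lists what the module requires with `WebAssembly.Module.imports`, and shows that instantiation fails when the imports object is empty. The same inspection call could be pointed at the compiled lg2.wasm to see everything lg2.js has to provide.

```javascript
// A tiny hand-encoded module that imports one function, env.f, standing in
// for lg2.wasm (which imports far more: syscalls, file system, and so on).
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section: () -> ()
  0x02, 0x09, 0x01, 0x03, 0x65, 0x6e, 0x76,       // import section, module "env"
  0x01, 0x66, 0x00, 0x00,                         // field "f", function, type 0
]);

const wasmModule = new WebAssembly.Module(bytes);

// List what the module expects to find in the imports object.
const requiredImports = WebAssembly.Module.imports(wasmModule);
console.log(requiredImports);

// Instantiating with an empty imports object fails...
let failedWithoutImports = false;
try {
  new WebAssembly.Instance(wasmModule, {});
} catch (err) {
  failedWithoutImports = true;
}

// ...while supplying the expected function succeeds.
const instance = new WebAssembly.Instance(wasmModule, { env: { f() {} } });
```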

You can however simply use the script tag like this:

<script src="lg2.js"></script>

and then from Javascript you can run Module.callMain(['status']) to call a git status command.

However this approach will run on the main thread ( not using a worker ), and is not really recommended, but will probably work better with Electron ( I haven't tested that though, but it works fine in the browser ).

happybeing commented 3 years ago

However this approach will run on the main thread ( not using a worker )

I figured out how to load lg2.js in index.html and call it as you described, and can call Emscripten functions (e.g. to set up a MEMFS filesystem) as in your clone example. But the clone() itself was failing because the http transports were not defined for ENVIRONMENT_IS_WEB.

So I enabled the webworker http transports for ENVIRONMENT_IS_WORKER || ENVIRONMENT_IS_WEB. This fails because you can't change the responseType on a synchronous request. Presumably that is allowed in a webworker?

So I tried converting the string response to a Uint8Array in emscriptenhttpread which gets a bit further but is causing Chrome dev tools to crash so I'm probably going to back out and try a webworker for now (as I said earlier :eyes:).
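
For reference, one common way to do that string-to-bytes conversion is sketched below. It is not necessarily what was tried here: it assumes the response is requested with the `x-user-defined` charset so that each character carries one byte, and both function names are made up for the example.

```javascript
// Convert a "binary string" (one char per byte) into a Uint8Array.
// Masking with 0xff drops the high bits that the x-user-defined charset
// adds to bytes above 0x7f.
function binaryStringToBytes(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    bytes[i] = text.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Browser-only helper: a synchronous GET whose body is kept byte-for-byte.
function fetchBytesSync(url) {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', url, false); // false = synchronous
  xhr.overrideMimeType('text/plain; charset=x-user-defined');
  xhr.send();
  return binaryStringToBytes(xhr.responseText);
}
```

Synchronous requests on the main thread block the UI and are deprecated in browsers, so this only works around the `responseType` restriction.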

Is there something I'm missing about the http transports for use in a browser? They aren't present and aren't set in wasm-git post.js so something seems amiss.

petersalomonsen commented 3 years ago

Yes, synchronous http requests are only allowed in webworkers, so the http transports will not work on the main thread as it is now.

One way to make it work on the main thread would be to compile wasm-git with the Asyncify feature in emscripten, which would make it possible to use asynchronous xhr / fetch.

For performance reasons I would rather recommend finding a way to run it as a worker instead of having everything in the main thread.

happybeing commented 3 years ago

I have your example_webworker.js working in my test application, cloning and operating on a MEMFS filesystem and am wondering if there's a way to share the filesystem with the app. If not, everything has to be done inside the webworker, making it tricky to implement operations on the .git directory from outside the webworker (using Rust/wasm or anything else).

Reading up a little on workers it seems impractical to share the filesystem using postMessage() of structured data.

Looking for use of webworkers with filesystem APIs I can only find very old articles which seem to reference non-standard or discontinued APIs such as "FileSystemSync", so am I right in thinking there's no way to operate on a shared in-browser filesystem with a webworker (N.B. I'm working purely client side - no server fs)?

If so I'll probably explore your Asyncify suggestion for this application because performance should not be an issue for my use case. (Most of my need for wasm-git is to operate on an in-browser filesystem. Operations on remotes are mostly just handy for testing.)

BTW the speed of wasm-git clone is impressive, noticeably faster than using go-git compiled to wasm.

maks commented 3 years ago

@happybeing you might want to have a look at using IndexedDB as a backend for sharing a FS between main thread and webworkers: https://raananw.github.io/WebWorkers-IndexedDB/ and I think there are already several projects that can expose a filesystem API that's backed by IndexedDB.

petersalomonsen commented 3 years ago

@happybeing @maks wasm-git already has support for indexeddb, as emscripten comes with the IDBFS (indexed db file system) out of the box.

In this project here ( the wasm-git demo ) you can see indexeddb being used: https://github.com/petersalomonsen/githttpserver/blob/master/public/libgit2_webworker.js
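
A minimal sketch of the IDBFS pattern, assuming the `FS` and `IDBFS` objects provided by the Emscripten runtime in lg2.js. The two helper functions and the directory name in the usage note are made up for this example and are not taken from the linked file.

```javascript
// Mount a directory on IDBFS and load any previously persisted content.
// FS and IDBFS are the objects provided by the Emscripten runtime (lg2.js).
function mountPersistent(FS, IDBFS, dir) {
  FS.mkdir(dir);
  FS.mount(IDBFS, {}, dir);
  return new Promise((resolve, reject) => {
    // populate = true: copy IndexedDB contents into the in-memory copy.
    FS.syncfs(true, (err) => (err ? reject(err) : resolve()));
  });
}

// Write the in-memory state back to IndexedDB, e.g. after a commit.
function persist(FS) {
  return new Promise((resolve, reject) => {
    FS.syncfs(false, (err) => (err ? reject(err) : resolve()));
  });
}
```

Inside the worker this might be used as `await mountPersistent(FS, IDBFS, '/working')` before running git commands, and `await persist(FS)` after each operation that should survive a reload.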

Still I wouldn't recommend altering the indexeddb file system from outside the webworker, as emscripten keeps an in memory copy that it operates on. Rather have a look at the wasm-git demo sources in the link above and see how you can create an interface to the webworker messageport that you can share across the application. This is also what I've done in the wasm-music project.

For example I have this client "library" that can be used on the main-thread, and as you can see it posts messages to the worker, and waits for confirmation back that files have been written, staged to git etc: https://github.com/petersalomonsen/javascriptmusic/blob/master/wasmaudioworklet/wasmgit/wasmgitclient.js#L143
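
The general shape of such a client can be sketched as follows. This is not the code from the linked file: the message format (`{ id, command, args }` going out, `{ id, result, error }` coming back) and the `createGitClient` name are invented for the illustration.

```javascript
// Main-thread client: each request gets an id, and the returned promise
// settles when the worker posts back a message carrying the same id.
function createGitClient(worker) {
  let nextId = 0;
  const pending = new Map();

  worker.onmessage = (event) => {
    const { id, result, error } = event.data;
    const entry = pending.get(id);
    if (!entry) return; // not a reply to one of our requests
    pending.delete(id);
    if (error) entry.reject(new Error(error));
    else entry.resolve(result);
  };

  return {
    call(command, ...args) {
      const id = nextId++;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject });
        worker.postMessage({ id, command, args });
      });
    },
  };
}
```

The worker side would run the git command and the file system changes itself and then reply with the same `id`, so the rest of the app never touches the worker's file system directly.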

happybeing commented 3 years ago

Thanks for the suggestions about IndexedDB. That's useful to know and I will look into it once I've finished the Asyncify work.

@petersalomonsen I now have clone() working using async transports via Asyncify as you suggested, thanks :smile:. There's one issue, which is that callMain() (below) does not wait for the clone() to complete. It returns immediately. I don't understand how callMain() can return before the clone() function has returned given that it works as expected for a webworker.

lg.callMain(['clone','https://cors.isomorphic-git.org/github.com/torch2424/made-with-webassembly.git','clonedtest']);

What I did was provide a set of async emscriptenhttp functions for ENVIRONMENT_IS_WEB which use async requests. I then provided Asyncified functions in wasm-git/libgit2/src/transports/emscriptenhttp.c to wrap the call to each async transport. This all appears to be working correctly and the clone() succeeds.

Do you have any suggestions here? I tried await lg.callMain() which made no difference, and am not sure where to start with this.

happybeing commented 3 years ago

I've tidied up and provided options in build.sh to build async versions on this branch: https://github.com/happybeing/wasm-git/tree/asyncify-emscriptenhttp-transports. Let me know if you want a PR and if you have any ideas for fixing the issue with callMain() (above) I'll look into it. Thanks again for your help @petersalomonsen.

petersalomonsen commented 3 years ago

Great work on asyncify @happybeing ! I haven't tried asyncify myself so I'm not sure about how to wait for the clone, but I would guess it is about making callMain async. Will have to check more with the emscripten asyncify docs on this ( or maybe even @kripken can give us a quick hint ).

A PR would for sure be nice. I would then be very happy if you could also add a test, maybe just create a similar test case for the asyncify build as for the worker which you can find here: https://github.com/petersalomonsen/wasm-git/blob/master/test-browser/test.spec.js

Also for a clean commit history it would be good if you could squash to 1 commit.

Async support is for sure a valuable feature for wasm-git 👍

petersalomonsen commented 3 years ago

btw @happybeing . what if you use ASYNCIFY_IMPORTS on callMain itself? Does that make it an async method that you can wait for?

happybeing commented 3 years ago

ASYNCIFY_IMPORTS is intended for the Asyncified functions in transports/emscriptenhttp.c, but I tried it on callMain() anyway (and some other functions _main, and main just cos it was easy), but it didn't help.

Looking at callMain() it is a JS function which makes a simple call to the asm implementation of main(). I don't see any issues with that as the asyncification is on things called within clone() and not on clone() itself. This suggests I've not asyncified properly or there's a bug in Emscripten.

Assuming it's me, I know that the functions I asyncified are working synchronously within the functions which call them, but maybe I've missed something in the implementation. For example, I left emscriptenhttpwrite() as is because it looks entirely synchronous but maybe you could take a look and make sure that is correct. In fact, please can you take a look at the transports in https://github.com/happybeing/wasm-git/blob/asyncify-emscriptenhttp-transports/emscriptenbuild/post-async.js#L28 and see if you can spot any holes in my Promise implementation? I'm not sure what I could be doing which causes these symptoms.

It seems odd that the calls to new asyncified functions in emscriptenhttp-async.c work synchronously, but that something higher up the call tree which I've not touched (except perhaps with -s ASYNCIFY) is no longer synchronous. It would be interesting to know what -s ASYNCIFY does because it doubles the size of the lg2.wasm which I don't understand.

I'll look a bit more at this but want to dig into adding a custom filesystem as the ability to do that is crucial for my use of wasm-git.

petersalomonsen commented 3 years ago

@happybeing I tested your fork, and found that I could push a callback method to the Asyncify.asyncFinalizers array in order to wait for the clone:

            lg.callMain(['clone', 'http://localhost:5000/test', 'testrepo']);
            Asyncify.asyncFinalizers.push(() => {
                console.log('Cloned.....');
                FS.chdir('testrepo');
            });

This seems to be what happens when using ccall to interact with the wasm module, but since we are using a direct call here we have to push to the asyncFinalizers array ourselves.

Again, I'm not sure if this is the correct way to do this, I was just studying what ccall does, and found that it worked for clone. Give it a try yourself and see if it works for you.

happybeing commented 3 years ago

Yes that works, good find. I'm not sure how to make use of this yet, just letting you know it does the trick.

happybeing commented 3 years ago

This works:

await lg.callMain(['clone','https://cors.isomorphic-git.org/github.com/torch2424/made-with-webassembly.git','clonedtest']);
await new Promise((resolve) => { Asyncify.asyncFinalizers.push(() => { resolve();}); });
console.log("clone complete");

And if we define a wrapper:

async function callAsync(args) {
    await lg.callMain(args);
    await new Promise((resolve) => { Asyncify.asyncFinalizers.push(() => { resolve();}); });
}

Then we can do:

console.log("cloning")
await callAsync(['clone','https://cors.isomorphic-git.org/github.com/torch2424/made-with-webassembly.git','clonedtest']);
console.log('Cloned..... no really');
FS.chdir('clonedtest');
lg.callMain(['log']);  // Not async so MUST use callMain()

@petersalomonsen What do you think - good enough?

Or we can add it to lg:

lg.callAsync = async (args) => {
    await lg.callMain(args);
    await new Promise((resolve) => { Asyncify.asyncFinalizers.push(() => { resolve();}); });
};

I wonder if there's a way to test if the function is async so we can just replace callMain() with a function wrapping the original.

Turns out this works. Seems best option for now:

lg.oldCallMain = lg.callMain
lg.callMain = async (args) => {
    await lg.oldCallMain(args);
    var runningAsync = typeof Asyncify === 'object' && Asyncify.currData;
    if (runningAsync) {
        await new Promise((resolve) => { Asyncify.asyncFinalizers.push(() => { resolve();}); });
    }
};

console.log("cloning")
await lg.callMain(['clone','https://cors.isomorphic-git.org/github.com/torch2424/made-with-webassembly.git','clonedtest']);
console.log('Cloned..... no really');
FS.chdir('clonedtest');
console.log("log - before")
await lg.callMain(['log']);
console.log("log - after")

petersalomonsen commented 3 years ago

ok with a wrapper, but I think if you make callMain async then it will break the existing worker use cases (because you will require them to use await). Also most methods are not async, e.g. log does not do any http, and so it doesn't make sense to call it as async. If you want async autodetection you should probably return a promise in case it is async, instead of making the whole method async.
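
A sketch of that "promise only when async" idea is shown below. It relies on the same Emscripten internals used earlier in this thread (`Asyncify.currData` and `Asyncify.asyncFinalizers`), which may differ between Emscripten versions, and `wrapCallMain` is a name made up for the example.

```javascript
// Wrap callMain so it returns a Promise only when Asyncify has suspended
// execution, and the plain return value otherwise.
function wrapCallMain(lg, asyncify) {
  const originalCallMain = lg.callMain;
  lg.callMain = (args) => {
    const result = originalCallMain(args);
    if (asyncify && asyncify.currData) {
      // An async operation is still pending: resolve once it has finished.
      return new Promise((resolve) => {
        asyncify.asyncFinalizers.push(() => resolve());
      });
    }
    // Nothing was suspended, so the call completed synchronously.
    return result;
  };
  return lg;
}
```

Callers can `await lg.callMain(...)` in both cases, since awaiting a non-promise value is harmless, while existing synchronous callers keep working for commands that never suspend.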

happybeing commented 3 years ago

I've pushed a version to my fork which replaces callMain() with a wrapper around the old function, so there's no special code required other than to await lg.callMain(...).

None of my changes will affect webworkers because the async features are in a separate build. There may be a way to support both in the same build but -s ASYNCIFY doubles the wasm size and has a performance cost, so I chose to have separate builds.

petersalomonsen commented 3 years ago

ok. yeah you're right, I forgot about that. For sure we don't want that double wasm size for the worker version, so separate builds are good.

happybeing commented 3 years ago

@petersalomonsen I've done my changes for now and been looking at setting up a test in wasm-git/test-browser-async using wasm-git/test-browser as a template. I am not sure how to get the window.lg2 object setup for the test.spec.js given I'm loading lg2.js in my index.html.

My index.html (which seems to be ignored although I have added it to karma.conf.js) contains the following:

<script src='lg2.js'></script>
<script>
window.lg2Ready = false;
window.lg2 = Module;
window.lg2ReadyPromise = new Promise((resolve) => {
    Module.onRuntimeInitialized = () => {
        window.lg2Ready = true;
        resolve(true);
    };
});
</script>

Then in my app I would do something like:

await window.lg2ReadyPromise;

const lg = window.lg2;
const FS = lg.FS;

const APPFS = FS.filesystems.MEMFS;
console.log("APPFS");
FS.mkdir('/working');

I don't want to spend a lot of time figuring out how to set up karma.conf.js, but if you can help me get the test.spec.js working so it has access to the lg2 object I'll write some tests.

petersalomonsen commented 3 years ago

Instead of trying to load your index.html I would try to load the script from javascript inside your asyncify.spec.js ( probably create a separate .spec.js file for the asyncify scenario? )

You can load a script from JS like this:

const scriptElement = document.createElement('script');
scriptElement.src = 'lg2-async.js';
scriptElement.onload = () => { 
  // start testing
};
document.documentElement.appendChild(scriptElement);

petersalomonsen commented 3 years ago

Thanks for your contribution @happybeing. Your PR is merged now. Looking forward to seeing what you will create with wasm-git.

happybeing commented 3 years ago

Great collaborating with you @petersalomonsen, I learned quite a bit and hope someone will find the async feature useful.

I'm not sure I'll be using wasm-git yet, I'm still exploring options. I'm currently looking at WasmerJS/WASI as an alternative to Emscripten, but it's good to know I have the wasm-git option if I need it.