Assembly translation sharing

xen2 commented 12 years ago

I wanted JSIL to be able to reuse translation results of common assemblies. (maybe there is already a way I didn't notice?)

Here is my current workflow/changes, please let me know if it seems good to you (in which case I could build a patch):

Make the assembly ID unique (maybe a reduced hash of 4 or 8 bytes so the string is still not too long), so that $asm01 becomes $asmASSEMBLYHASH. As a result, it will never change between compilations or depending on the actual dependencies.
Always output a manifest-like definition at the top of every file (self is JSIL.DeclareAssembly as before; the new part would be a JSIL.GetAssembly for every referenced assembly).
Modify AssemblyTranslator slightly so it can translate multiple assemblies at once.
Add shared files through TranslationResult.AddFile so they go into the manifest (this step still needs work so the list can be generated automatically from a given assembly and its dependencies).
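
As a rough illustration of the hashed-ID idea in the first item, here is a minimal sketch. The helper names and the choice of 32-bit FNV-1a are assumptions made for this example; nothing here is taken from JSIL's code, and a 4-byte hash can still collide.

```javascript
// Hypothetical helper, not part of JSIL: 32-bit FNV-1a over the UTF-16 code
// units of the name (identical to byte-wise FNV-1a for ASCII-only names).
function fnv1a32(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // reinterpret as unsigned
}

// Hypothetical: turn an assembly's full name into a "$asm" + 8 hex digit ID.
// The ID depends only on the name, not on which other assemblies are present.
function stableAssemblyId(assemblyFullName) {
  const hex = fnv1a32(assemblyFullName).toString(16).padStart(8, "0");
  return "$asm" + hex.toUpperCase();
}
```

Because the ID is derived from the name alone, two separate translator runs would agree on it, at the cost of a small collision risk that a real implementation would need to detect.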

Any thoughts?

kg commented 12 years ago

I've done some thinking along these lines but haven't done any hacking.

I think assembly IDs would go away entirely since they aren't useful; you would probably just rely on the assembly collection sort of thing and each individual .js file would have a collection of assembly references at the top with file-local 'IDs' instead of the global IDs used now. That way you wouldn't have to worry about collisions or hashing and they could remain compact. Having them remain constant across compilations is nice to have, but I'm not sure you can actually guarantee it without collisions.

I think translating multiple assemblies at once is kind of a poor choice. What makes you believe that is necessary? The way things are factored right now makes this kind of reuse awkward, but it shouldn't require making AssemblyTranslator aware of multiple assemblies.

I think what is wanted here is a better concept of the manifest and the translation result, so that the 'manifest' is more like a list of references in .NET compiler terms, and then the translation result is one or more output .js files (from the input assembly), along with information on the .js files' dependencies (other assemblies they refer to, content embedded in the source assembly, etc). In that model, the manifest would be produced by JSILc, and each assembly translated would produce a translation result. The manifest bundles together all the individual translation results into one complete whole and is used to load an output application (sort of like now, but with less blurry boundaries).
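
One way to picture that split between per-assembly translation results and a bundling manifest is the sketch below. All names and shapes (`makeTranslationResult`, `buildManifest`, the `references` list) are hypothetical and chosen only for illustration; they are not JSIL's data structures.

```javascript
// Hypothetical: what translating one assembly produces.
function makeTranslationResult(assemblyName, outputFiles, references) {
  return { assemblyName, outputFiles, references };
}

// Hypothetical: bundle individual results into one manifest, listing the
// files of referenced assemblies before the files that depend on them.
function buildManifest(results) {
  const byName = new Map(results.map((r) => [r.assemblyName, r]));
  const visited = new Set();
  const files = [];
  const missing = []; // referenced assemblies with no translation result

  function visit(name) {
    if (visited.has(name)) return; // also stops on reference cycles
    visited.add(name);
    const result = byName.get(name);
    if (!result) {
      missing.push(name);
      return;
    }
    result.references.forEach(visit);
    files.push(...result.outputFiles);
  }

  results.forEach((r) => visit(r.assemblyName));
  return { files, missing };
}
```

In this model a previously translated assembly could contribute a stored result instead of being translated again, provided its output is still valid for the current application.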

On the other hand: what are the benefits you think would be provided by reuse of translated assemblies? So far, the only thing I can think of is saving time - it would let you shave a chunk of time off of each run of JSILc, especially if you're talking about translating the entire mono mscorlib. Are there other benefits I'm overlooking?

One major obstacle to any reuse scenario is that you need a way to enforce that the type information graph matches between the compilations. If the type information graph is out of sync (due to JSILc changes or changes to the involved assemblies), even if the various IDs haven't changed, the code generated will be incorrect and its behavior will be wrong. At present the amount of information contained within the type info graph is pretty significant. You could work around this by trying to make type information construction fully deterministic, but this would at a minimum require throwing out some of the parallelism support, and there are also some problems where underlying machinery (Cecil, CLR reflection) is not deterministic either.
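
A very simplified sketch of such a check follows. It assumes the type information can be summarized as a plain object mapping type names to ordered member lists, which is far less than the real type info graph contains; `fingerprintTypeInfo` and `canReuseTranslation` are hypothetical names.

```javascript
// Hypothetical input shape: { "Namespace.Type": ["int X", "int Y"], ... }
// Type names are sorted so the result does not depend on discovery order;
// member order is kept because a reordering may change the generated code.
function fingerprintTypeInfo(typeInfo) {
  return Object.keys(typeInfo)
    .sort()
    .map((typeName) => typeName + "{" + typeInfo[typeName].join(";") + "}")
    .join("|");
}

// Hypothetical: a cached translation is reusable only if both the translator
// version and the type information fingerprint are unchanged.
function canReuseTranslation(cached, translatorVersion, currentTypeInfo) {
  return (
    cached.translatorVersion === translatorVersion &&
    cached.fingerprint === fingerprintTypeInfo(currentTypeInfo)
  );
}
```

A real check would have to cover everything the translator consults, which is exactly what makes this hard.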

xen2 commented 12 years ago

Thanks for the detailed answer. I agree about multiple assemblies; it was merely a shortcut to avoid implementing stuff (especially before I had the current solution with assembly hash IDs, when I was still relying on the manifest containing the combined results of all assemblies, which is probably not needed anymore).

My scenario was more about having multiple separate assemblies that I want to use from JavaScript. As an example, I auto-generate serialization assemblies and want them to be included in the process; however, they are not real references, and the same goes for plugins. Of course I could alternatively generate a fake assembly referencing all of them and throw it at JSIL...

kg commented 12 years ago

Right now the serialization assembly use case should work if you pass them all in on the command line as part of a JSILc invocation. JSILc will reuse a single manifest for all the assemblies it translates, so there will be one global set of assembly IDs, ensuring that when you load all the assemblies, everything works. It's not a perfect solution, but so far it's working for the purposes of loading XmlSerializer assemblies.

For plugins it gets hairier since it's not possible to compile all of them at once. I think that would definitely require this sort of reuse-friendly approach.

xen2 commented 12 years ago

BTW, one advantage I also found for global assembly IDs is that it was easier to copy and paste skeleton ImplementExternals, since $asmXXX wouldn't change over time, especially when copying results from multiple assemblies, where IDs could even conflict (but well, not that big of a deal; search & replace was enough to do it manually).

kg commented 12 years ago

Yeah, the solution I use for that right now is just to rename the skeleton's '$asms' to something unique like '$fooasms' so that there's no collision. Global IDs would address that as well.

matra774 commented 12 years ago

Just my 2c: I have also been thinking about multiple compilation. Reason: I have a common C# assembly that gets translated into JS, and the result is then consumed by multiple applications. My idea was much simpler. Add a config setting, ReferenceAssemblies, that contains an ordered list of assemblies. The ordinal position of an assembly defines its assembly ID. Make sure that ReferenceAssemblies is constant across compilations and contains all used assemblies.

However, I am not sure if this is enough (for example, do we also need to preserve method signatureIds across compilations, and the type graph consistency mentioned by Kevin above?).

For now, I just share the library DLLs between projects and translate the library DLL in each project. It only adds a few seconds...
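
The ordered-list idea could look roughly like the sketch below. ReferenceAssemblies is the setting proposed in this comment, not an existing JSIL option, and the two-digit `$asm` formatting is only an assumption modeled on the `$asm01` example earlier in the thread.

```javascript
// Hypothetical: assign each assembly an ID from its position in a fixed,
// ordered ReferenceAssemblies list. Fails loudly if a used assembly is not
// listed, since that would force an ID that is not stable across projects.
function assignIdsFromReferenceList(referenceAssemblies, usedAssemblies) {
  const ids = new Map();
  referenceAssemblies.forEach((name, index) => {
    const hex = index.toString(16).padStart(2, "0").toUpperCase();
    ids.set(name, "$asm" + hex);
  });

  const unlisted = usedAssemblies.filter((name) => !ids.has(name));
  if (unlisted.length > 0) {
    throw new Error("Not in ReferenceAssemblies: " + unlisted.join(", "));
  }
  return ids;
}
```

As long as every project uses the same list in the same order, `Common` gets the same ID everywhere; appending new entries at the end keeps existing IDs unchanged.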

xen2 commented 12 years ago

I guess what I could do is:

Use that global ID only when generating a skeleton (it's much easier to copy and paste), but otherwise disable it for normal translation.
Within a normal assembly translation, use $ref01 $ref02 for referenced assemblies (local ID that doesn't conflict with manifest)
For now I would keep manifest as is, but it could probably go away (not needed by assembly anymore)
Would probably need some mechanism to tell JSIL to stop translating on specific assemblies and their dependencies (maybe already possible, need to double check).

iskiselev commented 11 years ago

For me, assembly translation reuse is a very important feature. I am trying to translate a very big project. Every translation may take more than 10 minutes, so it's very uncomfortable to develop with such long compilation times.

Maybe JSIL should generate local assembly variables for each assembly translation? Every assembly file would be wrapped in one anonymous function. Inside this function would be a copy of the part of the manifest file that is used by this assembly. In that case we could translate assemblies one by one, which would make incremental builds possible.
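
A toy version of that file layout might look like this. The registry and `getAssembly` function below are stand-ins written for this sketch, not JSIL's runtime API, and the `$ref01` name follows the local-ID convention suggested earlier in the thread.

```javascript
// Hypothetical stand-in for a runtime registry that resolves assemblies by
// name, creating an empty entry on first use.
const assemblyRegistry = new Map();
function getAssembly(name) {
  if (!assemblyRegistry.has(name)) {
    assemblyRegistry.set(name, { name: name, types: {} });
  }
  return assemblyRegistry.get(name);
}

// What the output file for a shared "Common" assembly might look like.
(function () {
  const $self = getAssembly("Common");
  $self.types.Greeter = { greet: (who) => "Hello, " + who };
})();

// What the output file for "App" might look like. $ref01 is file-local, so
// it cannot collide with IDs used in any other file.
(function () {
  const $self = getAssembly("App");
  const $ref01 = getAssembly("Common");
  $self.types.Program = {
    // The referenced type is looked up when main runs, not at load time.
    main: () => $ref01.types.Greeter.greet("JSIL"),
  };
})();
```

Since each file only names its references locally, the two files could be produced by separate translator runs without agreeing on global IDs. This addresses only the ID problem, not the question of whether the generated code is still consistent with the rest of the application.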

kg commented 11 years ago

Incremental translation/reuse are not a problem because of the format of the output JS. It is a problem because the actual generated JS depends on type information about the entire application. If one assembly changes and you only recompile the JS for that assembly, it may contain new assumptions about type information that do not match those elsewhere in the application.

You can think about it this way: if you had a DLL, written in C, that used a particular data structure, and you changed the size or order of fields in the data structure, you would need to recompile any EXEs that used that DLL as well - they wouldn't automatically be adapted to use the new structure layout. The same class of problems occurs here. (I do fix up a lot of things at runtime in the browser, but it's not possible to fix up all of them.)

How big is the application you're translating? 10 minutes is extreme; I've never seen anything higher than 30 seconds on my machine.

iskiselev commented 11 years ago

I understand that if there is any change in an assembly, we need to recompile that assembly and any assembly that references it. At the same time, if no assembly references the changed assembly, we should be able to recompile only the changed one. But if we try to do this now, ignoring all the other assemblies, the recompiled assembly will have different IDs for its referenced assemblies (I mean the IDs that are written in the manifest file). I am trying to port a big business application (it consists of dozens of projects and dozens of MB of source code). I do not plan to port all of it, but even a subset takes a lot of time to recompile. Maybe I will be able to shorten the recompile time by ignoring assemblies more aggressively in the settings (I really don't use all the translated assemblies now).
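
The rule described here (retranslate the changed assembly plus everything that references it) can be sketched as a walk over reversed references. The function name and the input shape are hypothetical, and following references transitively is an assumption of this sketch; whether that set is actually sufficient depends on the type information concerns raised above.

```javascript
// Hypothetical: references maps each assembly name to the names it references.
// Returns the changed assemblies plus everything that references them,
// directly or indirectly, sorted by name.
function assembliesToRetranslate(references, changed) {
  // Invert the graph: for each assembly, who references it?
  const dependents = new Map();
  for (const [name, refs] of Object.entries(references)) {
    for (const ref of refs) {
      if (!dependents.has(ref)) dependents.set(ref, []);
      dependents.get(ref).push(name);
    }
  }

  // Breadth-first walk from the changed assemblies through their dependents.
  const affected = new Set(changed);
  const queue = [...changed];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const dependent of dependents.get(current) || []) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return [...affected].sort();
}
```

For a leaf assembly that nothing references, the result is just that one assembly, which is the case where a single-assembly rebuild is being asked for.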

ephere commented 11 years ago

One extra vote here for global assembly IDs and plugin support. I am trying to write code that loads an arbitrary JSIL-generated assembly manifest at an arbitrary moment and can then use the types within it (cast them to an interface defined in a commonly referenced assembly). So far this does not seem to be possible.

kg commented 11 years ago

ephere: if you compile the plugin(s) as a part of the build of the main application, you should be able to load the main application without the plugin(s), and then later load an arbitrary plugin and use it. If that use case isn't working, it's definitely a bug and I can fix it.

EDIT: That is, the compile process produces per-executable (exe/dll) manifests, so you don't have to load everything JSILc builds in one go; you can split the loads up individually. The JSIL bootstrapper isn't necessarily designed for on-the-fly loading of additional assemblies, so if you run into big problems there it may be some work to fix that. But the rest of the runtime should not have an issue.

ephere commented 11 years ago

In my use case there is a single dependency in that plugins and application only share one common "interface" assembly, which is a common scenario when dealing with app extensions. I actually managed to dynamically load a JSIL-compiled plugin .js file inside my main app, cast it to a common interface, and perform some operations on it through that interface which seems to work fine.

However, I had to load the plugin's assembly .js file (and not its manifest) directly for it to register and I also changed the assembly token code from being enumerated to using a hash code of the assembly name. So, in AssemblyManifest.cs:61:

token.ID = i++; becomes token.ID = token.Assembly.GetHashCode();

There's, of course, a better way to do that but it seems to work OK for now. One test (TetrisRunsReplayWithoutErrors) started failing after this change but finding out why is still beyond me for now.

kg commented 11 years ago

TetrisRunsReplayWithoutErrors is a smoke test that runs in the cloud, so don't worry about that one.

It is good to know that you got the basics working. Manifest loads not working right after the first one is kind of expected, since I never tested that. And the token overlap is definitely a problem.

If the shared types between the assemblies are only interfaces, then you shouldn't hit any compiler problems - the need for global type information doesn't apply to interfaces, so you're safe.

If you can file bugs (test cases help, but aren't absolutely necessary) about each of those problems we can track them and ideally I can try and fix them.

ephere commented 11 years ago

Ok, I will keep experimenting. Being able to load an assembly manifest a second time would help a lot since, as it stands now, the only solution I see is parsing it manually and loading the assemblies it includes one by one. I can file a separate issue for that, though maybe this one is general enough to cover it.

I'll have to go through some existing tests to try and figure out how to write new ones.

Btw, JSIL is an absolute gem and you're doing great work with it!

P.S. OverloadedGenericMethodSignatures2 test also fails for me, is that expected?

kg commented 11 years ago

Everything in the 'Tests' project should pass, with the exception of one threading test that has a race in it that is hard to fix (but it's in the threading test cases, so that should be fairly clear). There are a couple known problems with locales. If OverloadedGenericMethodSignatures2 broke that might mean that a change you made broke it, somehow.

Loading additional manifests after startup is definitely a feasible thing, I'd just have to figure out how best to expose that feature.

The easiest way to write new tests is just to write a simple stand-alone console application with a Program class and a static Main method, like all the ones in SimpleTestCases. The test runner will do the right thing with them (and if you drop them into that folder, they run automatically).

ephere commented 11 years ago

Ok, seems straightforward enough. OverloadedGenericMethodSignatures2 was failing even before I made my change, so I don't think it's related. The error it gives is:

test.Interface_Test2 was not direct-dispatched
Expected: True
But was: False
