Compiling applications and assets to a single binary

alexwhitman commented 9 years ago

One of the talked-about benefits of golang is that it compiles to a single binary. Projects such as nexe and jxcore allow compiling to a single binary, but nexe isn't actively maintained and I've never made jxcore work consistently.

The require and fs functions would need to be changed to know about packaged files and assets, and I'm sure there are other things that would need consideration.

What are your thoughts about being able to do this in a maintained and supported manner?

rvagg commented 9 years ago

/cc @bmeck

bmeck commented 9 years ago

@alexwhitman we have work underway w/ some oversight into the api and commitment from @trevnorris to help see this through. https://gist.github.com/bmeck/0deeefd070224c10566f

We will never emulate a virtual file system, and most bundling applications (Windows EXE format, Bundler, Jar files, etc.) take the same stance on emulating file systems. A presentation will be done at Empire Node with more information.

An old binary supporting this behavior is at http://bmeck.github.io/ , but the spec has changed since then.

junosuarez commented 9 years ago

I know @creationix had an idea around this a few years ago

sindresorhus commented 9 years ago

Mickael-van-der-Beek commented 9 years ago

If you have asynchronous require calls, you could almost use something like r.js or browserify to build a minified and optimised single file.

I say almost because of course native libraries or modules that have to be compiled or use C/C++ bindings are still an issue.

bmeck commented 9 years ago

@Mickael-van-der-Beek As stated in the gist linked, shared libraries will need to be dumped to disk in order to be loaded. Only Solaris supports loading a shared library from memory (Mac OS used to but deprecated the API). That is how jar files handle shared libraries in industry.

@sindresorhus I think the only thing that will be slightly different from the thoughts in that issue is that we implemented a resources API rather than making a new protocol.

martinies commented 9 years ago

I can't say I'm an expert on this subject, but we have had lots of such cases recently. At least 4 out of 10 tasks we get from customers are mostly to show a design or proof of concept, and those have to run on 'closed systems' during the presentation / POC. Node is a great way to tie things together, but we have had lots of trouble on client systems with packages and file-system-related differences. Before Node we were using PHP, and things were even worse. I believe there are many others like me who would enjoy a packaging/compiling feature with a virtual file system (something similar to jxcore's).

@bmeck why no to a virtual file system? It eventually worked for jxcore (after several months of struggling, though, but it works smoothly for most cases now). The only problem is .node files. If the package has a .node file, we just write it to disk from the virtual file system on the initial run. Thankfully we don't use native modules regularly.

Some of our team members don't have cross-platform development experience (differences in paths etc.); a virtual file system now helps us ease deployments, especially when the customer's test or server environment is unknown.

+1

Mickael-van-der-Beek commented 9 years ago

If a virtual file system is an option, would Docker + Boot2Docker be a solution?

bmeck commented 9 years ago

@martinies because filesystems work differently in different environments and bundled assets are read-only. A bunch of things will have edge cases and randomly break. We tried that at the beginning of the year, but there were too many problems with people needing to run stat() and expecting file system settings like case insensitivity for it to be worth the trouble.

Take a look at the formats that are also in the same mindset:

Windows EXEs ( http://msdn.microsoft.com/en-us/library/windows/desktop/ms648042(v=vs.85).aspx )
Java ( http://docs.oracle.com/javase/7/docs/api/java/lang/Class.html#getResource(java.lang.String) )
Apple ( https://developer.apple.com/library/mac/documentation/Cocoa/Reference/Foundation/Classes/NSBundle_Class/ )
Ruby Gems ( extract it http://docs.seattlerb.org/rubygems/Gem.html#method-c-datadir )
Perl ( http://search.cpan.org/dist/PAR/lib/PAR/Tutorial.pod#Accessing_packed_files )

If you want to mount a virtual file system via fuse or using self extracting executables that would be fine, but static in memory assets are not files and treating them as such leads to misunderstandings and leaky abstractions: see ( http://www.py2exe.org/index.cgi/WorkingWithVariousPackagesAndModules ). However, when we tried doing self extracting executables there were serious cleanup problems ( same as noted in atom/atom-shell#251 )

We want the cases where those break to be obvious (so developers can learn to diagnose the actual problem), even if it means tweaking code a little.

@Mickael-van-der-Beek docker is a bit heavy for distribution and still has some problems like how would you run 2 node apps in the same container.

tl;dr We don't want to lie; lying produces fewer errors, but ones that are much harder to deal w/.

bnoordhuis commented 9 years ago

> Only Solaris supports loading a shared library from memory (Mac OS used to but deprecated the API).

Linux does too, indirectly. You start a thread, the thread creates a pipe or socket pair, then the parent thread calls dlopen("/proc/self/fd/$fd") where $fd is the read end of the pipe. Should even work in secure ld.so mode.

kkoopa commented 9 years ago

Windoze supports it too. Nothing says you have to use the operating system's LoadLibrary routine. Just replicate it. This is how every other PE-"protector" works.

bmeck commented 9 years ago

@bnoordhuis had not thought of that, would require some work for the setup and teardown of the pair. @kkoopa technically we could implement our own library loader yes, but that is painful.

If the experience is the exact same everywhere either would work. Current implementation extracts to disk since that seems the normal solution across the board.
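The extract-to-disk approach mentioned here could look roughly like the following. This is only a minimal sketch and not the implementation from the gist: `extractAddon` and its parameters are hypothetical names, and it assumes the bundled `.node` bytes are already available as a Buffer.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write bundled native-addon bytes to a file on disk
// so the OS loader can open them, and return the resulting path.
function extractAddon(name, bytes, dir = os.tmpdir()) {
  // Name the file after a hash of its content so repeated runs reuse the
  // same file and different versions never collide.
  const hash = crypto.createHash('sha256').update(bytes).digest('hex').slice(0, 16);
  const target = path.join(dir, `${path.basename(name, '.node')}-${hash}.node`);
  if (!fs.existsSync(target)) {
    // Write under a temporary name first, then rename, so another process
    // never sees a half-written library.
    const partial = `${target}.${process.pid}.tmp`;
    fs.writeFileSync(partial, bytes, { mode: 0o755 });
    fs.renameSync(partial, target);
  }
  return target;
}
```

With a real addon, the returned path could then be passed to `require()`, which loads `.node` files through the OS loader. The sketch does nothing about the cleanup problem noted earlier in the thread: the extracted file stays on disk.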

kkoopa commented 9 years ago

It's quite straightforward actually. Anyway, why reinvent the wheel: https://github.com/fancycode/MemoryModule

creationix commented 9 years ago

I've been implementing the exact same feature for luvit recently. It's pretty trivial to append a filesystem to the main binary using zip format since most exe formats ignore extra data at the end and zip format ignores data at the beginning (hence how self-extracting zips work)
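To illustrate why an appended zip can be found again, here is a minimal sketch of locating the archive inside a combined file by scanning backwards for the zip end-of-central-directory record. `locateAppendedZip` is a hypothetical name, this is not luvi's code, and it assumes a plain (non-zip64) archive whose internal offsets were not rewritten after concatenation.

```javascript
const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06" read as a little-endian uint32
const EOCD_MIN_SIZE = 22;          // fixed part of the record, without the comment
const MAX_COMMENT = 0xffff;        // the comment length field is 16 bits wide

// Scan backwards for the end-of-central-directory record and work out where
// the appended archive begins inside the combined file.
function locateAppendedZip(file) {
  const lowest = Math.max(0, file.length - EOCD_MIN_SIZE - MAX_COMMENT);
  for (let pos = file.length - EOCD_MIN_SIZE; pos >= lowest; pos--) {
    if (file.readUInt32LE(pos) !== EOCD_SIGNATURE) continue;
    // A genuine record ends exactly at the end of the file.
    const commentLength = file.readUInt16LE(pos + 20);
    if (pos + EOCD_MIN_SIZE + commentLength !== file.length) continue;
    const entries = file.readUInt16LE(pos + 10);
    const directorySize = file.readUInt32LE(pos + 12);
    const directoryOffset = file.readUInt32LE(pos + 16);
    // Offsets in the record are relative to the start of the archive, so
    // whatever sits in front of the archive shows up as a difference here.
    const archiveStart = pos - directorySize - directoryOffset;
    if (archiveStart < 0) continue;
    return { archiveStart, entries, eocdOffset: pos };
  }
  return null; // no zip appended
}
```

Running it on the executable's own bytes (for example the contents of `process.execPath`) would tell a bootstrap script whether anything has been appended and where it starts.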

I've been fighting the dlopen issue. If it's so easy to write code to load a module from memory, then why do jars write out to disk as was mentioned above?

My new work is https://github.com/luvit/luvi

creationix commented 9 years ago

Also, responding to the original post. I would not recommend patching fs to load from the vfs. It's quite a different beast and should have a different API I think. I do recommend patching require to look there in certain cases. For mine, that is for bootstrapping a single-binary app and for relative requires from files already in the vfs.
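A rough sketch of what "patching require for relative requires from files already in the vfs" might look like. The loader below is hypothetical (it is not luvi's or Node's implementation): `createBundleRequire` takes a Map of bundle paths to source text, resolves relative requests inside that Map, and hands everything else to the normal `require`.

```javascript
const path = require('path');
const vm = require('vm');

// Hypothetical loader: `files` is a Map of POSIX-style paths inside the
// bundle to JavaScript source text.
function createBundleRequire(files) {
  const cache = new Map();

  // Resolve a relative request against the requiring file's directory,
  // trying the exact name first and then a ".js" extension.
  function resolve(dirname, request) {
    const base = path.posix.join(dirname, request);
    return files.has(base) ? base : `${base}.js`;
  }

  function load(filename) {
    if (cache.has(filename)) return cache.get(filename).exports;
    if (!files.has(filename)) {
      throw new Error(`Cannot find module '${filename}' in bundle`);
    }
    const mod = { exports: {} };
    cache.set(filename, mod); // register before running so cycles terminate
    const wrapper = vm.runInThisContext(
      `(function (exports, require, module, __filename, __dirname) {\n${files.get(filename)}\n})`,
      { filename }
    );
    const dirname = path.posix.dirname(filename);
    // Relative requests stay inside the bundle; anything else goes to the
    // ordinary require (built-ins, node_modules).
    const localRequire = (request) =>
      request.startsWith('./') || request.startsWith('../')
        ? load(resolve(dirname, request))
        : require(request);
    wrapper.call(mod.exports, mod.exports, localRequire, mod, filename, dirname);
    return mod.exports;
  }

  return load;
}
```

Note that fs is left alone here, in line with the recommendation above: only module loading knows about the bundle.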

alexwhitman commented 9 years ago

Good to see that this has kicked off some discussion.

@creationix The reason I mentioned fs is so that, ideally, files that make up the application can be loaded when both packed and unpacked. For example, I might want to load a template file for rendering. During development I'd want to load that from the regular file system but when deployed I'd want it loaded from the binary. That way I wouldn't have to build the binary for each small change during development.

bmeck commented 9 years ago

@creationix from what I can tell, it is because JAR file conventions rely on acting the same in many environments, and custom code loaders do not get 100% parity with the OS dlopen(). We were using archives as namespaces, which is how require() would work inside of them. If you want to grab files inside of the archive, we use a read-only createResourceReadStream(path, opts) -> stream / readResource(path, opt, cb). I would be interested in talking things over with you during a hangout if you have any problems, because our implementation seems to work fine.

@alexwhitman in the spec and implementation posted we do allow loading from disk or inside an archive via a concept called resources. These are present in most application programming environments with archive files. The important thing to note is that the concept of resources is not tied to archives themselves. If we emulate fs ever, we are encouraging that can of worms in other situations, like mounting a remote FTP where all the Sync functions would suddenly not make any sense. Is there any problem with using a more abstract API without references to stat() and inodes etc.?
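As an illustration of the resources idea (one read-only call that works both packed and unpacked), here is a minimal sketch. It borrows the name `readResource` from the comment above, but the behaviour, the `createResourceReader` factory, and its parameters are assumptions made for this example and are not taken from the spec in the gist.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical sketch: `bundled` is a Map of resource name -> Buffer that a
// packager would fill in; `rootDir` is where the same files live unpacked.
function createResourceReader(bundled, rootDir) {
  return function readResource(name, callback) {
    if (bundled.has(name)) {
      // Hand out a copy so callers cannot modify the bundled bytes, and
      // defer the callback so both branches are always asynchronous.
      const copy = Buffer.from(bundled.get(name));
      setImmediate(callback, null, copy);
      return;
    }
    // Not bundled: fall back to the ordinary file on disk.
    fs.readFile(path.join(rootDir, name), callback);
  };
}
```

During development the Map would simply be empty, so a template is read from disk on every call; a packaged build would fill it in, and the calling code stays the same. The API exposes no stat(), no directory listing, and no writes.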

trevnorris commented 9 years ago

Re: Virtualized File System

Not going to happen. Has far too large a footprint on core code, too many unknowns to deal with and an overall general PITA. I think @indutny properly stated it on IRC:

<indutny> oh god
<indutny> no no no

zcbenz commented 9 years ago

Hi, I want to share my experience on implementing app packaging in atom-shell, which I hope would be helpful for Node.

Introduction

The app packaging in atom-shell works by modifying Node's fs module (and others like child_process) to recognize asar archives and treat /path/to/*.asar as a directory. In general, it is a virtual filesystem compatible with Node's current APIs.

Examples of uses:

require('./test.asar/main.js');
fs.readFileSync('./test.asar/README.md');
fs.readdirSync('./test.asar');
child_process.fork('./test.asar/task.js');
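The path handling behind calls like these can be sketched as follows. This is not the code from asar.coffee: `splitArchivePath`, `wrapReadFileSync`, and the `readFromArchive` callback are hypothetical names used only to show the idea of intercepting a path that crosses into an archive.

```javascript
const path = require('path');

// Hypothetical helper: split "dir/app.asar/lib/x.js" into the archive file
// on disk and the path inside it. Returns null for ordinary paths.
function splitArchivePath(p) {
  const parts = path.normalize(p).split(path.sep);
  const index = parts.findIndex((part) => part.endsWith('.asar'));
  if (index === -1) return null;
  return {
    archive: parts.slice(0, index + 1).join(path.sep),
    inner: parts.slice(index + 1).join('/'),
  };
}

// Wrap an fs-style read function: paths that point inside an archive go to
// `readFromArchive`, everything else goes to the original function.
function wrapReadFileSync(original, readFromArchive) {
  return function readFileSync(p, options) {
    const hit = typeof p === 'string' ? splitArchivePath(p) : null;
    return hit && hit.inner
      ? readFromArchive(hit.archive, hit.inner, options)
      : original.call(this, p, options);
  };
}
```

A real directory that happens to be named `something.asar` would be misrouted by this check, which is one small example of the edge cases raised earlier in the thread.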

Archive format

The packages use asar archives, a custom archive format that targets fast random access. I didn't use Zip because it is both overcomplicated for our case and lacks some core features we need. I had listed our requirements and comparisons of different archive types here, and the conclusion was that developing our own archive type was a better choice.

Pros and cons

I don't know how many users use atom-shell's app packaging, but it works for most current Node apps without modifying a single line of source code. There are still some limitations, though, which I have listed in atom-shell's wiki.

Implementation

If you have skimmed the structure of the asar archive format, you should know that the implementation would be very simple. You can find out how Node's APIs are overloaded in asar.coffee, which is only 300 lines, and the native asar format parsing code can be found in archive.cc, which is 250 lines.

Single binary

The asar format doesn't support being concatenated to binaries like Zip does, but it would be quite easy to do by putting the size of the asar archive at the end of the archive file.
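The size-at-the-end idea can be sketched like this. The footer layout (a single unsigned 64-bit little-endian size) and the function names are assumptions made for the example, not part of the asar format.

```javascript
const FOOTER_SIZE = 8; // one unsigned 64-bit little-endian integer

// Append `archive` to `binary`, followed by a footer holding the archive size.
function appendArchive(binary, archive) {
  const footer = Buffer.alloc(FOOTER_SIZE);
  footer.writeBigUInt64LE(BigInt(archive.length));
  return Buffer.concat([binary, archive, footer]);
}

// Read the footer back and slice the archive out of the combined file.
function extractArchive(combined) {
  if (combined.length < FOOTER_SIZE) {
    throw new RangeError('file too small to hold a footer');
  }
  const end = combined.length - FOOTER_SIZE;
  const size = Number(combined.readBigUInt64LE(end));
  if (size > end) {
    throw new RangeError('footer points outside the file');
  }
  return combined.subarray(end - size, end);
}
```

A real format would probably also put a magic number next to the size, so that a plain binary with nothing appended is not misread as containing an archive.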

Thoughts for node

I don't think adding new APIs for packaging is a solution. If fs cannot read a file in an archive with the current APIs, nearly all modules that read from the filesystem would break when used in archives.

Imagine a user who wants to compile an existing Express app into a single binary: they would have tons of code (including their own and third-party modules) to change to make it work.

bmeck commented 9 years ago

@zcbenz Zip archives support symbolic links through the external file attributes. The fast lookup is generally only a minor problem since we do cache the central directory in memory.

indutny commented 9 years ago

I think technically it could be possible to combine two ELF files (or two Mach-O files) into one single file, making it load all symbols and relocate all data.

pmuellr commented 9 years ago

Another kinda wacky thing, for "single binary" deployments, would be to create and use snapshots. Briefly alluded to here: https://developers.google.com/v8/embed#contexts . Anyone use these in practice?

This is just for code; non-code resources (data files) would need a traditional archive story.

We built something like this many years ago for IBM's J9 Java VM. Compiled Java .class files to an optimized, quicker-to-load format that the VM could consume. It worked, provided some interesting value in some situations, but in the end was a little too complex for most people to easily use. It only potentially improves the runtime startup, and often at the expense of a larger disk footprint for the archive. Our target was mobile devices, which is where you get the most bang for the buck with this kind of story.

We had a zip-based archive story that included the "snapshot" with a well-known file name, and resources (aligned on 4-byte boundaries), so in the end you had a single file. Kinda similar to Android's APK story.

Would be an interesting area to explore, but obviously not a critical path item. :-)

bmeck commented 9 years ago

@pmuellr I would have concerns about vendor lock-in if we do use snapshotting; loading the code would be faster, but it would not be portable.

pmuellr commented 9 years ago

ya, there's lots to worry about if you want to do this; platform specificity is of course a concern as well. Not so sure about "vendor" lock-in, but certainly "version" lock-in is another real concern.

You'll notice I didn't really paint this as a happy outcome of the work we did in Java. :-) The tooling for this kind of thing tends to be ... complicated. I don't believe we expose this functionality or tooling in the product anymore.

Also true that as processor speeds increase, the benefits of this kind of approach decrease.

Still, snapshots do seem to be an interesting idea, and perhaps there's other benefits to them like "obfuscated code" that would be of value to someone. I like to keep weird things like this in the mix of wacky ideas - sometimes you find these things useful in unexpected ways.

bnoordhuis commented 9 years ago

> Another kinda wacky thing, for "single binary" deployments, would be to create and use snapshots. Briefly alluded to here: https://developers.google.com/v8/embed#contexts . Anyone use these in practice?

node-webkit does, I think. There is a provision for it in V8's mksnapshot tool: you can make it load extra code with the (aptly named) --extra_code <filename> switch.

It's not very amenable to general purpose code, however. For example, snapshots with "foreign" objects don't work; that means you can't use buffers or handles and those are rather pervasive in node.

bmeck commented 9 years ago

Please see: https://www.youtube.com/watch?v=k5r0kQlsDgU

And: https://github.com/bmeck/noda-loader

imlucas commented 9 years ago

I also took a stab at this recently that might be helpful for other folks interested.

method

Using the löve/node-webkit self-extracting zip "single binary" approach, which is basically these 10 steps.

`mksnapshot`

The node-webkit docs @pmuellr mentioned detail all of the weird/gnarly tradeoffs you have to deal with.

auto-update

For Python, esky has a really elegant way of handling this. There are some prototypes for this that are promising, but a real need hasn't come up (deployment scripts just wget from GitHub releases and stomp the local copy).

status

In production and "good enough for me", so I haven't fiddled with it in a while. Windows binary add-ons are back and forth. There are real, live tests though, if anyone has interest.

conclusions

lots of hard, thankless problems around compat
windows build box availability :( hopefully this will change as mapbox/node-pre-gyp and node-forward/build progress though
Python has taken a lot of stabs at this (py2exe, esky, rumps, etc) that should be thoroughly researched
scripting and infrastructure work no one wants to tackle
it's really fun to email apps/scripts to customers and just skip the whole "now you install node and there's this thing called npm" dance