jupyterhub / repo2docker

Turn repositories into Jupyter-enabled Docker images
https://repo2docker.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add support for PackageCompiler for Julia #686

Open jlperla opened 5 years ago

jlperla commented 5 years ago

cc: @davidanthoff @arnavs This is a specific implementation of the discussion in #601

We have now discussed how best to annotate project files for PackageCompiler, which is a prerequisite for getting support directly in repo2docker.

Background

Right now, repo2docker doesn't support PackageCompiler, but:

Extension to the Project File

After discussion with maintainers of Julia and the package manager (cc: @KristofferC @aviks), the proposal was to standardize on a section such as the following inside the Project.toml file:

[deps]
Markdown = "XXXXXXXXXXX"
Random = "XXXXXXX"

[packagecompiler]
compilerflags = "-O3"
packages = ["Markdown", "Random"]

Note that this is not repo2docker specific; other backend systems could use it to run PackageCompiler based on a Project and Manifest. The section would be hand-edited into the Project.toml.
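For concreteness, a minimal sketch (assuming the section above is adopted as proposed) of how a build script could read it; the variable names and exact handling are illustrative only:

using Pkg

toml = Pkg.TOML.parsefile("Project.toml")
if haskey(toml, "packagecompiler")
    pc = toml["packagecompiler"]
    packages = get(pc, "packages", String[])   # e.g. ["Markdown", "Random"]
    flags    = get(pc, "compilerflags", "")    # e.g. "-O3"
    # hand `packages` and `flags` to whatever invokes PackageCompiler
end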

@SimonDanisch any thoughts on this, including the naming or additional options to pass to compile_incremental?

Implementation of PackageCompiler Support

Now, to implement support in repo2docker, we can produce a PR which does the following:

betatim commented 5 years ago

Sounds like a good proposal, and I think the next step here is to create an (exploratory if needed) PR to implement this.

I think we should make sure to link to or have in the docs advice on when/why/which packages one wants to pre-compile and what the trade-off is (slower build time but more snappy interactive first use, I think).

A question for my education: I had a short look at the Dockerfile and don't understand why you can't do this in postBuild. Could you point out what exactly it is?

jlperla commented 5 years ago

For sure, we would need the docs on tradeoffs. One tradeoff I am not sure of: is there a big difference in the speed of spinning up the hub container when the image gets larger (say, twice as large), or is it a second-order effect until the images get huge?

A question for my education: I had a short look at the Dockerfile and don't understand why you can't do this in postBuild. Could you point out what exactly it is?

Yeah, that was our thought as well. @arnavs spent a huge amount of time on it after discussing with @davidanthoff and couldn't figure it out. It might have been a particular set of operations specific to the Pkg manager in Julia (among them activation of the Jupyter kernels), and it is possible that there is a specific order of operations that would work. So I should probably change "It is not possible to do this with a postBuild" to "We gave up trying to get it done with a postBuild".

betatim commented 5 years ago

For sure, we would need the docs on tradeoffs. One tradeoff I am not sure of: is there a big difference in the speed of spinning up the hub container when the image gets larger (say, twice as large), or is it a second-order effect until the images get huge?

The main use-case for repo2docker is with BinderHub. There we have to frequently transfer the image from the docker registry to the node on which it is being executed (basically docker pull). My guess (because we don't have data, only manual inspection) is that starting the container once it is local is fast (a few seconds), but transferring 2GB over the network takes longer (10s of seconds??), so doubling the image size means (approximately) doubling the time users wait for their launch.

There is some missing factor in there for (very) popular repos where all nodes in the cluster already have a copy of the container image, and for how many of the layers of the image are shared with other images already present when you do have to pull it. However, I'd treat that as the exception and assume that each launch needs to transfer the majority of the image and that image transfer is the biggest part of the wait -> bigger images mean slower start times.
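A rough back-of-the-envelope illustration of that scaling (the throughput figure is a made-up assumption, not a measurement):

# assume ~100 MB/s effective registry-to-node throughput (made-up figure)
pull_time_seconds(image_gb; mb_per_s = 100) = image_gb * 1024 / mb_per_s
pull_time_seconds(2.0)   # ≈ 20 s
pull_time_seconds(4.0)   # ≈ 41 s, i.e. doubling the image roughly doubles the wait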

jlperla commented 5 years ago

The issue I have for documenting this is that I am not sure how to instruct people on finding out how large their image is. Is there a trick for beginners?

For example, in the "Build logs" where it says "Found built image, launching..." it would be great if it said "Found built image of size XXX, launching...". Is that feasible?

betatim commented 5 years ago

Showing the size (and maybe even the progress of transfer) should be possible, I made a new issue in the BinderHub repo.

Maybe in the docs we can say what we already say ("bigger images mean longer launch times") and give some estimate of how much extra space compiling a package will take. Something like "Compiling a package will typically use N times more space than not compiling it. This means your image will be larger and so slower to launch."?

arnavs commented 5 years ago

A question for my education: I had a short look at the Dockerfile and don't understand why you can't do this in postBuild. Could you point out what exactly it is?

We tried this a few different ways (i.e., with a few different postBuild setups) in this repo: https://github.com/arnavs/compiled-binder-postbuild. You can spin up binder containers for individual commits to see the output from each approach.

I don't know that there was a single thing that failed, as much as a smorgasbord of errors (sometimes packages weren't found, sometimes we couldn't bake them, sometimes the baking had no effect, etc.)

Hopefully this data helps. In the meantime, I'll chug ahead with the approach/PR mentioned above.

arnavs commented 5 years ago

OK, I've made a fork of this project that will:

A) Digest the [packagecompiler] bit of the TOML, and feed it to the build script (which specializes based on whether we have things to bake, and/or compiler flags).

B) Download PackageCompiler, add all the packages to the default environment (looks like this matters), do some regex jujitsu on the list of things to bake, and fire up the oven.
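Roughly, (B) looks something like the following (a sketch only; the compile_incremental signature is an assumption about the 2019-era PackageCompiler.jl API, and the package list is the earlier example, not the real one):

using Pkg
Pkg.add("PackageCompiler")
to_bake = ["Markdown", "Random"]            # taken from [packagecompiler].packages
Pkg.add(to_bake)                            # everything goes into the default environment
using PackageCompiler
compile_incremental(Symbol.(to_bake)...)    # bake the whitelisted packages into a sysimage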

I'm hitting a (temporary) wall with the end-to-end testing, due to some kind of API rate limit. But so far so good. The only errors I've seen are the same sorts of things PackageCompiler complains about on the local machine.

betatim commented 5 years ago

Feel free to open a PR with [WIP] in the title so others can see what you are doing. It makes it easier to give tips when things get stuck, etc.

davidanthoff commented 5 years ago

Alright, I finally looked into this, sorry that it took so long. I spent a fair bit of time with PackageCompiler.jl internals now and added custom sysimage support to the VS Code Julia extension, so I'm somewhat familiar with the tech stack now.

I think, to be honest, that PackageCompiler.jl at this point is (a) too fragile to be used here, and (b) I'm not sold that the approach taken in PackageCompiler.jl is ideal for something like repo2docker.

So I think my recommendation would be that we don't merge something like https://github.com/jupyter/repo2docker/pull/688/ at this point. I think my objection (a) might just go away with time, as PackageCompiler.jl gets more solid, but I think (b) is probably a more fundamental issue.

Some more details on (a): right now PackageCompiler.jl doesn't have a released version that works, there are a bunch of PRs pending that rewrite things quite dramatically, it generally doesn't work well with custom environments and the documentation is not there to really grasp what is going on (although I now understand the code well enough that at least for me that is not an issue). I think that is all quite normal for such a young package that is in the middle of sorting things out, so no criticism from my end there, but I do think that repo2docker shouldn't take a dependency on something that is in such phase of its life cycle.

My more fundamental issue (b) is this. PackageCompiler.jl right now really does two things: 1) it "snoops" on the code running in a package during tests or some other code, and uses that to come up with a list of precompile statements that define which methods get compiled into the custom sysimage, and 2) it compiles the custom sysimage. Step (1) seems to be the fragile one, and that is why essentially the whole thing is only usable right now via a whitelist approach. I think generally that is too complicated for a general-audience story like binder.

I think a much better approach might be to rely on package authors to include a useful set of precompile statements in their packages, and not try to construct that list as part of the binder build at all. And then just try to compile a custom sysimage that includes all the packages in the environment, not a custom list. My understanding is that because at this point we are not including the snoop phase, things should be much more stable, and we hopefully can just enable this by default, without any configuration. The downside is that we will only get some of the speedup from this (especially when it comes to package load times), not all the speedups that one might see with the full PackageCompiler snoop (if it works).

The solution to this, of course, is that package authors snoop their packages during dev and include the relevant precompile statements in their package by default. I think generally that is a much better approach: don't make end users deal with all of this low-level stuff, but instead encourage package authors to do the hard work, and then end users will just benefit from that.
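To make the "precompile statements shipped by the package author" idea concrete, an illustrative sketch (MyPackage and heavy_function are made-up names, not an existing package):

module MyPackage

export heavy_function

heavy_function(x::Vector{Float64}) = sum(abs2, x)

# Precompile directives shipped by the package author; when the package is baked
# into a sysimage, these methods are compiled ahead of time instead of on first call.
precompile(heavy_function, (Vector{Float64},))

end # module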

I think I'm really not a fan of any configuration settings for this. I think the lifetime of those will be too short. I think it is quite realistic to assume that the core julia team will start to tackle compile time issues during the next year, and at that point it seems highly likely that whatever custom config settings we came up with here would be out-dated. I think if we do add something before this is all solved in julia base (and I am in favor of trying!), then it would have to be without config, by default, and robust.

I might take a crack at this sometime soon. Right now I remember most of the details from doing this for VS Code, so it would be good to not let that slip out of memory ;)

jlperla commented 5 years ago

I think it is quite realistic to assume that the core julia team will start to tackle compile time issues during the next year, and at that point it seems highly likely that whatever custom config settings we came up with here would be out-dated.

I am very skeptical of that. I thought the same thing a year ago. There are just too many priorities and not enough people.

But, just to be clear, the "custom settings" were discussed with the core julia team, with the intention that they would not be custom settings just for this, i.e. it would be something that others would use for similar whitelisting scenarios for PackageCompiler.

I think I'm really not a fan of any configuration settings for this. I think the lifetime of those will be too short.

A year? Two? I am also 100% sure that you will at the very least need to "blacklist" packages for any snooping solution in the foreseeable future with Julia. There are just too many things that can go wrong with package compilation.

arnavs commented 5 years ago

I think criticism (a) is pretty accurate (if you noticed, in the PR we are using a specific branch of PackageCompiler.jl, which seems to be the most stable.) But you're probably right that it will go away with time (i.e., maybe in a few months we would stop using a branch.)

As for (b), here are my thoughts:

And then just try to compile a custom sysimage the includes all the packages in the environment, not a custom list.

One issue with this is the trade-off between the size of a system image and the fixed cost of loading Julia from that image. Someone explained to me on Slack that it's some (nonlinear) function, and this would place a hard constraint on the Project.toml size.

PackageCompiler.jl right now really does two things: 1) it "snoops" on the code running in a package during tests or some other code and uses that to come up with a list of precompile statements

FWIW, the main error we've seen with this process is that it doesn't install the right test dependencies, and not that the actual snooping fails. Dunno, maybe @SimonDanisch can comment?

rely on package authors to include a useful set of precompile statements in their packages and not try to construct that list as part of the binder compile at all.

I think this would be great, but probably a huge change for many packages (?). I'm not familiar with how packages are actually precompiled, but at least for the ones we've written, I didn't do anything special.

arnavs commented 5 years ago

Another thing: the compile_incremental feature of PackageCompiler.jl could be useful for "graceful failure". In other words, given a list of things to precompile (which is some subset of the overall Project/Manifest), step through and essentially try/catch each one. So maybe we could do without a hard blacklist.
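A sketch of that idea, assuming the 2019-era compile_incremental API (the exact signature is an assumption):

using PackageCompiler

whitelist = [:Markdown, :Random]     # the subset the user asked to precompile
baked = Symbol[]
for pkg in whitelist
    try
        compile_incremental(pkg)     # attempt to bake this package
        push!(baked, pkg)
    catch err
        @warn "Skipping $pkg; PackageCompiler failed" exception = err
    end
end
@info "Packages baked into the sysimage" baked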

betatim commented 5 years ago

Thanks a lot everyone for your thoughts, work and patience! For me this is an example of why it is a good thing to work in a way where we can cheaply (in terms of effort) try things out, without then feeling like we have invested so much that we had better merge it.


A point of view from repo2docker in general: it is clear from user feedback that faster build and startup times (of the instance on mybinder.org, not the code itself) are where users feel pain right now. For repo2docker this means making the images we produce smaller (transferring images from our registry to nodes in the cluster takes longer for bigger images) and doing less work during the build.

jlperla commented 5 years ago

I think that before everyone says that binder support is good enough for now, and that PackageCompiler is not worth the trouble, they should try a repository without it that does any plotting with the primary plotting libraries, differential equations, or pretty much anything nontrivial.

I think that binder is essentially unusable with Julia in practice. Waiting a minute for an image to come down and then another one to two minutes for the plotting library to load and the first plot to appear... I would love to think that this will be magically fixed in the next year, but there is almost no chance.

betatim commented 5 years ago

I think that before everyone says that binder support is good enough for now, and that PackageCompiler is not worth the trouble

What do you propose as a next step? It sounds like PackageCompiler.jl is out for the moment or would need a lot of modifications to its code before it becomes usable here.

jlperla commented 5 years ago

What do you propose as a next step? It sounds like PackageCompiler.jl is out for the moment or would need a lot of modifications to its code before it becomes usable here.

No, that is not correct. PackageCompiler works great with a whitelisted set of packages. Hence the reason this PR was proposed, and why I asked the Julia core developers about a reasonable place to put whitelists in the project files.

My guess is that PackageCompiler working auto-magically with arbitrary packages (which is what @davidanthoff suggested would deliver a configuration-free setup) will likely not happen for years, if ever. I think an "interpreter" mode of Julia is likely to happen before that. Even if it did, it is unclear if that is the best approach: compiling every package used in a project into the julia image would significantly increase its size and might make things worse in a very nonlinear way.

davidanthoff commented 5 years ago

I think in the short run PackageCompiler.jl is out. We can't merge a PR here that takes a dependency on an unreleased branch of a package, and I also think we shouldn't take a dependency on a package that is in the middle of another major rewrite (which is how I interpret the currently pending PRs over there, but I might be wrong about that). I also think that things like env handling need to be sorted out in PackageCompiler before it is ready for prime time use in something like binder. Also, things like tests failing, no code coverage info available etc. are fine while a package is in dev mode, but I think binder is just part of too much basic infrastructure at this point to take a dependency on something like that.

I'm experimenting around with a solution where one just adds a using PackageName for all packages to a custom sysimage, and so far that looks fairly promising. The sys image size for my default environment (which has a lot of packages in it) increased from 160 MB to 260 MB, which seems not unreasonable to me. I am not sure, but I also think that maybe juliabox uses that strategy? So I don't think we need to wait for the core julia devs to tackle this, at least that is my impression right now.
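A sketch of that "just `using` every package" strategy, assuming the package list is read from the project's Project.toml; the final PackageCompiler call is left abstract because the exact entry point depends on the PackageCompiler.jl version in use:

using Pkg

project = Pkg.TOML.parsefile("Project.toml")
pkgs = sort(collect(keys(project["deps"])))

# generate a script that merely loads every package, no snooping
precompile_file = joinpath(pwd(), "precompile_usings.jl")
open(precompile_file, "w") do io
    for p in pkgs
        println(io, "using ", p)
    end
end

# then hand `precompile_file` to PackageCompiler's sysimage build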

jlperla commented 5 years ago

I think in the short run PackageCompiler.jl is out. We can't merge a PR here that takes a dependency on an unreleased branch of a package, and I also think we shouldn't take a dependency on a package that is in the middle of another major rewrite (which is how I interpret the currently pending PRs over there, but I might be wrong about that)

That is for sure, and I didn't notice the packagecompiler branch. A few things:

I'm experimenting around with a solution where one just adds a using PackageName for all packages to a custom sysimage, and so far that looks fairly promising.

If it works, great. @arnavs can help you through those issues. I care about getting packagecompiler support working, not the specifics of the implementation, because I think mybinder is effectively useless for Julia until that point.

I am not sure, but I also think that maybe juliabox uses that strategy? So I don't think we need to wait for the core julia devs to tackle this, at least that is my impression right now.

I agree, we can't wait for them, they have too many higher priority items to work on that affect more users:

aviks commented 5 years ago

So a few comments in general

jlperla commented 5 years ago

environment, or an application, but not on a development environment, which is what I consider binder to be. NextJournal has different priorities. Just IMO, so this may be debatable.

Thanks Avik, that is very helpful. I think you are right and make an excellent point. PackageCompiler is best for the interactive notebook publishing scenarios, which is where my main use cases were. For the other mybinder use cases, which are also important to me, it is crucial to give people much more flexibility than PackageCompiler allows... Even if it worked for every package.

The precompile is already done in the setup script, so the speed issues are in addition to that. With a development environment, people are willing to wait a few minutes because they don't expect it to be fast. With notebook publishing on the web, which is a key use case, those extra few minutes waiting on the using and the first plot for a given page mismatch expectations.

davidanthoff commented 5 years ago

Yes, thanks @aviks, this is incredibly helpful!

Did juliabox at some point have a custom sysimage? Or did I completely mix that up?

I was not aware that with a custom sysimage, one can't update packages anymore. I had thought that if I include a package in a custom sysimage, and then do a pkg> up that updates that package, I would lose the benefit of the precompiled stuff, but that my environment would then load the new, updated version of the package. But if I understand things right now, the sysimage version of the package would then essentially take precedence over the version of package in the environment, even if the environment has a newer package version... If that is so, it is a real bummer...

So maybe all of that does speak in favor of an opt-in approach, at least once PackageCompiler.jl is more stable...

I have to admit that I'm not super happy with any of the options here (including not doing anything)...

davidanthoff commented 5 years ago

One more thought: I think we should maybe not yet give up on postBuild.

I think, now that I understand the current state of PackageCompiler.jl better, it is actually not too surprising that we couldn't get that to work: PackageCompiler.jl currently really doesn't seem to work with anything but the default julia environment, and of course here in the repo2docker story we use a custom project at the core of the design. So that can't work out.

But, if PackageCompiler.jl worked properly with julia environments, it really should all just work, or at least I don't really see a reason why not. So maybe it is worth trying to fix the julia env story in PackageCompiler.jl, and then try the postBuild approach again?

arnavs commented 5 years ago

OK, some thoughts:

@arnavs is that non-master branch necessary?

Yeah, pretty sure the sd-notomls branch of PackageCompiler.jl still has valuable content that isn't merged into master (but it's the only one; see e.g. @SimonDanisch's comment in https://github.com/JuliaLang/PackageCompiler.jl/issues/245.)

PackageCompiler.jl currently really doesn't seem to work with anything but the default julia environment, and of course here in the repo2docker story we use a custom project at the core of the design. So that can't work out.

I'm not sure about this... in playing around, the idea was to do the PackageCompiler prebaking at the start of the assembly script (i.e., install some deps, build a new system image, and swap.) And then, using the new system image, set up the custom repo2docker project.

So maybe it is worth trying to fix the julia env story in PackageCompiler.jl, and then try the postBuild approach again?

The problems I saw weren't env-related, but more so: snooping (it has a rough time picking up test dependencies, and then crashes when it tries to snoop tests), permissions (with my old PR, the proximal issue was that we could assemble the new system image, but couldn't relink /usr/bin/julia), and general robustness.

I think we should maybe not yet give up on postBuild.

It was hard to do this from inside of a generic postbuild script, mainly because of permissions.

Note that we could also provide small, "manual" snoopfiles for the "Major League" of packages. For example, even if we wrote our own minimal suite for Plots.jl and DataFrames.jl, that would probably make it more usable for most people. We could let people choose in postBuild whether or not to accept this speed-upgradability tradeoff.
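For instance, a minimal hand-written snoopfile for Plots.jl and DataFrames.jl could be as small as this (illustrative only; the calls just exercise the common first-plot path):

using Plots, DataFrames

df = DataFrame(x = 1:10, y = rand(10))
plot(df.x, df.y)        # hit the time-to-first-plot path
scatter(df.x, df.y)
histogram(df.y)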

davidanthoff commented 5 years ago

Yeah, the snooping phase is just buggy right now in PackageCompiler.jl. Whether that is because it doesn't handle envs well, or because it misses test dependencies, I think the conclusion for us here is the same: that is stuff that needs to be sorted out in PackageCompiler.jl. But I don't see why any of this would work better or worse in postBuild than a solution where we build PackageCompiler.jl into repo2docker itself.

in playing around, the idea was to do the PackageCompiler prebaking at the start of the assembly script (i.e., install some deps, build a new system image, and swap.) And then, using the new system image, set up the custom repo2docker project.

I'm not sure I fully understand that. My understanding is that, after everything we have in repo2docker right now is finished, one would run compile_incremental to create a new sysimage?

permissions (with my old PR, the proximal issue was that we could assemble the new system image, but couldn't relink /usr/bin/julia)

Ah, so the problem is that the postBuild script doesn't have permissions to overwrite the original sysimage? So maybe it would be enough to change the permissions of the original sysimage in our assembly code, and then the postBuild script could work? I guess the alternative would be to modify the IJulia kernel spec to start julia with the -J option that points to the custom sysimage that is just stored in a different location?
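The -J alternative could look roughly like this (a sketch; the sysimage path is made up, and IJulia's installkernel passes the extra options through to the kernel's julia command line):

using IJulia
installkernel("Julia (custom sysimage)", "--sysimage=/srv/julia/custom_sys.so")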

, and general robustness.

Yeah, that is my general worry :) I think it would be really great if we could get a good understanding of why (or if) this would be better if this was built into repo2docker vs in a postBuild script.

arnavs commented 5 years ago

the conclusion for us here is the same: that is stuff that needs to be sorted out in PackageCompiler.jl

Sounds about right.

I'm not sure I fully understand that. My understanding is that after everything is finished that we have in repo2docker right now, one would run incremental_compile to create a new sysimage?

The idea is to flip this. So, let's say I give you TOML with 10 packages, 2 of which we want to bake in (even if PackageCompiler were perfect, we wouldn't want to just bake in everything, because there's a fixed cost of loading the image which grows with the size.) The steps in our Dockerfile are:

  1. In the v1.1 environment, add those 2 packages and call compile_package(replace = true).

  2. As before, inside Julia (which is now based on a different sysimg, which plays in all environments), instantiate the existing project file.

    Regardless of whether or not there's a Manifest, the instantiate step won't mess with the 2 packages we baked in, and will just add the remaining 8 into ~/.julia/packages.
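A sketch of steps 1-2 above (the compile_package keyword is copied from the description and not verified against a released PackageCompiler.jl API):

using Pkg
Pkg.activate()                          # default v1.1 environment
Pkg.add(["Markdown", "Random"])         # the 2 packages to bake in
using PackageCompiler
compile_package("Markdown", "Random", replace = true)   # build and swap in the new sysimage

# step 2: with the new sysimage active, set up the user's project as before
Pkg.activate(".")
Pkg.instantiate()                       # adds the remaining packages to ~/.julia/packages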

So maybe it would be enough to change the permissions...

Yeah, that sounds right to me. Running IJulia with -J also sounds plausible; didn't give that a try.

I think it would be really great if we could get a good understanding of why (or if) this would be better if this was built into repo2docker vs in a postBuild script.

That's a good point. Here's why I think repo2docker might make sense:

  1. Regardless of PackageCompiler's stability, it will never be able to bundle arbitrary sets of packages together. The ability to lock down ("whitelist") those combinations only makes sense inside the main repo. (And we will probably need to do other duct-taping, as well, such as e.g. providing manual snoopfiles.)

  2. Replacing the system image "early on" means that other packages will "just work." We'd have to do more rounds of testing to see what would happen if, e.g., we have a Julia install, add IJulia, and then swap out system images. Should be fine, but...

  3. By the time postBuild is executed, the user is already in a custom project. I tried it with ] activate first, and it didn't work. Maybe we could get it to, though.

davidanthoff commented 5 years ago

The idea is to flip this.

Hm, to be honest, I'm not sure that is a good idea. If you add those two packages to the v1.1 env, it will ignore the complete information for all the upstream dependencies that is in the Manifest.toml. So now you've essentially built a sysimg that might have all the wrong versions of all the dependencies that the packages you whitelisted need. It also just seems needlessly complicated (with the caveat that I fully understand it made sense for you at this point to go this route, because that is pretty much what you were able to do with the existing PackageCompiler.jl version). It seems to me that a much cleaner version would be to build the sysimage in a julia instance that has the env loaded that the user actually wants to use.
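A sketch of that cleaner ordering (the actual sysimage build call is left abstract because it depends on the PackageCompiler.jl version in use):

using Pkg
Pkg.activate(".")      # the user's project, so Manifest.toml pins every dependency version
Pkg.instantiate()
using PackageCompiler
# ... build the sysimage here, with the whitelisted packages resolved against the
# active environment rather than the default v1.1 one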

I think all the issues you raise about why it would be beneficial to have this in repo2docker rather than postBuild sound to me like limitations in PackageCompiler.jl that one could fix there?

I hope I don't come across as too negative :) But I do think in this world, with PackageCompiler.jl being heavily developed, undergoing change and not supporting all scenarios we would like it to support, it would just be much better if we could avoid taking a hard dependency in repo2docker on it, but instead provided a hook (like postBuild) that enables users to use PackageCompiler.jl without too much difficulty. Purely from a dependency stability point of view, that seems reasonable to me (in the sense that repo2docker is not in an experimental, churn phase of its life, but rather powers lots and lots of production systems, so it probably needs to be quite conservative with its dependency choices).

jlperla commented 5 years ago

Hm, to be honest, I'm not sure that is a good idea

I agree. That all sounds like a maintenance nightmare. I think it is either simple and maintainable, or something is wrong. Having a few extra lines in a preexisting config file is reasonable, but this sort of thing isn't.

But I do think in this world, with PackageCompiler.jl being heavily developed, undergoing change and not supporting all scenarios we would like it to support, it would just be much better if we could avoid taking a hard dependency in repo2docker

The key question to me, then, is how stable is PackageCompiler in practice, and is the lack of a release just a technicality that is easily fixed by doing a release? That is a discussion to have with Simon.

Note that if someone doesn't add a whitelist, then PackageCompiler wouldn't even be called, be a dependency, or be used at all (see https://github.com/jupyter/repo2docker/compare/master...arnavs:master#diff-48c5818bd6ad82366966df5c27dc1cefR144)

Also, it would make sense to have a particular PackageCompiler version fixed in that file (although a released one for sure!). So PackageCompiler stability shouldn't be a problem if there is a released version we are happy with.

Purely from a dependency stability point of view, that seems reasonable to me (in the sense that repo2docker is not in an experimental, churn phase of its life, but rather powers lots and lots of production systems, so it probably needs to be quite conservative with its dependency choices).

Of course, but are the Julia things really in production? As discussed, what is suggested is an opt-in where PackageCompiler would only be initiated if someone specifically puts a section in their project's TOML file.

But on the production side, I just want to reiterate my feeling on this: without PackageCompiler support, Julia support in repo2docker may not be useful. I think it may put both Julia and mybinder in an especially unflattering light if people try the current setup. It simply doesn't match their expectations of interactive responsiveness of a webpage (although it is reasonable for mybinder as a "development environment", where expectations for rapid launching are lower).

Here is a test: I added a test repository which just runs a standard example from DifferentialEquations and displays a DataFrame.

Go to https://mybinder.org/v2/gh/jlperla/bindertest/master?filepath=test.ipynb

My timings were (after an initial 13 minute build, which seemed reasonable):

Now compare that 2 minutes to the couple of seconds execution time when using Arnav's version of a similar setup with packagecompiler built into the image: https://mybinder.org/v2/gh/arnavs/compiled-binder-example/v0.2.1?filepath=notebooks/demo.ipynb

davidanthoff commented 5 years ago

I got the whole thing to work with postBuild. Or rather, sort of: PackageCompiler.jl errors a lot. But I think the remaining issues are all simply PackageCompiler.jl bugs/issues that need to be fixed there. From the repo2docker point of view I think we can get all of this to work with postBuild and don't need to add any custom support.

Here is the repo: https://github.com/davidanthoff/bindertest/tree/postbuild

If you run that in mybinder.org, you can see that @time using VegaDatasets now is super fast. That was the only package I managed to compile into the sysimg, I think mainly because it actually doesn't have any test dependencies (which seem to choke up PackageCompiler.jl).

I think the issues that need to be addressed in PackageCompiler.jl are:

davidanthoff commented 5 years ago

Here is another example that uses a different tactic:

https://github.com/davidanthoff/teaching-2019-aere-workshop/tree/sysimg

In that example I'm skipping the whole snooping phase of PackageCompiler.jl. On the upside, it is much more robust, no errors from PackageCompiler.jl at all, and I'm baking a lot of packages into the image. On the downside, it currently only helps with package load times. I think that could be overcome if the packages themselves included appropriate precompile statements. In that model the snooping is done by the package author at dev time. That generally seems like a better approach to me.

jlperla commented 5 years ago

In that example I'm skipping the whole snooping phase of PackageCompiler.jl. On the upside, it is much more robust, no errors from PackageCompiler.jl at all, and I'm baking a lot of packages into the image. On the downside, it currently only helps with package load times. I think that could be overcome if the packages themselves included appropriate precompile statements. In that model the snooping is done by the package author at dev time. That generally seems like a better approach to me.

I am not sure that works, though. It requires getting the package writers to modify how they work just for PackageCompiler... And the tradeoff may be that it increases the using time for other scenarios.

davidanthoff commented 5 years ago

Yeah, I think we'll just need to figure out what the effect of more precompile statements in packages is; it's not clear to me. The incentives to do that should increase, though, because I'm also including support for this kind of story in the VS Code extension, so now those precompile statements would benefit both repo2docker and VS Code users.

But having said that, this is just one option. I don't see any reason why the snooping couldn't be fixed in PackageCompiler.jl, and then one can do either one from a postBuild script.

jlperla commented 5 years ago

OK, so it sounds like the key is to figure out the issues with snooping, and what is different between the postBuild approach and the Docker-based one (where @arnavs didn't have any problems).