jacobtomlinson opened this issue 6 years ago
What I want... A way to use the tools I know and love but 'leverage' the power of cloud when it's there.
So:
I want to manage my own environment but to also deploy workers on the cloud with the same environment.
I want to be able to work offline (but don't expect to be able to spin up large clusters obviously).
I want to be able to work with my own tools (like VS Code) to develop and debug apps that I can then easily deploy without having to think too much.
I want my files to be synced across local and cloud, but in a way that 'just works'!
Some Ideas:
I think that if I want to easily use my own tools (e.g. VS Code) that precludes me from using docker?
To sync files FUSE could be an option. Alternatively simply using git or similar might be an option. This wouldn't be automatic (unless we watch for file changes) but would give you the option of having ignored files, well-defined sync behaviour, different branches, etc. I'd like to cut out the middleman and not use GitHub or similar. I wonder if there could just be two repos, with the local one having an origin of the remote... hmm, more thinking needed. In fact some funky FUSE thing could even manage this for you!? Everything is a nail!!!
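Something like this could be a starting point for the 'watch and push' variant. It's only a sketch: the repo path, remote name and the use of the watchdog library are all placeholders, nothing here exists in Pangeo today.

```python
# Hypothetical auto-sync sketch: watch a local working copy and push changes
# to a "cloud" remote that the remote side pulls from. Paths and remote names
# are placeholders, not part of any existing Pangeo tooling.
import subprocess
import time
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

LOCAL_REPO = Path.home() / "pangeo-workspace"   # local clone
REMOTE_NAME = "cloud"                           # e.g. a bare repo reachable over SSH


def git(*args):
    """Run a git command inside the local working copy."""
    subprocess.run(["git", "-C", str(LOCAL_REPO), *args], check=True)


class SyncHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        # Commit and push everything that changed; .gitignore gives us the
        # "ignored files" behaviour for free.
        git("add", "--all")
        # Don't fail when nothing actually changed (commit returns non-zero).
        subprocess.run(["git", "-C", str(LOCAL_REPO), "commit", "-m", "auto-sync"])
        git("push", REMOTE_NAME, "HEAD")


if __name__ == "__main__":
    observer = Observer()
    observer.schedule(SyncHandler(), str(LOCAL_REPO), recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```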
I'm not sure if a VPN would do the trick or how that would work, will chat to @jacobtomlinson. An alternative is to have a remote scheduler (which you could expose); this would have some benefits and be easier to expose, but the downside is how to ensure it dies if the notebook (or process that created it) dies. You could also have multiple notebooks or processes using the same scheduler. Maybe you can even only have one scheduler and it's created by dockerhub and tied to your username. All your notebooks, pet projects, deployed apps etc. Umm... On the other hand this would scupper different clusters with different environments. I do like the idea of being able to create and share clusters though...
I don't think using docker precludes you from using your own tools, but it does complicate it. In theory you could connect the debugger and shell in VS Code into your container.
My preference would be code locally and data remotely (via FUSE). The story we keep discussing is "How would I use Pangeo on a train heading for a tunnel?". I like the idea that I could work on a notebook which would execute dask jobs in the cloud to touch the data. That way when I enter the tunnel I can continue coding and just have to wait until the tunnel is over to compute the dask graph.
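For example, something along these lines (the scheduler address is a placeholder for wherever the cloud scheduler gets exposed): the dask graph is built entirely on the laptop, and only the final connect/compute step needs the network.

```python
# Sketch of the "code on the train, compute in the cloud" workflow.
import dask.array as da
from dask.distributed import Client

# Building the dask graph is purely local and works offline; nothing runs yet.
x = da.random.random((100_000, 100_000), chunks=(10_000, 10_000))
result = (x - x.mean(axis=0)).std()

# Only these two lines need connectivity, so they can wait until the tunnel
# is over: connect to the remote scheduler and execute the graph there.
client = Client("tcp://scheduler.example.com:8786")  # placeholder address
print(result.compute())
```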
VPNs are interesting but pushing the dask scheduler into the cluster may cover most of the requirements for this one.
You can run all of Pangeo locally using minikube; however, you may wish to do part locally and part in the cloud. For example you may want to run your notebook locally but maintain the ability to create dask-kubernetes clusters in the cloud.
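For instance, a sketch along these lines, assuming the classic dask-kubernetes API and that your kubeconfig already points at the cloud cluster (worker-spec.yml is a made-up filename for a worker pod spec):

```python
# Local notebook, cloud workers: create a dask-kubernetes cluster in the
# remote Kubernetes cluster that the current kubeconfig points at.
# worker-spec.yml is a placeholder pod spec describing the worker image,
# which should match the local environment.
from dask.distributed import Client
from dask_kubernetes import KubeCluster

cluster = KubeCluster.from_yaml("worker-spec.yml")
cluster.scale(10)          # ask for ten workers in the cloud

client = Client(cluster)   # the notebook stays local, work goes to the cloud
```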
Requirements for running locally:
We could write a CLI application which checks for these dependencies, installs/configures them if they are missing and then starts the container.
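A rough sketch of what that CLI could look like (the dependency list and commands are illustrative, nothing is decided):

```python
# Sketch of a "pangeo up" style CLI: check for the tools needed to run
# locally, then hand off to them.
import shutil
import subprocess
import sys

REQUIRED_TOOLS = ["docker", "kubectl", "minikube", "helm"]  # placeholder list


def check_dependencies():
    missing = [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
    if missing:
        # A real tool could offer to install/configure these automatically.
        sys.exit(f"Missing dependencies: {', '.join(missing)}")


def main():
    check_dependencies()
    # Start the local cluster and deploy the containers on top of it.
    subprocess.run(["minikube", "start"], check=True)


if __name__ == "__main__":
    main()
```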