thieman / dagobah

Simple DAG-based job scheduler in Python
Do What The F*ck You Want To Public License

Add Authentication #5

Closed MichaelMartinez closed 11 years ago

MichaelMartinez commented 11 years ago

This has the makings of a damn fine web app. Authentication of some kind is mandatory if you want it to survive for more than one minute "out there"... winter is always coming.

thieman commented 11 years ago

@MichaelMartinez This would need to be seriously locked down, as anyone who gets access to the app as it currently stands could do literally whatever they want with your system.

Do you know of any other examples of Flask apps with such precious cargo that run on untrusted networks? I'd be interested to see if the wider community thinks Flask is ready for that. Mitsuhiko himself even says Flask isn't "ready," so I'm wondering if this is a use case it's not ready for.

MichaelMartinez commented 11 years ago

I see your concerns, and I agree they are indeed critical. Short of API changes to Flask as a whole, I am not sure what the 1.0 release will accomplish in terms of security. Do you have any insight?

I am using [flask-security](https://github.com/mattupstate/flask-security) with a little app that hasn't been pen tested or audited. I haven't deployed it to production, for that matter. Flask-security looks solid, but that doesn't mean anything until it's been widely deployed and tested.

The point of adding authentication, in my mind, is simply offloading the work to a proper server. Maybe there is another way?

thieman commented 11 years ago

@MichaelMartinez Can you run me through the intended use case a little more? With the current implementation of Dagobah, the web app actually owns the underlying Dagobah instance that runs the jobs. This means that it's not currently possible to have a master-slave arrangement where a single web app coordinates Dagobah instances on multiple machines, if that's what you're talking about.

Though, now that I've written that sentence, that sounds pretty awesome and is something that we can definitely make work.

MichaelMartinez commented 11 years ago

Here are a few use cases I envision:

  1. A user may have some long-running jobs and may need to change location or shut their computer down. Enabling this app to run on a PaaS server or VPS would allow that person to set up jobs and wait for the emails... no need to babysit their personal computer or worry about connectivity drops, etc.
  2. Team-based remote collaboration.
  3. Teaching/learning/sharing tool, i.e. "This is my workflow and how I go from Twitter / web scraping to actionable data." It might be possible to share DAGs or something.

Your last point is interesting as well.

thieman commented 11 years ago

@MichaelMartinez Thoughts:

  1. The web interface is just a display/config engine on top of the underlying Dagobah instance. There's no reliance on the web app maintaining a connection with anything for your jobs to run. Dagobah is also currently designed to run on a machine where you can freely execute commands; it sounds like you're getting at either the master-slave arrangement I spoke of earlier or something like an interface between a Dagobah frontend and a service like PiCloud that would allow you to execute jobs somewhere out there in the ether.
  2. This is theoretically possible now, though there is nothing like a user-level permissions layer. One caveat: there will be a few race conditions if multiple people try to update the same job at the same time. This could be mitigated if it's something we want to focus on in the future.
  3. Since Dagobah doesn't own any of your underlying code and is just a framework for executing subprocesses, I'm not sure how this would be presented. I'd definitely be open to the idea of having a "Cool Dagobah Workflows" file in the repo or something, though, to show off what people are using it for.
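
One common way to mitigate the concurrent-update race mentioned in point 2 is optimistic concurrency: each job carries a version number, and a save is rejected if the job changed after the editor read it. The sketch below is only an illustration of that idea. `JobStore`, `StaleUpdateError`, and the version field are hypothetical names, not part of Dagobah's actual classes, and the store is a plain in-memory dict.

```python
import threading


class StaleUpdateError(Exception):
    """Raised when an edit was based on an out-of-date copy of a job."""


class JobStore:
    """Hypothetical in-memory job store; not Dagobah's real API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}  # job name -> (version, config dict)

    def create(self, name, config):
        with self._lock:
            self._jobs[name] = (1, dict(config))

    def read(self, name):
        """Return (version, copy of config) so the caller can edit safely."""
        with self._lock:
            version, config = self._jobs[name]
            return version, dict(config)

    def update(self, name, expected_version, new_config):
        """Apply the edit only if nobody else has saved since we read."""
        with self._lock:
            current_version, _ = self._jobs[name]
            if current_version != expected_version:
                raise StaleUpdateError(
                    "%s changed: expected v%d, found v%d"
                    % (name, expected_version, current_version)
                )
            self._jobs[name] = (current_version + 1, dict(new_config))
            return current_version + 1


store = JobStore()
store.create("nightly", {"cron": "0 2 * * *"})
version, config = store.read("nightly")  # two editors both read version 1
store.update("nightly", version, {"cron": "0 3 * * *"})  # first save wins
try:
    store.update("nightly", version, {"cron": "0 4 * * *"})  # stale copy
except StaleUpdateError as exc:
    print(exc)
```

The second editor gets an error instead of silently overwriting the first editor's change, and can reload the job before retrying.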

jonathaneunice commented 11 years ago

While not getting into the larger questions of "is Flask secure enough?" or "how can the security, privacy, user identity, user authentication, and user authorization of dagobah be extended to interesting multi-node use cases?" I would second @MichaelMartinez's original point that some sort of authentication is a baseline requirement for any production use.

I can't speak to Flask-Security, though it looks like a nice "batteries included" package. At a minimum, there should be a way to define "authorized users" and a way to limit non-permitted users. As a first step, perhaps authorized user identities would be listed in ~/.dagobahd.yml, then use something like flask_googleauth to determine "is the current user a permitted user?" Here's a toy example of how access can then be restricted:

```python
import random
import string

from flask import Flask, g
from flask_googleauth import GoogleAuth


def randkey(digits=8, alphabet=string.ascii_letters):
    # Toy only: a real deployment needs a persistent secret key generated
    # from a cryptographically secure source, not the random module.
    return ''.join(random.choice(alphabet) for _ in range(digits))


app = Flask(__name__)
app.secret_key = randkey(16)
auth = GoogleAuth(app)


@app.route("/")
@auth.required
def secret():
    # Once the user is authenticated, their name and email are accessible as
    # g.user.name and g.user.email.
    return "You have rights to be here, %s (%s)" % (g.user.name, g.user.email)


app.run()
```
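
The toy example above only checks that someone has logged in. The "is the current user a permitted user?" step could be sketched as a plain allowlist lookup, shown here without Flask so it runs on its own. This is a minimal sketch under assumptions: the `authorized_users` key is a hypothetical config field (nothing in this thread defines the layout of `~/.dagobahd.yml`), and `config` stands for the dict you would get after parsing that file.

```python
def normalize(email):
    """Compare addresses case-insensitively and ignore stray whitespace."""
    return email.strip().lower()


def load_authorized_users(config):
    """Build the allowlist from an already-parsed config dict.

    'authorized_users' is a hypothetical key, assumed to hold a list of
    email addresses. A missing or empty key yields an empty allowlist.
    """
    entries = config.get("authorized_users") or []
    return frozenset(normalize(entry) for entry in entries)


def is_permitted(email, authorized):
    """Return True only if the authenticated email is on the allowlist."""
    return email is not None and normalize(email) in authorized


allowed = load_authorized_users(
    {"authorized_users": ["alice@example.com", "Bob@Example.com"]}
)
print(is_permitted("bob@example.com", allowed))
print(is_permitted("mallory@example.com", allowed))
```

Inside the protected view, one would then compare `g.user.email` against the allowlist and refuse the request (for example with `flask.abort(403)`) when the check fails. An empty allowlist denies everyone, which seems the safer default.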

thieman commented 11 years ago

I'm going to split the proposed changes into two separate issues. I'll close this issue but reference it from the child issues.

I think we should focus first on adding a single-user auth model that would allow the Dagobah client to exist on an untrusted network with some protection from unauthorized access. That's issue #11.

Once that's done, we can focus on the more complex issue of adding multi-user auth, with the additional permissions and core class metadata that that would entail. That's issue #12.
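
For the single-user model, one possible building block is a salted password hash that is verified with a constant-time comparison, using only the standard library. This is a sketch of that general technique, not the design chosen for issue #11. The function names and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest) for storage; never store the raw password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-derive the digest and compare it in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected_digest)


# Hypothetical usage: hash once at setup time, verify on each login attempt.
salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))
print(verify_password("wrong guess", salt, digest))
```

The salt and digest would live in the server's config or database, so a leaked config file does not directly reveal the password.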