nasonfish / derp

Development Experience Reporting Platform
0 stars 3 forks source link

Database interaction #6

Open nasonfish opened 5 years ago

nasonfish commented 5 years ago

A db_helper class has been created which includes all the tables we work with, and functions which help them interact. These functions all return objects which represent the items in the database.

Here's some examples to illustrate how the current implementation can be used:

# create tables
User.table_init()
Course.table_init()
UserCourse.table_init()

# user functions
u = User.create("nasonfish", "d_barnes", "d_barnes@test.edu")
print(u.email)
print(u.user_pk)
u.email = "daniel@test.edu"
u.save()

u = User.get(github_username="nasonfish") # implicit limit=1, get one user or None
# or
user_list = User.get(limit=None) # all users as a list

# course functions
c = Course.create("CP123")
print(c.course_code)

# user course relationship (many-to-many so we have an intermediary table)
UserCourse.enroll(u, c, "git@github.com:nasonfish/derp.git", "student")

for i in UserCourse.user_courses(u):
  print(i.user.email + "is in " + i.course.course_code)

However, as we expand, it can become messy to create functions for every join, and each database query needs to be able to retrieve a very specialized thing. we're going to outgrow these helper functions very quickly. Notice that we had to implement this join for Course to UserCourse manually, and this isn't always what we want when listing what courses a student is in (it probably is, but there are times when an automatic join is not necessary).

Here's a few options we can pursue which will help us keep our code clean as we expand and add more tables and create new features. If @dellswor is around, an opinion from him would be awesome ;).

Anyway, none of this is urgent and it's just a little food-for-thought while we develop the backend of this portal. I would be fine with implementing any of these solutions, and I know they can all be run efficiently-- it just depends on what we want to get out of this project and what kind of interface we want to be looking at when developing web endpoints.

nasonfish commented 5 years ago

@abebinder @clararichter friends from Database Systems! do you have any opinions on the above options? (do they make sense or have I not described them correctly?)

dellswor commented 5 years ago

Hello all,

I haven’t looked at the code at all… I’m basing all of my comments on the email and some of my design biases.

However you want to handle the design would be great. Of the four options, doing what you are doing and using an existing ORM are my favorite. The system is pretty small so any performance issues introduced by an ORM, unless it is really poorly used, would be negligible.

I would also examine the assumption that classes modeling or similarly structured to the database is desirable… I think any time we try to make a direct map between the object oriented abstraction and relational abstraction we enter into a world of pain and heart break… The abstractions have some fundamental differences that cause problems (OOP isn’t ACID compliant or relational and RDBMs don’t link/store things like in memory object instances).

Another thing to consider is that the webpage isn’t going to be passing objects instead you’ll need to pass some kind of UID to reconstitute object instances from, object instances which aren’t really going to be used for anything other than handles to call functions against. If you don’t keep a bunch of objects in memory between requests, each request will have a DB round trip to instantiate the object to make calls against and a second round trip to actually make the changes.

Perhaps, instead of introducing a constellation of objects that represent tables, an object that “represents” the whole database with functions for each of the things we want the database to do would be less cumbersome and keep the query logic contained. Changes to the database would result in changes to only the file defining the database object rather than needing to sift through the code base for all of the different object that run database queries. On the performance side, we wouldn’t need to select data from the database to instantiate a user/course instance before executing the query to make changes if the page contained a useful UID.

Rather than asking the user object to make itself as in the example, you could ask the database object to make the user. user = derpDB.create_user(requestor, “nasonfish”,”d_barnes”,”d_barnes@test.edu mailto:d_barnes@test.edu”) This would atomically create the record and return a user instance or an exception if the transaction could not complete. The design also lets some of the security be consolidated, since the function could take in the requesting user as an argument and check the permissions (we just hate when an unauthorized user figures out how to get to the wrong page). Using stored procedures and removing select/insert/update/delete, we could even move the security check into the database (making it much harder for programming errors in the application to be effectively exploited). If we were going to add auditing to the system (to keep track of how the database was changed) these functions would also make an central place to add these hooks.

Getting a specific user feels like a fundamentally different thing from getting a list of users… I would not overload the get function; I would make these separate function calls.

Thanks, -Dan

On Nov 18, 2018, at 6:25 PM, Daniel Barnes notifications@github.com wrote:

@abebinder https://github.com/abebinder @clararichter https://github.com/clararichter friends from Database Systems! do you have any opinions on the above options? (do they make sense or have I not described them correctly?)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nasonfish/derp/issues/6#issuecomment-439745547, or mute the thread https://github.com/notifications/unsubscribe-auth/AGIDvkLHZhsXPJtbuc3U4moQ9v8JKsj0ks5uwgifgaJpZM4YoQPl.

nasonfish commented 5 years ago

Wow, thanks so much for the reply! I really appreciate the input.

For the time being, I think the approach I'm going to go with is continuing with what we have right now. If it becomes too cumbersome and terrible, I might come back and re-evaluate, but I think the current system allows some freedoms and allows for code clarity that we might otherwise lose with an ORM.

I initially liked the use of static methods for things like user creation (User.create(...) returns user instance) but I think the fallacy I'm getting caught in is that not every object is going to look identical, and that we only really need to represent a finite set of functions (i.e. not every class will have that .create(...) function, so it's not like we're imitating an interface by doing this). Now I realize separating those "utility" functions as static methods raises some questions about what belongs where, and forces us to make some arbitrary choices for what to do when different tables interact.

However, I definitely would rather keep the permission logic away from the database interface-- I think Class.create(...) should not pass the issuer, and that logic should be handled in the endpoint itself, sort of like this:

@course.route('/<id>')
@login_required  # NOTE that this decorator ensures the user is logged in
def view(id):
    user = get_session_user()
    course = Course.get(id)
    if not course:  # this course does not exist
        return abort(404)
    user_course = course.get_enrollment(user)
    if not user_course:  # user is not enrolled in course
        return abort(403)
    return render_template("view.html", user_course=user_course)

I'll do some work on the db_helper and check back in soon. Thanks so much for weighing in, @dellswor !

dellswor commented 5 years ago

Keeping the permission logic away from the DB is a fine decision for the scope of this system but a different architectural impulse than I have.

From the sample view logic:

An alternative query could hit the DB once by using a query that checks the user enrollment as part of the course request. As long as the indexes aren’t screwed up, this query should be fast.

Is there functionality that might be accessible on multiple pages? If so, placing the access control at the outermost perimeter places some extra burden on the implementors (every page we can access/change particular data on must perform the same checks in the same right way). If we change the access rules related to some action, we will need to find all the perimeter locations and make changes to ensure the security policy is maintained. If there is only one page for every thing we might do, then there is less risk we will mess up one of the checks on one of the pages. Reusing pages for users with different permissions might also have some complexity to ensure the right constraints are applied in all cases.

As an attacker, I would be looking for execution paths and timing attacks that would allow checks to be skipped. The more the developer has to code and more places that code needs to exist, the greater the chance that there will be a place where the logic is wrong/inconsistent.

On Nov 18, 2018, at 8:44 PM, Daniel Barnes notifications@github.com wrote:

Wow, thanks so much for the reply! I really appreciate the input.

For the time being, I think the approach I'm going to go with is continuing with what we have right now. If it becomes too cumbersome and terrible, I might come back and re-evaluate, but I think the current system allows some freedoms and allows for code clarity that we might otherwise lose with an ORM.

I initially liked the use of static methods for things like user creation (User.create(...) returns user instance) but I think the fallacy I'm getting caught in is that not every object is going to look identical, and that we only really need to represent a finite set of functions (i.e. not every class will have that .create(...) function, so it's not like we're imitating an interface by doing this). Now I realize separating those "utility" functions as static methods raises some questions about what belongs where, and forces us to make some arbitrary choices for what to do when different tables interact.

However, I definitely would rather keep the permission logic away from the database interface-- I think Class.create(...) should not pass the issuer, and that logic should be handled in the endpoint itself, sort of like this:

@course.route('/') @login_required # NOTE that this decorator ensures the user is logged in def view(id): user = get_session_user() course = Course.get(id) if not course: # this course does not exist return abort(404) user_course = course.get_enrollment(user) if not user_course: # user is not enrolled in course return abort(403) return render_template("view.html", user_course=user_course) I'll do some work on the db_helper and check back in soon. Thanks so much for weighing in, @delswor!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nasonfish/derp/issues/6#issuecomment-439763078, or mute the thread https://github.com/notifications/unsubscribe-auth/AGIDvo6scEX5QQ9sP4ojJhXub68-8rhLks5uwikRgaJpZM4YoQPl.

nasonfish commented 5 years ago

Here's a snippet of code that I might try to shoot to design around. I think it incorporates the database access-control logic a little better than my previous example (with less DB queries for the same action), uses a better convention for where the database functions are held, and still makes sense in keeping the web logic in the endpoint.

@course.route('/<id>')
@login_required  # unprivileged access -- only prerequisite is that the resource they want actually exists
def view(id):
    user = get_session_user()
    user_course = DerpDB.user_course(user, id)
    if not user_course:
        abort(404)
    return render_template("course/view.html", user_course=user_course)

# and perhaps a POST request
@course.route('/new', methods=['GET', 'POST'])
@permission_required('class:create')  # something that must be granted to the user on purpose. abort(403) if unauthorized.
def new():
    user = get_session_user()
    if request.method == 'POST':
        course = DerpDB.new_course(...)  # or fail/flash if the request was bad
        return redirect(url_for('.view', id=course.course_pk))
    return render_template('course/create.html')

This introduces a new system for permissions, where some users might be allowed to create courses (professors), some might be able to manage/view all classes (paraprofs), some might be able to view single classes (graders), and unprivileged users can only access the pages without these restrictions.

I hope this fits a little more in-line with your instinct-- I like the way the access control is really just failing to access a resource in a database, and that we still deal with the HTTP response codes/whatever web-talk we need to do in the endpoint.

I'm sure this will need to be expanded once I encounter cases less trivial than creating a class and viewing a class you're enrolled in, but here's at least a snippet of code so you know what I'm aiming for/what I consider "elegant" (since perhaps my definition of that word needs to be revised!)

[Also, by the way, I really appreciate your comments this weekend, but I just want to let you know that you are, by no means, obligated to help me out! I don't want you to feel like I've forced you into advising another project, but any help you offer is extremely appreciated!]

dellswor commented 5 years ago

I’m interested in how the capability matrix is expressed for the annotation but, on a quick first pass, I think this design looks nice. Using the annotations to add/handle the check in the code seems like a nice way to reduce the potential for errors. Separating the checks for presentation from the checks for backend data manipulation also seems good (I assume that is going on too based on how the new_course call is commented).

Software design and architecture are a lot of fun to think/argue about. I’m happy for the chance to talk about the design with you and appreciate your interest in one of my side projects. Thank you for soliciting for feedback from me.

On Nov 19, 2018, at 12:22 PM, Daniel Barnes notifications@github.com wrote:

Here's a snippet of code that I might try to shoot to design around. I think it incorporates the database access-control logic a little better than my previous example (with less DB queries for the same action), uses a better convention for where the database functions are held, and still makes sense in keeping the web logic in the endpoint.

@course.route('/') @login_required # unprivileged access -- only prerequisite is that the resource they want actually exists def view(id): user = get_session_user() user_course = DerpDB.user_course(user, id) if not user_course: abort(404) return render_template("course/view.html", user_course=user_course)

and perhaps a POST request

@course.route('/new', methods=['GET', 'POST']) @permission_required('class:create') # something that must be granted to the user on purpose. abort(403) if unauthorized. def new(): user = get_session_user() if request.method == 'POST': course = DerpDB.new_course(...) # or fail/flash if the request was bad return redirect(url_for('.view', id=course.course_pk)) return render_template('course/create.html') This introduces a new system for permissions, where some users might be allowed to create courses (professors), some might be able to manage/view all classes (paraprofs), some might be able to view single classes (graders), and unprivileged users can only access the pages without these restrictions.

I hope this fits a little more in-line with your instinct-- I like the way the access control is really just failing to access a resource in a database, and that we still deal with the HTTP response codes/whatever web-talk we need to do in the endpoint.

I'm sure this will need to be expanded once I encounter cases less trivial than creating a class and viewing a class you're enrolled in, but here's at least a snippet of code so you know what I'm aiming for/what I consider "elegant" (since perhaps my definition of that word needs to be revised!)

[Also, by the way, I really appreciate your comments this weekend, but I just want to let you know that you are, by no means, obligated to help me out! I don't want you to feel like I've forced you into advising another project, but any help you offer is extremely appreciated!]

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nasonfish/derp/issues/6#issuecomment-440011319, or mute the thread https://github.com/notifications/unsubscribe-auth/AGIDvvmgvD_r_JG2Ao4XnmOcvH_chP3Mks5uwwUPgaJpZM4YoQPl.

nasonfish commented 5 years ago

Hi there,

Here's an update with a few code blocks, to demonstrate how the code is shaping up. I just wanted to post a few examples to show the culmination of the feedback you provided, in case you're interested in taking a look :). No action needed here-- just an update!

derp/course/views.py


# GET /course/
@course.route('/')
@login_required
def index():
    return render_template("course/list.html")  # user is implicitly passed to the template!

derp/course/templates/course/list.html


{% extends "base/scaffolding.html" %}
{% block title %}List Courses{% endblock %}
{% block content %}
<ul>
    {% for i in user.courses() %}. <!-- we can use the user here because it's passed by a preprocessor :) -->
        <li>Course : {{ i.course.course_pk }} {{ i.course.code }} {{ i.repo }} {{ i.role }}</li>
    {% endfor %}
</ul>
{% endblock %}

derp/course/views.py


@course.route('/create', methods=['GET', 'POST'])
@permission_required('course:create')
def create():
    if request.method == 'POST':
        block = request.form['block']
        if len(block) > 1:
            flash("Block must be a one-character identifier, such as 'A' or '1'", 'danger')
            return render_template('course/create.html')
        year = request.form['year']
        try:
            int(year)
        except ValueError:
            flash("Year must be a number.", 'danger')
            return render_template('course/create.html')
        code = request.form['code']
        course = Course.create(code, block, year)  # EDIT: a TODO is to rename this to DerpDB.course_create()
        UserCourse.enroll(get_session_user(), course, None, 'professor')  # and this to DerpDB.user_enroll_course() or something
        flash('Course successfully created', 'success')
        return redirect(url_for('.view', id=course.course_pk))
    return render_template('course/create.html')

derp/account.py

# wrappers which make life really easy
def login_required(f):
    @wraps(f)
    def decorated_function(*args, **kwargs):
        if not get_session_user():  # TODO -- consider passing the user into f()
            return redirect(url_for("login"))
        return f(*args, **kwargs)
    return decorated_function

def permission_required(permission_name):
    def wrapper(f):
        @wraps(f)
        def decorated_function(*args, **kwargs):
            if not DerpDB.get_user_permission(permission_name):  # get user from session and check against permission
                return abort(403)
            return f(*args, **kwargs)
        return decorated_function
    return wrapper
nasonfish commented 5 years ago

Okay, so I got ~lazy~ smart and decided to use an ORM, SQLAlchemy. Spending the time to write every single function to manipulate the data in one tiny way is just not worth the small performance improvement, and every step I take toward merging the one-purpose static methods in DerpDB into many-purpose functions (like my query) is basically just reinventing the wheel and taking one step closer to being an ORM.

If I'm going to spend the time to try to make adding new queries not painfully slow, I may as well go all the way and use an ORM. It's more likely that using existing tools like these will be useful in the future for the other folks on the team anyway, so I'm happy with my decision for the time being.

I know both solutions aren't perfect but SQL queries run quickly and it's unlikely that we'll pull TOO much more data than absolutely necessary.

See the above commit for the changes (9d221de)-- but I think it's a good sign that this large change cut down the code by about 300 lines, though that might say more about me than the ORM.