rbeezer opened 9 years ago
+1 from a support request.
My first rough idea how to implement this:

- a table with columns `project_id`, `time`, `status`, and an additional field consisting of a dictionary describing what to do: `{ action: "collect_assignment", path: " ..." }` (for now, we only have this action)
- something periodically checks for rows where `time` has passed and the `status` is still "pending". The question is, where should this run? Hubs sound like the place to be, but all of them at once?
- the `status` is set to "active"; additional requests to run the command on a given compute node should be ignored (basically)
- the hub sends the `collect_assignment` action, with the additional data, to the project
- on success, set the `status` to "done"

> hubs sound like the place to be, but all of them at once?
We can use a locking process -- when a hub decides to do some work, it first has to set something in the database, wait, then check to see if it got the lock. Since rethinkdb is 100% consistent (not eventually consistent like cassandra), this works. Basically, it allows the hubs to leverage the RAFT distributed consensus stuff that rethinkdb has developed.
Alternatively, we can have another service that handles this. In practice, it'll be some node.js code, and it can be run in either the hub process or somewhere else. Maybe one place for a while, and another later.
For scheduling tasks, there is: https://github.com/grantcarthew/node-rethinkdb-job-queue/wiki/Delayed-Job
since it came up again, the first lib that I found for node+pg is https://github.com/noblesamurai/node-pg-jobs
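Whichever job-queue library is used, the core loop it would run for us is small. A minimal in-memory sketch of the pending → active → done flow, with the row shape following the rough schema above (the `send_action` callback is a placeholder, not a real API):

```python
import time

# One row per scheduled task, following the rough schema sketched above.
tasks = [
    {"project_id": "p1", "time": 0.0, "status": "pending",
     "what": {"action": "collect_assignment", "path": "assignments/hw1"}},
]

def run_due_tasks(now, send_action):
    """Claim and run every pending task whose time has passed."""
    for task in tasks:
        if task["status"] == "pending" and task["time"] <= now:
            task["status"] = "active"   # claim it, so other workers skip it
            send_action(task["project_id"], task["what"])
            task["status"] = "done"

collected = []
run_due_tasks(time.time(), lambda pid, what: collected.append((pid, what["action"])))
print(collected)  # [('p1', 'collect_assignment')]
```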
Another idea: at the collection date, save the files, create a snapshot, and then collect the snapshot. There is no need to copy the snapshot immediately, which might make this a bit easier to deal with. For those student projects that aren't running, pick the latest snapshot. However, I think we would have to come up with special snapshot names that don't get cleaned up and have a predictable name.
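To illustrate what "predictable name" could mean: assuming a hypothetical `collect-<ISO timestamp>` naming scheme (not an existing convention), finding the latest collection snapshot is just a prefix filter plus a lexical max, since ISO timestamps sort chronologically:

```python
from datetime import datetime

def collection_snapshot_name(when):
    """Hypothetical predictable name for a collection snapshot."""
    return "collect-" + when.strftime("%Y-%m-%dT%H:%M:%SZ")

def latest_collection_snapshot(names):
    """Newest collection snapshot; ISO timestamps sort lexically."""
    collected = [n for n in names if n.startswith("collect-")]
    return max(collected) if collected else None

snapshots = ["2016-01-01-daily",
             "collect-2016-03-01T00:00:00Z",
             "collect-2016-02-01T00:00:00Z"]
print(latest_collection_snapshot(snapshots))  # collect-2016-03-01T00:00:00Z
```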
> Another idea: at the collection date, save the files, create a snapshot, and then collect the snapshot.
It's an interesting idea, since it genuinely doesn't require scheduled tasks for collection to get implemented, which is nice since the security and other implications of that are worrisome.
Also, building something like this on the new zpool pod thing I recently wrote might be pretty easy. Type `c zpool -h` to see what commands exist already. It would be very easy to add "snapshot" as another one. Adding special snapshots to flex/zfs.py (named in such a way that they don't get deleted for at least n months (?)) would be harder, but still very natural.
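One way the "don't delete for at least n months" rule could look, assuming a hypothetical scheme where protected snapshots embed their creation date in the name (this is not how flex/zfs.py currently works):

```python
from datetime import datetime, timedelta

KEEP_DAYS = 6 * 30  # roughly n = 6 months; the value is a placeholder

def is_deletable(name, now):
    """Ordinary snapshots are deletable; hypothetical 'collect-YYYY-MM-DD'
    snapshots are kept until KEEP_DAYS after the date in their name."""
    if not name.startswith("collect-"):
        return True
    created = datetime.strptime(name[len("collect-"):][:10], "%Y-%m-%d")
    return now - created > timedelta(days=KEEP_DAYS)

now = datetime(2016, 6, 1)
print(is_deletable("2016-05-30-daily", now))     # True  (normal cleanup)
print(is_deletable("collect-2016-05-01", now))   # False (still protected)
print(is_deletable("collect-2015-01-01", now))   # True  (older than n months)
```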
Another nice thing is that when we have the `.student` file (the analogue of `.course`, but for students), it can point to the snapshot where collection happened, so students can see it.
I'm afraid I don't understand the technical details, but I think you're saying that there might be some way of triggering a snapshot at a certain time, which would work fine for my needs - recording state as of a deadline.
Status: there is an API extension to schedule copy operations, which means we have the technical groundwork to do this.
What's missing is the interface.
NOTE: when we do implement this, we will have to be clear about the time zone when the instructor selects the time at which collection happens.
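To make the time-zone concern concrete: the scheduler should store and compare deadlines in UTC, because the same wall-clock time chosen by two instructors in different zones means different instants. A small stdlib example (the fixed -7h offset stands in for the instructor's zone, e.g. US Pacific daylight time):

```python
from datetime import datetime, timezone, timedelta

# Fixed-offset stand-in for the instructor's local zone (PDT = UTC-7).
PACIFIC = timezone(timedelta(hours=-7))

# The instructor picks "May 1, 23:59" in their own time zone ...
deadline_local = datetime(2024, 5, 1, 23, 59, tzinfo=PACIFIC)

# ... but collection must be scheduled on the UTC instant.
deadline_utc = deadline_local.astimezone(timezone.utc)
print(deadline_utc.isoformat())  # 2024-05-02T06:59:00+00:00
```

In a real implementation the offset would come from the instructor's IANA zone (handling DST), but the store-in-UTC rule is the part that matters here.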
Requested by Kristin McCully.
REQUESTED AGAIN by: xinyue liu (from UCLA)
Difficult and still not implemented today.
Possibly a duplicate. It would be nice to set a due date, at say midnight, and have SMC collect all the assignments. Perhaps a checkbox near the due date "Automatically collect assignments when due".
I'd guess some sort of cron-job type functionality would be needed, but since a project is not always running, it would need to be implemented/available system-wide.