rbeezer opened 9 years ago
+1 from a support request.
My first rough idea how to implement this:

- a table with columns `project_id`, `time`, `status`, and an additional field consisting of a dictionary describing what to do: `{ action: "collect_assignment", path: " ..." }` (for now, we only have this action)
- something periodically checks for rows where `time` has passed and the `status` is still "pending". The question is, where should this run? Hubs sound like the place to be, but all of them at once?
- the `status` is set to "active"; additional requests to run the command on a given compute node should be ignored (basically)
- the hub sends the `collect_assignment` action, with the additional data, to the project
- on success, set the `status` to "done"

> hubs sound like the place to be, but all of them at once?
We can use a locking process -- when a hub decides to do some work, it first has to set something in the database, wait, then check to see if it got the lock. Since rethinkdb is 100% consistent (not eventually consistent like cassandra), this works. Basically, it allows the hubs to leverage the RAFT distributed consensus stuff that rethinkdb has developed.
Alternatively, we can have another service that handles this. In practice, it'll be some node.js code, and it can be run in either the hub process or somewhere else. Maybe one place for a while, and another later.
For scheduling tasks, there is: https://github.com/grantcarthew/node-rethinkdb-job-queue/wiki/Delayed-Job
since it came up again, the first lib that I found for node+pg is https://github.com/noblesamurai/node-pg-jobs
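Whichever job-queue library is used, the core loop it would run for us is small. A minimal in-memory sketch of the pending → active → done flow, with the row shape following the rough schema above (the `send_action` callback is a placeholder, not a real API):

```python
import time

# One row per scheduled task, following the rough schema sketched above.
tasks = [
    {"project_id": "p1", "time": 0.0, "status": "pending",
     "what": {"action": "collect_assignment", "path": "assignments/hw1"}},
]

def run_due_tasks(now, send_action):
    """Claim and run every pending task whose time has passed."""
    for task in tasks:
        if task["status"] == "pending" and task["time"] <= now:
            task["status"] = "active"   # claim it, so other workers skip it
            send_action(task["project_id"], task["what"])
            task["status"] = "done"

collected = []
run_due_tasks(time.time(), lambda pid, what: collected.append((pid, what["action"])))
print(collected)  # [('p1', 'collect_assignment')]
```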
Another idea: at the collection date, save the files, create a snapshot, and then collect the snapshot. There is no need to copy the snapshot immediately, which might make this a bit easier to deal with. For those student projects that aren't running, pick the latest snapshot. However, I think we would have to come up with special snapshot names that don't get cleaned up and have a predictable name.
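To illustrate what "predictable name" could mean: assuming a hypothetical `collect-<ISO timestamp>` naming scheme (not an existing convention), finding the latest collection snapshot is just a prefix filter plus a lexical max, since ISO timestamps sort chronologically:

```python
from datetime import datetime

def collection_snapshot_name(when):
    """Hypothetical predictable name for a collection snapshot."""
    return "collect-" + when.strftime("%Y-%m-%dT%H:%M:%SZ")

def latest_collection_snapshot(names):
    """Newest collection snapshot; ISO timestamps sort lexically."""
    collected = [n for n in names if n.startswith("collect-")]
    return max(collected) if collected else None

snapshots = ["2016-01-01-daily",
             "collect-2016-03-01T00:00:00Z",
             "collect-2016-02-01T00:00:00Z"]
print(latest_collection_snapshot(snapshots))  # collect-2016-03-01T00:00:00Z
```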
> Another idea: at the collection date, save the files, create a snapshot, and then collect the snapshot.
It's an interesting idea, since it genuinely doesn't require scheduled tasks for collection to get implemented, which is nice since the security and other implications of that are worrisome.
Also, building something like this on the new zpool pod thing I recently wrote might be pretty easy. Type `c zpool -h` to see what commands exist already. It would be very easy to add "snapshot" as another one. Adding special snapshots to flex/zfs.py (named in such a way that they don't get deleted for at least n months (?)) would be harder, but still very natural.
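One way the "don't delete for at least n months" rule could look, assuming a hypothetical scheme where protected snapshots embed their creation date in the name (this is not how flex/zfs.py currently works):

```python
from datetime import datetime, timedelta

KEEP_DAYS = 6 * 30  # roughly n = 6 months; the value is a placeholder

def is_deletable(name, now):
    """Ordinary snapshots are deletable; hypothetical 'collect-YYYY-MM-DD'
    snapshots are kept until KEEP_DAYS after the date in their name."""
    if not name.startswith("collect-"):
        return True
    created = datetime.strptime(name[len("collect-"):][:10], "%Y-%m-%d")
    return now - created > timedelta(days=KEEP_DAYS)

now = datetime(2016, 6, 1)
print(is_deletable("2016-05-30-daily", now))     # True  (normal cleanup)
print(is_deletable("collect-2016-05-01", now))   # False (still protected)
print(is_deletable("collect-2015-01-01", now))   # True  (older than n months)
```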
Another nice thing is that when we have the `.student` file (the analogue of `.course`, but for students), it can point to the snapshot where collection happened, so students can see it.
I'm afraid I don't understand the technical details, but I think you're saying that there might be some way of triggering a snapshot at a certain time, which would work fine for my needs - recording state as of a deadline.
Status: there is an API extension to schedule copy operations, which means we have the technical groundwork to do this.
What's missing is the interface.
NOTE: when we do implement this, we will have to be clear about the time zone when the instructor selects the time at which collection happens.
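To make the time-zone concern concrete: the scheduler should store and compare deadlines in UTC, because the same wall-clock time chosen by two instructors in different zones means different instants. A small stdlib example (the fixed -7h offset stands in for the instructor's zone, e.g. US Pacific daylight time):

```python
from datetime import datetime, timezone, timedelta

# Fixed-offset stand-in for the instructor's local zone (PDT = UTC-7).
PACIFIC = timezone(timedelta(hours=-7))

# The instructor picks "May 1, 23:59" in their own time zone ...
deadline_local = datetime(2024, 5, 1, 23, 59, tzinfo=PACIFIC)

# ... but collection must be scheduled on the UTC instant.
deadline_utc = deadline_local.astimezone(timezone.utc)
print(deadline_utc.isoformat())  # 2024-05-02T06:59:00+00:00
```

In a real implementation the offset would come from the instructor's IANA zone (handling DST), but the store-in-UTC rule is the part that matters here.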
Requested by Kristin McCully.
REQUESTED AGAIN by: xinyue liu (from UCLA)
Difficult and still not implemented today.
Possibly a duplicate. It would be nice to set a due date, at say midnight, and have SMC collect all the assignments. Perhaps a checkbox near the due date "Automatically collect assignments when due".
I'd guess some sort of cron-job type functionality would be needed, but since a project is not always running, it would need to be implemented/available system-wide.