sagemathinc / cocalc

CoCalc: Collaborative Calculation in the Cloud
https://CoCalc.com

Automatically collect assignments at due date #109

Open rbeezer opened 9 years ago

rbeezer commented 9 years ago

Possibly a duplicate. It would be nice to set a due date, at, say, midnight, and have SMC collect all the assignments. Perhaps a checkbox near the due date: "Automatically collect assignments when due".

I'd guess some sort of cron-job-type functionality would be needed, but since a project is not always running, it would need to be implemented/available system-wide.

haraldschilly commented 8 years ago

+1 from a support request.

My first rough idea of how to implement this:

  1. Think about "scheduled events" in a broader sense; this is the first special application of them.
  2. Make a table in the DB, basically consisting of project_id, time, status, and an additional field holding a dictionary that describes what to do, e.g. { action: "collect_assignment", path: " ..." } (for now, this is the only action).
  3. Next, periodically poll this table and filter for entries whose time has passed and whose status is still "pending". The question is where this should run. Hubs sound like the place to be, but all of them at once?
  4. The hubs then dispatch to the assigned compute server, or open the project on a new one. The project is then on one of the compute nodes and ready to execute.
  5. When the compute server receives such a command, the status entry should be set to "active". Additional requests to run the command on a given compute node should (basically) be ignored.
  6. Dispatch the collect_assignment action, with the additional data, to the project.
  7. If it reports back that everything is fine, set the status to "done".

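A minimal TypeScript sketch of steps 2, 3, 5 and 7, under stated assumptions: the table is modeled as an in-memory array, the `id` field and the `"failed"` status are additions not in the proposal, and `dispatch` is a hypothetical stand-in for sending the action to the project's compute server. It is not CoCalc's actual schema or API.

```typescript
// Hypothetical row shape for a "scheduled events" table: project_id, time,
// status, and an action descriptor, as proposed above. `id` and "failed"
// are assumptions added for the sketch.
type EventStatus = "pending" | "active" | "done" | "failed";

interface CollectAssignmentAction {
  action: "collect_assignment"; // the only action considered for now
  path: string;                 // assignment path inside the course project
}

interface ScheduledEvent {
  id: number;
  project_id: string;
  time: Date;                   // when the event becomes due
  status: EventStatus;
  payload: CollectAssignmentAction;
}

// Step 3: events whose time has passed and that are still pending.
function dueEvents(events: ScheduledEvent[], now: Date): ScheduledEvent[] {
  return events.filter(
    (e) => e.status === "pending" && e.time.getTime() <= now.getTime(),
  );
}

// One polling pass. `dispatch` is a hypothetical callback that delivers the
// action to the project and resolves to true if it reports success.
async function pollOnce(
  events: ScheduledEvent[],
  now: Date,
  dispatch: (e: ScheduledEvent) => Promise<boolean>,
): Promise<void> {
  for (const e of dueEvents(events, now)) {
    e.status = "active"; // step 5: mark as in progress
    try {
      e.status = (await dispatch(e)) ? "done" : "failed"; // step 7
    } catch {
      e.status = "failed"; // dispatch threw (e.g. project unreachable)
    }
  }
}
```

Keeping the action as a small tagged object leaves room for the broader "scheduled events" idea in step 1: new actions become new members of the payload type without changing the polling loop.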
williamstein commented 8 years ago

hubs sound like the place to be, but all of them at once?

We can use a locking process: when a hub decides to do some work, it first has to set something in the database, wait, then check whether it got the lock. Since RethinkDB is strongly consistent (not eventually consistent like Cassandra), this works. Basically, it lets the hubs leverage the Raft distributed consensus work that RethinkDB has done.
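The lock idea can be sketched as an atomic compare-and-set on the event's status, a slightly different technique from the literal "set, wait, check" sequence: a hub owns an event only if it is the one that flips it from `pending` to `active`. `LockTable` is a hypothetical in-memory stand-in for a consistent database table; the sketch only illustrates why exactly one hub wins.

```typescript
// Hypothetical lock row; `owner` records which hub claimed the event.
interface LockRow {
  status: "pending" | "active" | "done";
  owner: string | null;
}

// In-memory stand-in for a consistent table. The property the sketch relies
// on is that `claim` checks and writes in one indivisible step.
class LockTable {
  private rows = new Map<number, LockRow>();

  add(eventId: number): void {
    this.rows.set(eventId, { status: "pending", owner: null });
  }

  // Compare-and-set: succeeds only while the event is still pending,
  // so at most one caller per event ever gets `true`.
  claim(eventId: number, hubId: string): boolean {
    const row = this.rows.get(eventId);
    if (row === undefined || row.status !== "pending") return false;
    row.status = "active";
    row.owner = hubId;
    return true;
  }

  ownerOf(eventId: number): string | null {
    return this.rows.get(eventId)?.owner ?? null;
  }
}
```

With this pattern every hub can run the same poller, and only the winner of `claim` dispatches the work. In PostgreSQL, for example, the same claim could be one conditional `UPDATE ... SET status = 'active', owner = $1 WHERE id = $2 AND status = 'pending'`, with the hub checking that exactly one row was updated (table and column names here are hypothetical).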

Alternatively, we can have another service that handles this. In practice, it'll be some node.js code, and it can be run in either the hub process or somewhere else. Maybe one place for a while, and another later.

haraldschilly commented 8 years ago

For scheduling tasks, there is: https://github.com/grantcarthew/node-rethinkdb-job-queue/wiki/Delayed-Job

haraldschilly commented 7 years ago

Since it came up again: the first library I found for Node + PostgreSQL is https://github.com/noblesamurai/node-pg-jobs

haraldschilly commented 6 years ago

Another idea: at the collection date, save the files, create a snapshot, and then collect the snapshot. There is no need to copy the snapshot immediately, which might make this a bit easier to deal with. For student projects that aren't running, pick the latest snapshot. However, I think we would have to come up with special snapshot names that don't get cleaned up and are predictable.
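A minimal sketch of the two pieces this idea needs, assuming a hypothetical naming scheme `collect-<assignment id>-<UTC due time>` and reading "pick the latest snapshot" as the latest snapshot taken at or before the due date. The function names and the `Snapshot` shape are illustrative only, not part of CoCalc.

```typescript
// Hypothetical predictable name: a fixed prefix marks collection snapshots,
// and the UTC due time makes the name computable in advance. Colons are
// replaced only to keep the name conservative; that is a stylistic choice.
function collectionSnapshotName(assignmentId: string, due: Date): string {
  const stamp = due.toISOString().replace(/:/g, "-");
  return `collect-${assignmentId}-${stamp}`;
}

interface Snapshot {
  name: string;
  created: Date;
}

// For projects that were not running at the deadline, fall back to the most
// recent snapshot taken at or before the due date (undefined if none).
function latestSnapshotBefore(snaps: Snapshot[], due: Date): Snapshot | undefined {
  let best: Snapshot | undefined;
  for (const s of snaps) {
    const t = s.created.getTime();
    if (t <= due.getTime() && (best === undefined || t > best.created.getTime())) {
      best = s;
    }
  }
  return best;
}
```

For example, `collectionSnapshotName("hw1", new Date("2024-05-01T23:59:00Z"))` yields `collect-hw1-2024-05-01T23-59-00.000Z`, so a later collect step can locate the snapshot without any stored lookup.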

williamstein commented 6 years ago

Another idea: at the collection date, save the files, create a snapshot, and then collect the snapshot.

It's an interesting idea, since it doesn't require implementing scheduled tasks for collection, which is nice because the security and other implications of those are worrisome.

williamstein commented 6 years ago

Also, building something like this on the new zpool pod thing I recently wrote might be pretty easy. Type c zpool -h to see what commands already exist. It would be very easy to add "snapshot" as another one. Adding special snapshots to flex/zfs.py (named so that they don't get deleted for at least n months (?)) would be harder, but still very natural.
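A sketch of the retention rule, written in TypeScript even though flex/zfs.py is Python, and assuming "at least n months" is measured from each snapshot's creation time. The `collect-` prefix and both function names are hypothetical; ordinary snapshots are left to whatever cleanup policy already exists.

```typescript
// Hypothetical rule: snapshots named with the collection prefix are protected
// until they are at least `months` months old. setUTCMonth rolls over on
// short months (Jan 31 + 1 month lands in early March), which is fine for
// a "keep at least this long" bound.
function isProtected(name: string, created: Date, now: Date, months: number): boolean {
  if (!name.startsWith("collect-")) return false;
  const expires = new Date(created.getTime());
  expires.setUTCMonth(expires.getUTCMonth() + months);
  return now.getTime() < expires.getTime();
}

// Given cleanup candidates, return the names that may be deleted, i.e. every
// candidate except protected collection snapshots.
function deletable(
  candidates: { name: string; created: Date }[],
  now: Date,
  months: number,
): string[] {
  return candidates
    .filter((s) => !isProtected(s.name, s.created, now, months))
    .map((s) => s.name);
}
```

Applying the filter at the cleanup step, rather than at snapshot creation, keeps the existing cleanup logic unchanged apart from one extra check.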

williamstein commented 6 years ago

Another nice thing is that when we have the .student file (analogue to .course for students) it can point to the snapshot where collection happened, so students can see.

matthew-brett commented 5 years ago

I'm afraid I don't understand the technical details, but I think you're saying that there might be some way of triggering a snapshot at a certain time, which would work fine for my needs - recording state as of a deadline.

haraldschilly commented 5 years ago

Status: there is an API extension to schedule copy operations, which means we have the technical groundwork for this.

What's missing is the interface.

williamstein commented 4 years ago

NOTE: when we do implement this, we will have to be clear about the time zone when the instructor selects the time at which collection happens.
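One way to be explicit about time zones is to take the instructor's wall-clock due time together with an explicit IANA zone name and convert it to a UTC instant before storing it. The sketch below uses only the built-in `Intl.DateTimeFormat` API; the function names are hypothetical, and times that fall inside a DST gap or overlap get no special handling.

```typescript
// Offset of `timeZone` from UTC, in minutes, at the instant `date`
// (negative west of UTC, e.g. -300 for America/New_York in winter).
function tzOffsetMinutes(date: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour12: false,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    second: "2-digit",
  }).formatToParts(date);
  const get = (type: string): number =>
    Number(parts.find((p) => p.type === type)!.value);
  // Some engines print midnight as "24" when hour12 is false; % 24 normalizes it.
  const asUTC = Date.UTC(get("year"), get("month") - 1, get("day"),
    get("hour") % 24, get("minute"), get("second"));
  // Round because `date` may carry milliseconds that the formatted parts drop.
  return Math.round((asUTC - date.getTime()) / 60000);
}

// Convert a wall-clock due time in `timeZone` to a UTC instant. The second
// pass re-reads the offset at the first guess, which matters near DST changes.
function wallClockToUTC(year: number, month: number, day: number,
  hour: number, minute: number, timeZone: string): Date {
  const naive = Date.UTC(year, month - 1, day, hour, minute);
  let guess = naive - tzOffsetMinutes(new Date(naive), timeZone) * 60000;
  guess = naive - tzOffsetMinutes(new Date(guess), timeZone) * 60000;
  return new Date(guess);
}
```

For example, `wallClockToUTC(2024, 5, 1, 23, 59, "Asia/Tokyo")` is `2024-05-01T14:59:00.000Z`. Storing only the UTC instant while always showing the zone next to the picker avoids ambiguity about whether "midnight" means the instructor's, the student's, or the server's.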

DrXyzzy commented 4 years ago

Requested by Kristin McCully.

williamstein commented 4 years ago

REQUESTED AGAIN by: xinyue liu (from UCLA)

williamstein commented 1 year ago

Difficult and still not implemented today.

williamstein commented 1 month ago

See https://github.com/sagemathinc/cocalc/issues/3286#issuecomment-2364759535