hltcoe / turkle

Django-based clone of Amazon's Mechanical Turk service running in your local environment.
https://turkle.readthedocs.io

Add resumable projects #58

Open hltcoe-bot opened 4 years ago

hltcoe-bot commented 4 years ago

Support users starting a task, saving intermediate results, and then coming back to it later.

  1. The admin UI needs a new checkbox for resumable
  2. This option conflicts with anonymous annotators so need client side/server side checks on that
  3. Need to think about how task expiration fits with this. Maybe no expiration for resumable tasks or expiration does not affect tasks that have intermediate results
  4. Exporting a batch should not include intermediate results
  5. Requestors changing the template could break loading intermediate results. No way around this. Just need to document this and maybe provide a warning when updating a project that has any intermediate results.
  6. Loading intermediate results is not trivial. Template builders may need to build special code to populate the form with the intermediate results. Need to consider that we have html only templates that only use input controls and richer templates that have plenty of javascript or store results as data attributes on the DOM or in memory.

Also, this would be the first feature that makes our templates incompatible with MTurk. I believe the goal would be to continue to support MTurk templates but offer a superset of features. So we won't require anything in a template that would make us incompatible, but we will add optional features to better support our use cases.

Poster: Cash Costello id: 171

hltcoe-bot commented 4 years ago

vandurme ccostello charman

This seems like the most relevant existing ticket, so moving discussion here. Cash's list is a superset of what I mentioned on github. For completeness, here is the list of issues from his recent email:

  1. Anonymous HITs cannot support this so there will need to be logic for that.
  2. HIT results now need a field on whether it is complete or not. This field needs to be used in the admin backend whenever we show progress on a batch or for downloading results.
  3. Between assigned but not started HITs and now incomplete HITs, we will need to change how we notify the user of these. We're currently using notifications on the top of the main page - 1 per HIT. I think we'll need to change that to maybe a central page that lists all the user's unstarted and incomplete HITs with a reminder message.
  4. What does it mean for a user to return a HIT that has been partially completed? I guess we delete the data.
  5. How do we handle expiring an assignment? Maybe we don't expire incomplete HITs or have a different expiration timer on them?
  6. This will likely require an additional button beyond just save. Decisions that we make here could result in complete incompatibility between MTurk and Turkle templates and I don't think we want that.
  7. And finally the hardest problem is that most of our templates rely heavily on JavaScript for rendering and submission. I recommend starting with non-JavaScript templates first and get everything working there. We can add an option at the Project level to turn on this draft capability. To support Javascript templates, we'll need to add additional APIs for these templates to query for incomplete results.

My goal, indeed, is to make this first work for non-javascript-reliant templates, without disrupting any existing functionality and entirely opt-in on the project+worker level.

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

lippincott I have thought more about this and I think it is best to do the form filling from saved data using JavaScript. The advantage over doing this with Python is that there would be a DOM to work with. We will have to get all the inputs and match their names to the names of the variables. It's not as simple as setting values because of inputs like checkboxes so some input types will need specific logic.

Poster: Cash Costello

hltcoe-bot commented 4 years ago

vandurme ccostello charman

I have a branch that implements persistent tasks without disrupting existing functionality, but it's fairly limited/dangerous (only handles text and checkbox inputs at the moment, and is completely vulnerable to injection attacks) and made some very inelegant choices to display the functionality quickly (e.g. just adding another list, "persistent tasks", to the landing page). On the other hand, it's almost 100% additional code, rather than modifications, so I'll try to commit it carefully and piecemeal to the branch so parts can be taken/left as needed. It works for my purposes, but I'm interested in rounding it out (handling all input types, securing the form-restoration, etc), so maybe we should have a call later in the summer after folks take a look. I'll also add a working example.

Cash and Craig, I'm having a bit of an issue with server response-time: there's some really suboptimal code in what I wrote, but it's pretty snappy on my laptop, while on a pretty beefy AWS instance many clicks will hang for tens of seconds. This happens with both MySQL and Sqlite3 backends, using IP and FQDN (so not a DNS thing), and CPU/memory/disk use are all very low. And the AWS instance has ~100% of its bandwidth/disk "credits" sitting unused, which is the only AWS-specific recommendation I found to look at. Have you guys ever experienced this sort of issue on EC2? Just a Hail Mary...

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

lippincott I'm quite busy today, but will try to take a look at this. When you have a branch that I can look at, point me to it. If you have an EC2 instance that I can check out, I can do some profiling to figure out where the issue lies.

Poster: Cash Costello

hltcoe-bot commented 4 years ago

ccostello thanks! do you have an ssh public key I could put on the server so you can log in?

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

sent an email with key

Poster: Cash Costello

hltcoe-bot commented 4 years ago

ccostello Thanks: you should be able to ssh into ec2-user@3.87.135.105 now.

The "turkle" directory has a checkout of branch "issue-171-persistent-HITs", and a venv to source. It also has two non-repo files, "turkle/management/commands/load_chaucer.py" and "rebuild.sh". Invoking the latter will tear down the existing setup, rebuild and redeploy the images, and load the tasks, and is self-explanatory. I went the Docker route because I thought the slowdown might be sqlite-related, but I guess not.

You can log into xxxxx using user/pw xxxxx/xxxxx. The slowdowns seem nondeterministic, but if you click down into the first task and start moving around with e.g. shift-left/right, you should see that it sometimes just hangs for many seconds before proceeding.

I assumed there would be a performance issue with how I'm calculating "partial-completeness" of projects/batches (maybe I can do the non-invasive thing and create new "Task/Batch/ProjectCompletionState" classes), but I don't think that's what's to blame here, for a variety of reasons (in particular the nondeterminism and the fact this is a toy data set).

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

I can ssh to the server, but I cannot hit the website on 8080. Do I need to be on JHU vpn to access that port?

Poster: Cash Costello

hltcoe-bot commented 4 years ago

ccostello hmmm, no, it's open (I'm not on VPN, and just checked the security group and it has the same spec as SSH)

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

It must have been something with APL's firewall, as I'm able to access the web site now. I'm getting a 500 error after logging in. I'm checking on that now.

Poster: Cash Costello

hltcoe-bot commented 4 years ago

ccostello that's odd: I wasn't seeing crashes, just the slowdown.

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

Were you checking your browser's developer tools - specifically the Network tab?

Poster: Cash Costello

hltcoe-bot commented 4 years ago

For the currently running instance, I see that it started at 2020-05-26 11:28:07 local time and that the first crash happened at 2020-05-26 11:30:10. If it was happening on an endpoint being hit by ajax, you wouldn't see it unless you checked the Network tab or maybe the console.

It is also possible that this is a completely separate issue.

Poster: Cash Costello

hltcoe-bot commented 4 years ago

I'm going to restart the docker containers and test from a clean install.

Poster: Cash Costello

hltcoe-bot commented 4 years ago

Would you like me to do anything code-wise, or just leave you to it for a few hours? It seems like maybe it would be easier to hang tight until you've poked around and then work on it afterwards.

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

mentioned in issue #262

Poster: Cash Costello

hltcoe-bot commented 4 years ago

Give me an hour.

I also noticed a race condition with mysql starting up:

MySQLdb._exceptions.OperationalError: (2003, "Can't connect to MySQL server on 'db' (111)")

At some point I thought I did something so that the application would wait for the database to start up, but maybe that was on a different project...
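One common way to avoid this kind of startup race is to have the application container poll the database port before running management commands or starting the server. The following is a minimal sketch, not what Turkle ships: `wait_for_port` is a hypothetical helper, and the host `db` and port 3306 in the usage comment are assumptions taken from the error message above and MySQL's default port. Note that a successful TCP connection only shows the port is open, not that MySQL is ready to answer queries.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (for example "connection
            # refused", errno 111 on Linux) while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Hypothetical usage in an entrypoint script, before migrations run:
#     if not wait_for_port("db", 3306):
#         raise SystemExit("database did not become reachable in time")
```

Compared with a fixed sleep, a polling loop like this returns as soon as the port opens and fails loudly if it never does.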

Poster: Cash Costello

hltcoe-bot commented 4 years ago

Ah, that's new, but makes sense. I had to throw a random sleep command into the bash script to wait for the management commands to finish: last I checked, the Docker folks didn't want responsibility for mechanisms to monitor internal container state. Seemed like a fair line to draw, not sure where it stands these days.

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

I think the issue is with gunicorn. I've been able to reproduce on your EC2 instance with a clean turkle install. Something is stalling for 30 seconds and then either a connection is closed or a worker is killed and restarted. I have a meeting at 1 pm. I'll do some testing after that.

Poster: Cash Costello

hltcoe-bot commented 4 years ago

ccostello thanks, I really appreciate it!

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

We normally run gunicorn behind a reverse proxy like nginx or apache. Looks like the issue is that web browsers will sometimes open additional connections to your server. Details here: https://hackernoon.com/chrome-preconnect-breaks-singly-threaded-servers-95944be16400

Because we use a reverse proxy, we don't see this issue. I also use more than one worker usually which would help with this but does not solve the issue (if for example, you had multiple tabs open).

I tested a little bit with a changed gunicorn config and it seems to be working with a single browser. You can do a git diff to see my changes. Any production use of this should have a reverse proxy in front of it.
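The actual config change is not shown in this thread. As an illustration only, a `gunicorn.conf.py` along the following lines would let each worker hold several connections at once, so that an idle speculative connection from a browser does not block the only worker. All values are assumptions chosen for the example.

```python
# gunicorn.conf.py: illustrative values, not the change made on the branch.

# Address and port to listen on (8080 is the port mentioned earlier).
bind = "0.0.0.0:8080"

# More than one worker process, so a single stalled connection cannot
# block every request.
workers = 2

# Threaded worker class: each worker serves several connections at once,
# so an idle preconnected socket only occupies one thread.
worker_class = "gthread"
threads = 4

# Seconds before a silent worker is killed and restarted. 30 is gunicorn's
# default, which is consistent with the 30-second stalls described above.
timeout = 30
```

A file like this is passed to gunicorn with its `-c` option. It reduces the symptom for a single browser, but as noted above it is not a substitute for putting a reverse proxy in front of gunicorn.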

Poster: Cash Costello

hltcoe-bot commented 4 years ago

Thanks ccostello , this is so helpful, I really appreciate it: giving a demo tomorrow. I'll ping you and charman when the branch is handling arbitrary forms in a safe way.

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

ccostello charman vandurme (also cmay as they have mentioned this in other threads and may have other insights)

I think this is pretty close to ready, and I could have it cleaned up and merge-requested in a few hours' effort. Here are responses to Cash's bullet points in the issue:

  1. Add "is_resumable" flag to Project, propagate to Batch/Task/TaskAssignment as necessary, and mark resumable batches as such (highlight according to amount-completed?) in the UI. There's a question of whether to use permissions to say "user A can resume project B", which I'm doing now, or just say "project B is resumable". I lean towards the latter now, because I don't see a lot of situations where a project should be resumable for some and not others (in which case it's easier to just create another project).
  2. Set "login_required" to True for all batches derived from a resumable project.
  3. Set "allotted_assignment_time" to null for all batches derived from a resumable project, and propagate this to TaskAssignments.
  4. What's the reasoning for not exporting partial work?
  5. We could start out with instructions that explicitly limit the features of resumable projects (e.g. "fields must be of type a/b/c, field names must be valid Python identifier names", etc.) and loosen those limits if/when the functionality gets more robust.
  6. Closely related to 5, I think we'd cover 80% of use-cases with the basic mechanisms in the branch now, and by having the restoration occur as the very last step it can actually handle some of the javascript-generated behavior.
  7. (This is my own proposal): change the "completed" flag on TaskAssignment from a boolean to a real. Existing sets/checks on non-resumable completeness can just use 0/1, and resumable tasks can maintain info for e.g. the marking mentioned in 1).
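Points 2, 3 and 7 in the list above could be sketched roughly as follows. This uses plain dataclasses rather than Django models so that it runs standalone; the field names `is_resumable`, `login_required`, `allotted_assignment_time` and `completed` come from the list, while `make_batch`, `is_finished` and the 24-hour default are hypothetical choices for the example and not code from the branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    name: str
    is_resumable: bool = False
    login_required: bool = False
    # Hours allowed per assignment; None means no time limit.
    allotted_assignment_time: Optional[int] = 24


@dataclass
class Batch:
    project: Project
    login_required: bool
    allotted_assignment_time: Optional[int]


@dataclass
class TaskAssignment:
    batch: Batch
    # Point 7: a fraction in [0.0, 1.0] instead of a boolean.
    completed: float = 0.0

    @property
    def is_finished(self) -> bool:
        # Non-resumable code paths can keep treating completion as 0 or 1.
        return self.completed >= 1.0


def make_batch(project: Project) -> Batch:
    """Derive batch settings from the project (hypothetical helper)."""
    if project.is_resumable:
        # Points 2 and 3: force login and drop the time limit.
        return Batch(project, login_required=True, allotted_assignment_time=None)
    return Batch(project, project.login_required, project.allotted_assignment_time)
```

Keeping the override in one place, where the batch is created, means a resumable project cannot end up with an anonymous or expiring batch by accident.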

Comments? Questions? Criticism? I would probably also make some minor template/view changes to consolidate the parallel code for the resumable case, and add a javascript file with the utilities for field-restoration.

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

ccostello charman cmay vandurme

If I don't hear otherwise by end-of-day tomorrow (Friday), I'll start on these changes with the understanding that they are more-or-less acceptable and the effort won't be wasted (they'll still be subject to a merge request, just don't want to find out at that point that there are strong objections!)

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

You have my attention! Sorry for the delay.

To address your question on partial work, I don't think it should be exported because

  1. It is not complete and how that affects some down stream aggregation script is highly dependent on the task.
  2. Partial does not mean that the part that is done is correct. An annotator could leave something as partial because they need to check something or they're not happy with it.

As a framework builder it is the better option to not export partial results as that is more predictable.

I promise to give you more detailed feedback by noon tomorrow.

Poster: Cash Costello

hltcoe-bot commented 4 years ago

ccostello Thanks! And no worries, I appreciate all of this. The partial task thing makes sense. I'm guessing perhaps the best place to hook in for things like regular backups and whatnot would be to add admin commands? I've been meaning to look into whether that can be done non-invasively, e.g. by having a project-specific repo define its own task-loader logic (that's what I've been doing for the Chaucer study, though I added the command directly to my turkle branch). I'm a bit paranoid about backups here, since the humanities folks put a ton of subtle effort into some of their "tasks", and their good will is pretty much my only academic currency right now!

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

Good point. There is a difference between the requester downloading all the data to process and the admin backing up everything. Right now we expect the requesters will use the web UI to download their data. There are also some scripts for this that we eventually want to move to a full blown API. We expect the admin to set up regular backup cron jobs that dump the entire database.

Our only documentation on the database backups is here: https://github.com/hltcoe/turkle/blob/master/docs/ADMINISTRATION.rst#database-backups If you think we need to provide more assistance in the documentation on backups and restoring from backups, open an issue so that we can capture that. Right now we're assuming the admin is comfortable with this, but maybe we're assuming too much.

Poster: Cash Costello

hltcoe-bot commented 4 years ago

lippincott thank you for this, I think it will be very useful. My only feedback is about exports. For me, I would rather not get partial results in the export, but if I did, an explicit CSV field indicating whether each record was partial would alleviate that concern to a large degree. But I don't know if my approach/preferences are representative.
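A sketch of how both preferences could coexist: partial work is left out of the export by default, and when it is requested an explicit column marks each record. The `export_results` function, the dict layout of an assignment, and the `Partial` column name are all hypothetical; the `Answer.` prefix follows the MTurk-style CSV convention and is an assumption about the export format.

```python
import csv
import io


def export_results(assignments, include_partial=False):
    """Return CSV text for dicts with 'answers' (dict) and 'completed' (bool)."""
    # Default behaviour: only finished assignments are exported.
    rows = [a for a in assignments if include_partial or a["completed"]]
    answer_keys = sorted({key for a in rows for key in a["answers"]})

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = [f"Answer.{key}" for key in answer_keys]
    if include_partial:
        header.append("Partial")
    writer.writerow(header)

    for a in rows:
        row = [a["answers"].get(key, "") for key in answer_keys]
        if include_partial:
            # Explicit flag so downstream scripts can filter partial records.
            row.append("false" if a["completed"] else "true")
        writer.writerow(row)
    return out.getvalue()
```

With this shape, existing aggregation scripts see the same columns as before unless the requester opts in to partial records.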

Poster: Chandler May

hltcoe-bot commented 4 years ago

On task assignment expiration, I would hate to lose that just because I chose to turn on the resumable option.

Why did we add expiration? Because annotators will accept a task and never complete it. This most likely happens when an annotator has the auto assignment option set and then leaves the project or takes time off. I'm assuming most projects keep the default of 24 hours.

I cannot imagine that the problem of abandoned tasks goes away with resumable projects.

What are our options:

  1. Turn off expiration for resumable projects
  2. Only expire task assignments in the new state (meaning do not expire partial assignments)
  3. Have a separate expiration process for partial assignments
  4. Treat partial assignments just like new assignments
  5. Option 2 but provide a manual override for requesters to expire these

I would start with option 2 and perhaps in the future add something like 3 or 5.
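Option 2 could look roughly like the sketch below: only assignments that are still untouched past the time limit are selected for expiration, while assignments with saved partial results are left alone. The `Assignment` dataclass, its field names, and `expired_assignments` are hypothetical stand-ins for the real models, and the 24-hour value is the default mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Assignment:
    created_at: datetime
    completed: bool = False
    has_partial_result: bool = False


def expired_assignments(assignments, now, allotted=timedelta(hours=24)):
    """Option 2: only untouched ("new") assignments past the limit expire."""
    return [
        a for a in assignments
        if not a.completed
        and not a.has_partial_result  # partial work is never auto-expired
        and now - a.created_at > allotted
    ]
```

Options 3 and 5 could later be layered on top by adding a second, longer limit or a requester-triggered override for the assignments this filter skips.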

Poster: Cash Costello

hltcoe-bot commented 4 years ago

lippincott charman I'm now thinking about the UI for this. First, how does the annotator submit a partial annotation?

Do we want separate buttons for submit (as finished) and save (as partial)?

I think we have to, because any code we write to automatically detect the annotator's intention will fail at some point.

If we require two buttons, should we do away with automatically detecting whether the template has a submit button? Instead, always require that the template has its own submit button. Our templates would still work with MTurk but MTurk templates that do not have their own submit button would not work on Turkle.

The reason I'm suggesting this is that it will start to get messy to have code that detects the presence of a submit button and a save button and properly create and position one or the other.

Poster: Cash Costello

hltcoe-bot commented 4 years ago

ccostello charman vandurme cmay So, in the somewhat ad-hoc templates for persistent tasks I threw together, rather than any submit button, I just javascripted in shift+arrow key navigation forward/backward/up through the task assignments, saving state whenever leaving a page. I also made the browsing a bit hierarchical, so that persistent batches on the front page are colored by how complete their tasks are, and clicking on them led to a list of their tasks colored according to how complete they are. Ideally I would have had this nested deeper, because this was a book, with chapters, with 10-line chunks as the tasks, so with the current setup the front page had a batch for each chapter: messy, and would get a lot more so if there were e.g. multiple books.

That's all outside of what Amazon provides of course, so I'm not saying it needs to be addressed, but I do think that's a very typical scenario for persistent annotation: a tree, where the root/batch is something pretty damn big, like a book, leaves are tasks, and there's intermediary structure that would be helpful to have directly navigatable. It probably could be done non-invasively w.r.t. existing functionality by adding a model class just for internal node structure of persistent batches, and views that are only used for persistent batches with said structure.

Poster: Thomas Lippincott

hltcoe-bot commented 4 years ago

lippincott I had played around a little with your UI when I was figuring out the performance issue.

I had considered a resumable task something that you can save with a partial result and then come back to later to finish. You can come back to that task as many times as you want to in order to update it and save it as partial, but once it is submitted, you cannot go back to edit it. Does your concept have a final submit on that task?

A second way your UI seemed different is that really the book or the chapter was a single task and each chunk was a sub-task. It makes sense for an annotator to move back and forth among the sub-tasks and for a single annotator to be assigned to the task.

Do you agree with the distinctions that I am making here?

It would also help me if you could explain a little more what you mean by persistent tasks (or did I capture it with the idea of one large task with many sub-tasks that the annotator comes back to over a period of days/weeks?).

Poster: Cash Costello

hltcoe-bot commented 4 years ago

mentioned in issue #273

Poster: Craig Harman