issue with markings - Githubissues

ggdhines-zz commented 9 years ago

Suppose we are running Penguin Watch on Panoptes. There are two different ways we could do the markings for chicks vs. adults (plus all the other options, which I'll ignore for now). One is to offer just one marking tool, which pops up a question every time asking what kind of thing has been marked (adult or chick). The other option is to offer separate chick and adult marking tools. (The second way is definitely better from an HCI point of view, and is basically how Penguin Watch currently runs.) Suppose we go with the second way. I run my clustering algorithm and come up with clusters, each representing a separate and unique penguin. To do so, I first need to ignore what type each marking is. So I am going to make a decision: if a single marking task contains more than one tool of the same type (e.g. line, circle, square, etc.), I will assume that, when clustering, I can combine markings from both tools. So if people want to have two marking tools, each marking a point, but they don't want those points to be possibly confused with each other, they will need to use two separate tasks. In addition, if a task has two point marking tools (for example), any follow-up questions (which pop up after you have marked a point) should either be identical or not exist (probably safer to go with the second option). @aliburchard thoughts?

ggdhines-zz commented 9 years ago

Could @brian-c comment on this? I realized that to properly deal with this setup we need to redo some of the front end which is obviously not a goal right now. Could we maybe just have something stated in the tutorial about how to handle this?

ggdhines-zz commented 9 years ago

Or maybe different tools (of the same type) can have different questions. For example, if after clustering we determine that a marking is a chick, then we only consider the answers to the follow-up questions from people who identified the penguin as a chick.
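A minimal sketch of that filtering step, assuming each marking in a cluster is a dict with a `tool` label and an `answers` dict for its follow-up questions. Both field names, and the `consensus_answers` helper, are hypothetical and are not the Panoptes classification format.

```python
from collections import Counter


def consensus_answers(cluster):
    """Pick the majority tool for one cluster, then tally follow-up answers
    only from the markings made with that tool.

    `cluster` is a list of dicts with hypothetical keys "tool" and "answers".
    """
    tool_votes = Counter(m["tool"] for m in cluster)
    winning_tool = tool_votes.most_common(1)[0][0]

    # Answers from people who used a different tool are ignored, since they
    # may have been asked different follow-up questions.
    tallies = {}
    for m in cluster:
        if m["tool"] != winning_tool:
            continue
        for question, answer in m["answers"].items():
            tallies.setdefault(question, Counter())[answer] += 1

    return winning_tool, {q: dict(c) for q, c in tallies.items()}


# One penguin marked by four volunteers: three say chick, one says adult.
penguin = [
    {"tool": "chick", "answers": {"fluffy": "yes"}},
    {"tool": "chick", "answers": {"fluffy": "yes"}},
    {"tool": "adult", "answers": {"on_nest": "no"}},
    {"tool": "chick", "answers": {"fluffy": "no"}},
]
tool, answers = consensus_answers(penguin)
print(tool, answers)
```

Here the single "adult" marking still counts toward locating the penguin, but its `on_nest` answer is dropped because the cluster was decided to be a chick.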

mkosmala commented 9 years ago

Why can't you cluster by tool within each task? I definitely don't want to ask separate questions when I have 8 things to mark, when most of those things won't be seen in a given image. I think @aliburchard has a similar situation with wildebeest direction.

brian-c commented 9 years ago

Agreed, clustering by tool (not type) is the right choice. I get why you did Penguin Watch that way, but I don't think it's right as a general solution.

aliburchard commented 9 years ago

Sorry, but I don’t think this is a good idea at all. I realise that’s how you do it for penguin watch, but I think it’s MUCH much much more generalisable to cluster by individual tool. And, quite frankly, all of the projects I’m building or helping to build need the clustering to be drawn on individual tool types. Happy to talk about this more when back in the UK.

Alexandra Swanson, Ph.D. Ecology & Citizen Science Postdoc The Zooniverse University of Oxford

On 11 Jun 2015, at 02:13, ggdhines notifications@github.com wrote:


ggdhines-zz commented 9 years ago

I think it is essential that we support a Penguin Watch-like approach (if clustering could only be done by type, Penguin Watch would have to be done in a Condor Watch setting, which just wouldn't work). The interface for Penguin Watch is just superior to Condor Watch. So what I am going to suggest is that once we design the aggregation interface (down the road a way, I realize), we will include a box that people can check if they want to "cluster by tool".

ggdhines-zz commented 9 years ago

Just realized that if we add this option, we should say something in the project creation docs about it existing. That way people will know they can use a Penguin Watch-like approach.

aliburchard commented 9 years ago

Alright, so after chatting about this with @ggdhines and @mrniaboc, I'm starting to think that Greg's aggregation approach makes sense -- do the clustering first and then the classification vote. Take the wildebeest watch project, for example. I want users to mark all the wildebeest as moving in one of 8 directions. If we do the clustering first, then we have one point for every wildebeest. Then we look at which of the 8 directions people gave for each wildebeest and take the majority vote.

Otherwise, if we cluster by each tool type independently, then we might have a given wildebeest that is counted multiple times because people disagreed on the direction it was moving in.
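To make the difference concrete, here is a rough sketch of both orderings on made-up wildebeest data. The greedy `cluster_points` function is only a stand-in for whatever clustering algorithm the aggregation code actually uses, and the `x`, `y` and `tool` field names and the 10-pixel radius are assumptions for illustration.

```python
from collections import Counter
from math import hypot


def cluster_points(markings, radius=10.0):
    """Greedy stand-in for a real clustering algorithm: each marking joins the
    first cluster whose current centre lies within `radius` pixels."""
    clusters = []
    for m in markings:
        for c in clusters:
            cx = sum(p["x"] for p in c) / len(c)
            cy = sum(p["y"] for p in c) / len(c)
            if hypot(m["x"] - cx, m["y"] - cy) <= radius:
                c.append(m)
                break
        else:
            clusters.append([m])
    return clusters


def cluster_then_vote(markings, radius=10.0):
    """Cluster all markings regardless of tool, then take the majority tool
    within each cluster."""
    results = []
    for c in cluster_points(markings, radius):
        votes = Counter(p["tool"] for p in c)
        results.append({"tool": votes.most_common(1)[0][0], "votes": dict(votes)})
    return results


# Two wildebeest; one volunteer disagrees about the first one's direction.
markings = [
    {"x": 100, "y": 100, "tool": "north"},
    {"x": 102, "y": 99, "tool": "north"},
    {"x": 98, "y": 101, "tool": "east"},
    {"x": 300, "y": 200, "tool": "south"},
    {"x": 301, "y": 201, "tool": "south"},
]

combined = cluster_then_vote(markings)

# Clustering each tool on its own treats the "east" marking as a separate animal.
per_tool = []
for tool in sorted({m["tool"] for m in markings}):
    per_tool += cluster_points([m for m in markings if m["tool"] == tool])

print(len(combined), len(per_tool))
```

With these five markings, clustering first yields two wildebeest (north and south by majority), while clustering per tool yields three clusters, because the lone "east" marking becomes its own object.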

If we cluster by each tool type independently, how do we then correct for people using the wrong tool?

@mkosmala @vrooje anyone else who is dealing with marking data?

mkosmala commented 9 years ago

@aliburchard Ah! I understand now. Hmm... I think then, yes, it makes sense for my project to cluster first and then get info on which tool was most-used for each cluster-object.

But I can think of some cases (not in my project) that might be problematic:

1. What if a project asks the user to mark objects multiple times with different tools? For example, I want users to mark ALL the trees with green and then mark maple trees with orange. Then the clustering would work well, but I wouldn't necessarily want a consensus for each clustered-object; I would actually want the votes separately for each tool.
2. What if users are asked to mark overlapping objects with different tools? For example, draw green rectangles around whatzits and blue rectangles around thingamajigs. But whatzits and thingamajigs are all jumbled together. I imagine that in this case clustering by tool might give you better clustering results, whereas doing all rectangles together might give messier clustering results. (But I might be wrong.)

aliburchard commented 9 years ago

@mkosmala yesssss, excellent point. What we were thinking is perhaps allowing project builders to specify whether they wanted clustering first followed by majority vote for the tool type, or whether they wanted clustering by tool only with no majority vote (assume people choose tools perfectly). The code for aggregating the latter is pretty straightforward -- not sure how hard it is to wire up the option though.
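As a sketch of what such an option could look like, assuming a hypothetical `cluster_by_tool` flag and a pluggable `cluster_fn` (the grid-based `grid_clusters` below is just a placeholder for the real clustering, and none of these names come from the actual aggregation code):

```python
from collections import Counter, defaultdict


def grid_clusters(markings, cell=20):
    """Placeholder clustering: markings in the same grid cell form a cluster."""
    cells = defaultdict(list)
    for m in markings:
        cells[(m["x"] // cell, m["y"] // cell)].append(m)
    return list(cells.values())


def aggregate(markings, cluster_by_tool=False, cluster_fn=grid_clusters):
    """Return a list of (tool, number_of_markings) pairs, one per cluster.

    cluster_by_tool=False: cluster everything together, then majority vote.
    cluster_by_tool=True:  cluster each tool separately and trust the tool choice.
    """
    if cluster_by_tool:
        by_tool = defaultdict(list)
        for m in markings:
            by_tool[m["tool"]].append(m)
        return [(tool, len(c)) for tool, ms in by_tool.items() for c in cluster_fn(ms)]

    results = []
    for c in cluster_fn(markings):
        tool = Counter(m["tool"] for m in c).most_common(1)[0][0]
        results.append((tool, len(c)))
    return results


# A maple marked green by three people and orange by two, plus one other tree.
markings = [
    {"x": 50, "y": 50, "tool": "green"},
    {"x": 52, "y": 51, "tool": "green"},
    {"x": 49, "y": 53, "tool": "green"},
    {"x": 51, "y": 50, "tool": "orange"},
    {"x": 50, "y": 52, "tool": "orange"},
    {"x": 205, "y": 205, "tool": "green"},
    {"x": 206, "y": 204, "tool": "green"},
]

voted = aggregate(markings)
separate = aggregate(markings, cluster_by_tool=True)
print(voted)
print(separate)
```

In the voted mode the orange (maple) markings are outvoted and disappear from the result, which is why the trees example would want the per-tool mode or the raw votes.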

ggdhines-zz commented 9 years ago

@mkosmala - for your first use case, I think that would require me (the aggregation code) giving you the raw votes in addition to the consensus. That's an interesting use case I hadn't thought of before, will have to look into it. For your second use case, it might be best for those markings to be split up into different tasks. Perhaps not the best approach - I'm definitely open to others.
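A rough sketch of what "raw votes in addition to the consensus" might look like for one cluster, assuming each marking records a `user` and a `tool` (hypothetical field names, as is the `summarize_cluster` helper):

```python
from collections import Counter


def summarize_cluster(cluster):
    """Report the majority tool together with the raw per-tool tallies.

    `cluster` is a list of dicts with hypothetical keys "user" and "tool".
    A user who marks the same object with two tools counts in both tallies.
    """
    raw_votes = Counter(m["tool"] for m in cluster)
    n_users = len({m["user"] for m in cluster})
    return {
        "consensus": raw_votes.most_common(1)[0][0],
        "raw_votes": dict(raw_votes),
        "n_users": n_users,
    }


# One tree: all three users marked it green, two also marked it orange (maple).
tree = [
    {"user": "a", "tool": "green"},
    {"user": "a", "tool": "orange"},
    {"user": "b", "tool": "green"},
    {"user": "c", "tool": "green"},
    {"user": "c", "tool": "orange"},
]
summary = summarize_cluster(tree)
print(summary)
```

With the raw tallies alongside `n_users`, a project builder can see that two of the three people who marked this tree also called it a maple, even though the consensus tool is green.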

mkosmala commented 9 years ago

@ggdhines Yes, for the second case, it probably would be better to split up into different tasks. As the front-end is currently implemented, it's actually fairly difficult to draw overlapping shapes -- and almost impossible to delete the one you want to when they are overlapping.

mkosmala commented 9 years ago

And, of course, this brings up the "best practices" concept, which I think someone has started an issue on somewhere. @aliburchard? I've drunk the Kool-aid, so I already know what many of Zooniverse's best practices are. But new project creators almost certainly won't. This might be something to add.

ggdhines-zz commented 9 years ago

@mkosmala - yeah listing this in the best practice document somewhere is important

aliburchard commented 9 years ago

@mkosmala @ggdhines alrighty, I'm starting a best practices google doc here: https://docs.google.com/a/zooniverse.org/document/d/1JjIp06weeOlxQZqCdOuTXTPERKFBYfo0Xs2NIij8gu0/edit?usp=sharing

You may need to request access, but please use it as a place to braindump all possible best practices that project builders should be aware of. We can tidy it up later.

chrissnyder commented 9 years ago

Tagging this as first release, but I'm not sure that the output from this issue is. Does any of this have any direct ramifications for the front end?

ggdhines-zz commented 9 years ago

@chrissnyder - I think the effect is that something should be listed in the project creation page explaining how markings are going to be clustered - so people understand how things work. @aliburchard this seem reasonable?

chrissnyder commented 9 years ago

Ok fair enough.

That said, for launch are we actually planning on having aggregations fully integrated with the platform? I'm out of the loop on where the different pieces are. Should we move this to the future plans milestone?

aliburchard commented 9 years ago

@chrissnyder not sure what the timeline for integrating aggregation is - my impression is that it won't be fully integrated with the platform in time for launch.

@ggdhines I think we definitely should have a best practices doc and explanation of this, but that it will be in a separate location -- we don't have space for much more in-line help text, and it's sort of the nitty gritty.

srallen commented 7 years ago

Given that we'll be hiding the aggregation button since it's unreliable, I'm going to close this.

zooniverse / Panoptes-Front-End

issue with markings #525