RHC: A right-handed coordinate system, defined as a system where positive angles rotate an object from the +x axis to the +y axis, with angle=0 along the +x axis.
LHC: A left-handed coordinate system, defined as a system where positive angles rotate an object from the +x axis to the -y axis, with angle=0 along the +x axis.
Domain: The range of values a number can take; [ or ] is inclusive, ( or ) is exclusive.
Upper origin: The point 0, 0 is in the upper left of the plot.
Lower origin: The point 0, 0 is in the lower left of the plot.
The SVG coordinate system used by the current front-end is RHC with an upper origin; this results in positive angles rotating clockwise.
Most plotting software (R, Python, MATLAB) is RHC with a lower origin; this results in positive angles rotating counter-clockwise.
Both cases above are RHC, but a positive angle rotates in opposite directions. This is why the visible direction of rotation should not be the only thing used to figure out which direction the angle should go; the position of the origin matters too.
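Concretely, a mark recorded in the front end's upper-origin SVG coordinates can be converted to the lower-origin plotting convention by flipping the y axis, which also flips the sign of the angle. A minimal sketch (not project code), assuming the subject image height in pixels is known:

// Hypothetical helper: map a mark from the SVG upper-origin convention
// to a lower-origin plotting convention.
function toLowerOrigin (mark, imageHeight) {
  return {
    x: mark.x, // x is unchanged
    y: imageHeight - mark.y, // flip the y axis about the image height
    angle: -mark.angle // flipping one axis reverses the sense of rotation
  }
}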
For this list I am only concerned with tools that have a specified shape (i.e. no free-hand tools or polygons).

Circle
- x: Center x position
- y: Center y position
- r: Radius

Column
- x: Left-most line of the column
- width: Full width of the column

Ellipse
- x: Center x position
- y: Center y position
- rx: Radius along the x-axis (before rotation)
- ry: Radius along the y-axis (before rotation)
- angle: Degrees, LHC, domain (-180, 180]

Full-width (horizontal) line
- y: y position of the line

Full-height (vertical) line
- x: x position of the line

Line
- x1: Start point x position
- y1: Start point y position
- x2: End point x position
- y2: End point y position

Point
- x: x position of the point
- y: y position of the point

Rectangle
- x: x position of the vertex closest to the origin
- y: y position of the vertex closest to the origin
- width: Full width of the rectangle (along the +x axis)
- height: Full height of the rectangle (along the +y axis)

Rotating rectangle
- x: x position of the vertex closest to the origin, before rotation
- y: y position of the vertex closest to the origin, before rotation
- width: Full width of the rectangle (along the +x axis), before rotation
- height: Full height of the rectangle (along the +y axis), before rotation
- angle: Degrees, RHC, domain (-180, 180]

Triangle (drawn about the x, y point given)
- x: Center x position
- y: Center y position
- r: Radius
- angle: Degrees, LHC-ish, domain (-180, 180]; angle=0 is along the -y axis (not standard in any way)

Fan
- x: x position of the cone's point
- y: y position of the cone's point
- radius: Distance from the point to the center of the arc
- spread: Full opening angle (in degrees)
- rotation: Degrees, RHC, domain [0, 360)

Recommendations:
- Pick a single angle convention: one domain, (-180, 180] or [0, 360), and one property name (both angle and rotation are used now).
- x and y should be consistently defined as the center of a shape (with the exception of the fan tool, as it is not symmetric). All shape clustering needs to be done with the center anyway, so this saves a step when analyzing the data (see the sketch after this list).
- x and y could be renamed x_center and y_center to make this clear to the science teams.
- Rotation should always be about x and y (or about the center, if these all change to be the center point).
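As an illustration of the center recommendation above, deriving the center of a corner-based rectangle mark is a step every downstream analysis currently repeats; a hypothetical sketch (not project code):

// Derive the center of a (non-rotating) rectangle mark from the
// corner-based x, y, width, height values listed above.
function rectangleCenter (mark) {
  return {
    x_center: mark.x + mark.width / 2,
    y_center: mark.y + mark.height / 2
  }
}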
I'd also like to suggest that thought goes into removing any default values from the tools - e.g. the ellipse has a default axis ratio of 0.5. I've noticed in Galaxy Builder data that a lot of volunteers were content to leave this axis ratio at the default, and it's a bias I'm struggling to account for. For example, this plot shows the axis ratio of my "best" classification for each subject, against another measure of galaxy ellipticity, and the line of classifications at 0.5 shows the bias present.
A way of fixing this in UX could be to have people click to select a centre position, then drag out one of the axes (setting the rotation and one measure of radius), then move their mouse perpendicular to the original line to select the second radius.
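The geometry for that interaction is straightforward; a rough sketch (hypothetical code, where centre, axisEnd and mouse are {x, y} points in image coordinates):

// Build an ellipse mark from the suggested three-step interaction.
function ellipseFromDrag (centre, axisEnd, mouse) {
  const dx = axisEnd.x - centre.x
  const dy = axisEnd.y - centre.y
  // first radius and rotation come from the initial drag
  const rx = Math.sqrt(dx * dx + dy * dy)
  const angle = Math.atan2(dy, dx) * (180 / Math.PI)
  // second radius is the perpendicular distance from the pointer
  // to the line through centre and axisEnd
  const ry = Math.abs(dx * (mouse.y - centre.y) - dy * (mouse.x - centre.x)) / rx
  return { x: centre.x, y: centre.y, rx, ry, angle }
}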
I completely understand rewriting the drawing tool UX is a large endeavour, but I think it's important to remove biases wherever possible for the science the Zooniverse wants to do.
@tingard Thank you for the information about the radial defaults; this is very helpful. We'll use it when we get to the shapes using radii.
There will be a similar bias for all tasks that have default values. Off the top of my head these currently are:
On Galaxy Builder we see clear biases for the ellipse and slider tasks.
The fan tool also defaults the spread value.
Feedback about the current freehand drawing tool:
I would love to help with the Etch a cell project, and have done a couple of pages, but the pages keep dropping out saying that they are using significant memory… I am not prepared to overload my computer for this - sorry!
We need to keep performance in mind as we implement the new tools.
Mentioning this here, although not directly related, as suggested by @eatyourgreens.
The other day I was doing a mammoth classifying session to try and generate some gold standard labels for one of our projects. In the end I did ~1000 classifications. Some way in, Safari popped up its little banner telling me that the page was consuming a lot of memory and eventually it just reloaded the page citing excessive memory consumption as the reason. The project was Galaxy Zoo: Clump Scout which just has 400x400 pngs as subjects and uses the point marking tool. I was providing at most 6 marks per subject. Maybe there is also an outstanding performance issue with existing PFE tools.
My best guess for the first place to look would be memory leaks from event handlers that haven't been destroyed properly.
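For reference, the usual pattern that prevents this class of leak is to pair every added listener with a removal on unmount; a generic sketch (not actual classifier code, component and handler names are illustrative):

import React from 'react'

// Any listener added on mount must be removed on unmount, otherwise the
// handler (and everything it closes over) stays reachable for the
// lifetime of the page.
class DraggableMark extends React.Component {
  constructor (props) {
    super(props)
    this.onPointerMove = this.onPointerMove.bind(this)
  }

  componentDidMount () {
    document.addEventListener('pointermove', this.onPointerMove)
  }

  componentWillUnmount () {
    document.removeEventListener('pointermove', this.onPointerMove)
  }

  onPointerMove (event) {
    // update the mark position from event.clientX / event.clientY here
  }

  render () {
    return null // rendering omitted for brevity
  }
}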
Noting for clarity that tasks create annotations but drawing tools create marks. A drawing task annotation is a collection of marks. Tools don't create annotations in this model.
Did we decide on a preferred standard for angles? [-180, 180], ie. the PFE tool but with angle multiplied by -1, and (x, y, angle) as common, optional properties for all marks, defining the position of the mark and rotation about that position?
@eatyourgreens for analysis, research teams will expect RHC in the domain [-180, 180], as it is the standard used by all plotting software. I would suggest we stick to the same standard when making our tools (thankfully this is also the SVG standard, so no transforms need to be done to use it).
@CKrawczyk great, that's what I've assumed in #1360. All tools use the same rotation and translation transform, rather than each tool defining its own transform and eg. multiplying angles by -1.
I think we're at a place where this RFC can become an ADR.
By the way, the way I've set up #1360 is that translate(${mark.x} ${mark.y}) rotate(${mark.angle}) is applied to all SVG marks, but marks are not required to specify (x, y, angle).
We can also use types.refinement() to specify additional constraints on the Mark model, eg. 180 >= mark.angle >= -180.
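For example, a sketch of that constraint with mobx-state-tree (the Mark model here is illustrative, not the actual classifier model):

import { types } from 'mobx-state-tree'

// refinement narrows a base type with a predicate
const Angle = types.refinement(
  'Angle',
  types.number,
  angle => angle > -180 && angle <= 180 // degrees, domain (-180, 180]
)

const Mark = types.model('Mark', {
  x: types.number,
  y: types.number,
  angle: types.optional(Angle, 0)
})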
@CKrawczyk do you have any thoughts on how task annotations should be added to the mark models for subtasks? At the moment, PFE posts back mark.details, which I think is an array of task annotations, but those annotations don't have task keys. For example, the details array in #1314 doesn't have a key pointing to a text task, just an annotation value for each item.
Let's look at an example of a point task that has some subtasks: a question and some dropdowns. In the current interface an annotation looks like:
{"annottions": [
{
"task": "T0",
"value": [
{
"frame": 0,
"tool": 0,
"x": 452.18341064453125,
"y": 202.87478637695312,
"value_index": 0,
"details": [
{"value": 0},
{"value": [
{"option-1": 1},
{"option-2": 1},
{"None": 1}
]}
]
},
{
"frame": 0,
"tool": 1,
"x": 404.61279296875,
"y": 583.4398803710938,
"value_index": 1,
"details": [
{"value": 1},
{"value": [
{"option-3": 1},
{"option-4": 1},
{"option-5": 1}
]}
]
}
]
}
]}
As noted above, there is no indication of what kind of tasks are inside the details blocks. If the subtasks are going to have task names associated with them, I think we can get away with a flatter structure along the lines of:
{"annottions": [
{
"task": "T0",
"value": [
{
"frame": 0,
"tool": 0,
"x": 452.18341064453125,
"y": 202.87478637695312,
"value_index": 0,
"details": [
{"task": "T1"},
{"task": "T2"}
]
},
{
"frame": 0,
"tool": 1,
"x": 404.61279296875,
"y": 583.4398803710938,
"value_index": 1,
"details": [
{"task": "T1"},
{"task": "T2"}
]
}
]
},
{
"task": "T1",
"associated_task": "T0",
"associated_value_index": 0,
"value": 0
},
{
"task": "T2",
"associated_task": "T0",
"associated_value_index": 0,
"value":[
{"option-1": 1},
{"option-2": 1},
{"None": 1}
]
},
{
"task": "T1",
"associated_task": "T0",
"associated_value_index": 1,
"value": 1
},
{
"task": "T2",
"associated_task": "T0",
"associated_value_index": 1,
"value": [
{"option-3": 1},
{"option-4": 1},
{"option-5": 1}
]
}
]}
This way all the subtask annotations are stored at the same level within the structure. The important thing is to make sure the drawn points are indexed (value_index in this example) and that the subtask annotations carry this index and the task name they match up to (associated_value_index and associated_task in this example). This will ensure that the subtask annotations can be matched up with the point they belong to (or vice versa).
I opted for a named index value so the structure does not rely on the order of the points within the annotation list (e.g. in case a volunteer deletes some of the points along the way and changes the order of the annotations).
When aggregating drawn shapes, the first step is clustering. Once the clusters are found, the subtasks need to be aggregated within each cluster. This is easy to do if the structure of each subtask annotation is the same as if that task had been asked on its own: the code can take all subtask annotations within a cluster and pass them to the reducer as if they were a list of main-task annotations, without having to reshape them. At the moment I need to parse the details list and make a "mock annotation" of the correct structure to be passed along to the next reducer.
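To illustrate why the flat structure helps, here is a hypothetical snippet that groups the proposed subtask annotations by the mark they belong to, leaving each one in the exact shape a standalone task annotation would have:

// Group flat subtask annotations by the mark they belong to, keyed on
// (associated_task, associated_value_index). Each grouped entry can be
// handed to a task reducer unchanged.
function subtasksByMark (annotations) {
  const grouped = {}
  annotations
    .filter(annotation => 'associated_task' in annotation)
    .forEach(annotation => {
      const key = `${annotation.associated_task}.${annotation.associated_value_index}`
      grouped[key] = grouped[key] || []
      grouped[key].push(annotation)
    })
  return grouped
}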
Thanks, that's very useful. The subtasks don't have identifiers, but will need them to fit into the annotation model we're using in the classifier. I've been thinking about how to generate task identifiers based on the Panoptes workflow task structure. Maybe something like TASK_KEY.TOOL_INDEX.DETAILS_INDEX, eg. { task: 'T0.0.0' } would point to workflow.tasks.T0.tools[0].details[0].
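Resolving such a generated key back to the workflow task structure could then be a one-liner (hypothetical sketch, function name is illustrative):

// 'T0.0.0' -> workflow.tasks.T0.tools[0].details[0]
function findSubtaskDefinition (workflow, taskKey) {
  const [task, toolIndex, detailsIndex] = taskKey.split('.')
  return workflow.tasks[task].tools[toolIndex].details[detailsIndex]
}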
The other option would be to change the format of tasks in the project builder, and give subtasks identifiers as workflow tasks in their own right. That's potentially a breaking change for projects that are using drawing with subtasks.
@eatyourgreens a structure like that would work too, as the actual task name is not important.
Thanks. At the moment the classifier is set up so that we store mark.annotations internally for each mark, which uses exactly the same model as classification.annotations, so [{ task, value }, { task, value }].
It shouldn't be a problem to map that to a flatter structure on classification submission. We're going to have to do some conversion anyway since the annotations are stored internally as maps.
Does Caesar have strong opinions as to whether the serialised attribute should be mark.details or mark.annotations? Renaming it should be trivial.
On second thoughts, looking at your flatter structure, the repeated task keys for T1 and T2 would be a problem for the current classifier code, since the classifier uses a map indexed by task key, ie. each task key can only appear in one annotation at present.
I don't think Caesar cares if it is mark.details or mark.annotations, but the external shape extractors (aggregation-for-caesar) currently use mark.details. That being said, the aggregation extractor code needs to be rewritten to account for the structure changes anyway, so it can be changed if needed.
The annotation map will have to change to an array anyway to support recursion, so that shouldn't be a blocker.
I like the idea of flattening versus the nested structure, as it will be consistent with the reconceptualization we've made with workflow steps. We've removed the nested structure for combo tasks and we can do that here too with drawing sub-tasks, where details is just an array of unique task identifier keys. The addition of associated_task and associated_value_index potentially makes this structure flexible enough for any task to have an associated task (though I'm not sure why we would want to do that).
The annotation map will have to change to an array anyway to support recursion, so that shouldn't be a blocker.
That should be a case of changing the type in the AnnotationsStore model and updating anything that uses annotations.put(), which should only be the one line in the addAnnotation action. Maybe also updating anything that deletes entries from annotations, but I think array.remove(item) is supported in MST, so that shouldn't be a problem.
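A sketch of what that change could look like, assuming a store shaped roughly like AnnotationsStore (names and fields here are illustrative, not the actual model):

import { types } from 'mobx-state-tree'

const Annotation = types.model('Annotation', {
  task: types.string,
  value: types.frozen()
})

const AnnotationsStore = types
  .model('AnnotationsStore', {
    // an array instead of a map, so task keys can repeat
    annotations: types.array(Annotation)
  })
  .actions(self => ({
    addAnnotation (task, value) {
      // push replaces the previous annotations.put() call
      self.annotations.push({ task, value })
    },
    removeAnnotation (annotation) {
      // MST observable arrays support remove(item)
      self.annotations.remove(annotation)
    }
  }))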
Yep, we have an issue tracking this, so please add any notes on how to do that here: https://github.com/zooniverse/front-end-monorepo/issues/1226
Note that workflow steps have a tasks property in the classifier model, for type validation and for rendering the Tasks component. I'm going to use tool.tasks the same way: workflow.tasks[TASK_KEY].tools[TOOL_INDEX].details maps to an array of Task models, which are stored on tool.tasks and used to render the subtask form for marks from that tool (tool.marks).
We should be clear about what we're discussing here: the internal store models vs. the model at classification submission, something I've had trouble following in these discussions. Can we keep this issue scoped to the model at classification submission (which is used downstream), and keep notes about internal store structure in a separate issue, to keep it straight?
Off the top of my head, I think projects that would be affected by this change are PRN, Penguin Watch and NfN. Maybe we should run the proposed classification format past them to make sure they're ok with it?
T0.0.0, so TASK_KEY.TOOL_INDEX.DETAIL_INDEX.
I think projects that would be affected by this change are PRN, Penguin Watch and NfN. Maybe we should run the proposed classification format past them to make sure they're ok with it?
I'd rather not open this up to committee. I'd prefer we decide first what we'd like to do internally, then write a proposal ADR, and we can invite specific research-experienced individuals on our team to give input from a research perspective, as was done with the ADR for the transcription task.
Sounds good to me.
I opted for a named index value so the structure does not rely on the order of the points within the annotation list (e.g. in case a volunteer deletes some of the points along the way and changes the order of the annotations).
By the way, drawing annotations aren't generated until a volunteer presses Next or Done to complete the task, so deleting marks can't change the order of an annotation value (see ADR-15).
EDIT: marks have IDs (mark.id in the Mark model), so those could also be used to link a task annotation back to its parent mark.
Can this be closed in favour of #1390?
No, because there are still comments in this thread about the sub-tasks, and I'd like to write an ADR for that.
Package
lib-classifier
Feature or Issue Description
The MobX stores will have models defined for these expected drawing marks; however, we need to check in with @CKrawczyk, since the current tools are inconsistent or not always in an optimal format for aggregation. We should discuss this and document in an ADR any changes to these tools.
To Dos