srallen commented 5 years ago

Package lib-classifier

Feature or Issue Description The MobX stores will have models defined for these expected drawing marks, however, we need to check in with @CKrawczyk since the current tools are inconsistent or not always in an optimal format for aggregation. We should discuss this and document in an ADR any changes for these tools.

To Dos

[ ] Example

CKrawczyk commented 5 years ago

Overview of the current drawing tools.

Definitions

RHC: A right handed coordinate system, this is defined as a system where positive angles rotate an object from the +x axis to the +y axis with angle=0 along the +x axis

LHC: A left handed coordinate system, this is defined as a system where positive angles rotate an object from the +x axis to the -y axis with angle=0 along the +x axis

Domain: The range of values a number can take [ or ] is inclusive, ( or ) is exclusive.

Upper origin: The point 0, 0 is in the upper left of the plot

Lower origin: The point 0, 0 is in the lower left of the plot

Notes about coordinate systems

The SVG coordinate system used by the the current front-end is a RHC and has an upper origin, this results in positive angles rotating clockwise.

Most plotting software (R, Python, Matlab) are RHC with a lower origin, this results in positive angles rotating counter-clockwise.

Both cases above have RHC but a positive angle rotates in opposite directions, this is why the visible direction of rotation should not be the only thing used to figure out what direction the angle should go. The position of the origin matters too.

The current drawing tools

For this list I am only concerned about tools with a specified shape (i.e. no free-hand tools or polygons)

Circle

x: Center x position
y: Center y position
r: Radius

Column

x: Left most line of the column
width: Full width of the column

Ellipse

x: Center x position
y: Center y position
rx: Radius along the x-axis (before rotation)
ry: Radius along the y-axis (before rotation)
angle: Degrees, LHC, domain (-180, 180]

fullWidthLine

y: y position of the line

fullHeightLine

x: x position of the line

line

x1: Start point x position
y1: Start point y position
x2: End point x position
y2: End point y position

point

x: x position of the point
y: y position of the point

rectangle

x: x position of the vertex closets to the origin
y: y position of the vertex closets to the origin
width: Full width of the rectangle (along +x axis)
height: Full height of the rectangle (along +y axis)

rotateRectangle

x: x position of the vertex closets to the origin before rotation
y: y position of the vertex closets to the origin before rotation
width: Full width of the rectangle (along +x axis) before rotation
height: Full height of the rectangle (along +y axis) before rotation
angle: Degrees, RHC, domain (-180, 180]
Additional note: the rotation is about the center, not the x, y point given.

triangle

x: Center x position
y: Center y position
r: Radius
angle: Degrees, LHC-ish, domain (-180, 180]
Additional note: angle=0 is along the -y axis (not standard in any way)

fan

x: x position of the cone's point
y: y position of the cone's point
radius: Distance from the point to the center of the arc
spread: Full opening angle (in degrees)
rotation: Degrees, RHC, domain [0, 360)

What would make aggregation easier

A consistent definition for how angles are defined. Our current tools are a mix of a bunch of different systems, with only two of them lining up with the expected mathematical standard of a RHC with a domain of (-180, 180] or [0, 360)
- A consistent name for the angle parameter (both angle and rotation are used now)
x and y should be consistently defined as the center of a shape (with exception of the fan tool as this is not symmetric). All shape clustering needs to be done with the center anyways, so this saves a set when analyzing the data.
- x and y could be renamed x_center and y_center to make this clear to the science teams
All rotations should be defined about the point given by x and y (or about the center if these all change to be the center point)
- If it has to be different the point of rotation should be included in the annotation

tingard commented 5 years ago

I'd also like to suggest that thought goes into removing any default values from the tools - i.e. the ellipse has a default axis ratio of 0.5. I've noticed in Galaxy Builder data that a lot of volunteers were content to leave this axis ratio at the default, and it's a bias I'm struggling to account for. For example, this plot shows the axis ratio of my "best" classification for each subject, against another measure of galaxy ellipticity, and the line of classifications at 0.5 shows the bias present.

download

A way of fixing this in UX could be to have people click to select a centre position, then drag out one of the axis (setting the rotation and one measure of radius), then move their mouse perpendicular to the original line to select the second radius.

I completely understand rewriting the drawing tool UX is a large endeavour, but I think it's important to remove biases wherever possible for the science the Zooniverse wants to do.

srallen commented 5 years ago

@tingard Thank you for the information about the radial defaults and this is very helpful. We'll use this information when we get to the shapes using radii.

CKrawczyk commented 5 years ago

There will be a similar bias for all tasks that have default values. Off the top of my head these currently are:

angle of the rotation rectangle (defaults to 0 deg)
aspect ratio of the ellipse (defaults to 0.5)
the slider task (default picked by the researcher)

On galaxy builder we see clear biases for the ellipse and slider tasks.

CKrawczyk commented 5 years ago

The fan tool also defaults the spread value.

srallen commented 5 years ago

Feedback about the current freehand drawing tool:

I would love to help with the Etch a cell project, and have done a couple of pages, but the pages keep dropping out saying that they are using significant memory… I am not prepared to overload my computer for this - sorry!

We need to keep performance in mind as we implement the new tools.

hughdickinson commented 4 years ago

Mentioning this here although not directly related as suggested by @eatyourgreens .

The other day I was doing a mammoth classifying session to try and generate some gold standard labels for one of our projects. In the end I did ~1000 classifications. Some way in, Safari popped up its little banner telling me that the page was consuming a lot of memory and eventually it just reloaded the page citing excessive memory consumption as the reason. The project was Galaxy Zoo: Clump Scout which just has 400x400 pngs as subjects and uses the point marking tool. I was providing at most 6 marks per subject. Maybe there is also an outstanding performance issue with existing PFE tools.

eatyourgreens commented 4 years ago

My best guess for the first place to look would be memory leaks from event handlers that haven't been destroyed properly.

eatyourgreens commented 4 years ago

Noting for clarity that tasks create annotations but drawing tools create marks. A drawing task annotation is a collection of marks. Tools don't create annotations in this model.

eatyourgreens commented 4 years ago

1360 adds the ellipse tool with RHC rotation angles in the domain `[180, -180]` ie. the PFE tool but with `angle` multiplied by -1.

Did we decide on a preferred standard for angles?

eatyourgreens commented 4 years ago

1360 makes `(x, y, angle)` common, optional properties for all marks, defining the position of the mark and rotation about that position.

CKrawczyk commented 4 years ago

@eatyourgreens for analysis research teams will expect RHC in the domain [180, -180] as it is the standard used by all plotting software. I would suggest we stick to the same standard when making our tools (thankfully this is also the SVG standard, so no transforms needs to be done to use it).

eatyourgreens commented 4 years ago

@CKrawczyk great, that's what I've assumed in #1360. All tools use the same rotation and translation transform, rather than each tool defining its own transform and eg. multiplying angles by -1.

srallen commented 4 years ago

I think we're at a place that this RFC can become an ADR.

eatyourgreens commented 4 years ago

By the way, the way I've set up #1360 is that translate (${mark.x} ${mark.y) rotate (${mark.angle}) is applied to all SVG marks, but marks are not required to specify (x, y, angle).

We can also use types.refinement() to specify additional constraints on the Mark model eg. 180 >= mark.angle >= -180.

eatyourgreens commented 4 years ago

@CKrawczyk do you have any thoughts on how task annotations should be added to the mark models for subtasks? at the moment, PFE posts back mark.details, which I think is an array of task annotations but those annotations don't have task keys. For example, the details array in #1314 doesn't have a key pointing to a text task, just an annotation value for each item.

CKrawczyk commented 4 years ago

Lets look at an example of a point task that has some subtasks, a question and some dropdowns. In the current interface an annotation looks like.

What we currently have

{"annottions": [
    {
        "task": "T0",
        "value": [
            {
                "frame": 0,
                "tool": 0,
                "x": 452.18341064453125,
                "y": 202.87478637695312,
                "value_index": 0,
                "details": [
                    {"value": 0},
                    {"value": [
                        {"option-1": 1},
                        {"option-2": 1},
                        {"None": 1}
                    ]}
                ]
            },
            {
                "frame": 0,
                "tool": 1,
                "x": 404.61279296875,
                "y": 583.4398803710938,
                "value_index": 1,
                "details": [
                    {"value": 1},
                    {"value": [
                        {"option-3": 1},
                        {"option-4": 1},
                        {"option-5": 1}
                    ]}
                ]
            }
        ]
    }
]}

What it could look like

As noted above there is no indication for what kind of tasks are inside the details blocks. If the subtasks are going to have task names associated with them I think we can get away with a flatter structure along the lines of:

{"annottions": [
    {
        "task": "T0",
        "value": [
            {
                "frame": 0,
                "tool": 0,
                "x": 452.18341064453125,
                "y": 202.87478637695312,
                "value_index": 0,
                "details": [
                    {"task": "T1"},
                    {"task": "T2"}
                ]
            },
            {
                "frame": 0,
                "tool": 1,
                "x": 404.61279296875,
                "y": 583.4398803710938,
                "value_index": 1,
                "details": [
                    {"task": "T1"},
                    {"task": "T2"}
                ]
            }
        ]
    },
    {
        "task": "T1",
        "associated_task": "T0",
        "associated_value_index": 0,
        "value": 0
    },
    {
        "task": "T2",
        "associated_task": "T0",
        "associated_value_index": 0,
        "value":[
            {"option-1": 1},
            {"option-2": 1},
            {"None": 1}
        ]
    },
    {
        "task": "T1",
        "associated_task": "T0",
        "associated_value_index": 1,
        "value": 1
    },
    {
        "task": "T2",
        "associated_task": "T0",
        "associated_value_index": 1,
        "value": [
            {"option-3": 1},
            {"option-4": 1},
            {"option-5": 1}
        ]
    }
]}

This way all the subtask annotations are stored at the same level within the structure. The important thing is to make sure the drawn points are indexed (value_index in this example) and the subtask annotations carry this index and the task name they match up to (associated_index and associated_task in this example). This will ensure that the subtask annotations can be matched up with the point they belong to (or vise-versa).

I opted for a named index value so the structure does not rely on the order of the points within the annotation list (e.g. in case a volunteer deletes some of the points along the way and change the order of the annotations).

Why this structure would be helpful

When aggregating drawn shapes the first step is clustering. Once the clusters are found the subtasks need to be aggregated within each cluster. This is easy to do if the structure of each subtask annotation is the same as if that task was asked on its own. The code can just take all subtask annotations within a cluster and just pass it to the reducer as if it is a list of main task annotations without having to reshape them.

At the moment I need to parse the details list and make a "mock annotation" of the correct structure to be passed along to the next reducer.

eatyourgreens commented 4 years ago

Thanks, that's very useful. The subtasks don't have identifiers, but will need them to fit into the annotation model we're using in the classifier. I've been thinking about how to generate task identifiers based on the Panoptes workflow task structure. Maybe something like TASK_KEY.TOOL_INDEX.DETAILS_INDEX eg. { task: 'T0.0.0' } would point to workflow.tasks.T0.tools[0].details[0].

The other option would be to change the format of tasks in the project builder, and give subtasks identifiers as workflow tasks in their own right. That's potentially a breaking change for projects that are using drawing with subtasks.

CKrawczyk commented 4 years ago

@eatyourgreens a structure like that would work too as the actual task name is not important.

eatyourgreens commented 4 years ago

Thanks. At the moment the classifier is set up so that we store mark.annotations internally for each mark, which uses exactly the same model as classification.annotations, so [{ task, value }, { task, value }].

It shouldn't be a problem to map that to a flatter structure on classification submission. We're going to have to do some conversion anyway since the annotations are stored internally as maps.

Does Caesar have strong opinions as to whether the serialised attribute should be mark.details or mark.annotations? Renaming it should be trivial.

eatyourgreens commented 4 years ago

On second thoughts, looking at your flatter structure, the repeated task keys for T1 and T2 would be a problem for the current classifier code since the classifier uses a map indexed by task key ie. each task key can only appear in one annotation at present.

CKrawczyk commented 4 years ago

I don't think Caesar cares if it is mark.details or mark.annotations, but the external shape extractors (aggregation-for-caesar) currently uses mark.details. That being said the aggregation extractor code needs to be re-written for to account for the structure changes anyways, so it can be changed if needed.

srallen commented 4 years ago

The annotation map will have to change to an array anyway to support recursion, so that shouldn't be a blocker.

srallen commented 4 years ago

I like the idea of flattening versus the nested structure as it will be consistent with the reconceptualization we've made with workflow steps. We've removed the nested structure for combo tasks and we can do that here too with drawing sub-tasks where details is just an array of unique task identifier keys. The addition of associated_task and associated_task_index potentially makes this structure flexible for any task to have an associated task (though I'm not sure why we would want to do that).

eatyourgreens commented 4 years ago

The annotation map will have to change to an array anyway to support recursion, so that shouldn't be a blocker.

That should be a case of changing the type in the AnnotationsStore model and updating anything that uses annotations.put(), which should only be the one line in the addAnnotation action. Maybe also updating anything that deletes entries from annotations, but I think array.remove(item) is supported in MST so that shouldn't be a problem.

srallen commented 4 years ago

Yep, we have an issue tracking this, so please add any notes on how to do that here: https://github.com/zooniverse/front-end-monorepo/issues/1226

eatyourgreens commented 4 years ago

Note that workflow steps have a tasks property in the classifier model, for type validation and rendering the Tasks component. I'm going to use tool.tasks the same way. workflow.tasks[TASK_KEY].tools[TOOL_INDEX].details maps to an array of Task models, which are stored on tool.tasks and used to render the subtask form for marks from that tool (tool.marks).

srallen commented 4 years ago

We should be clear about what we're discussing here, the internal store models vs the model at submission and something I've had trouble following in these discussions. Can we keep this issue scoped to the model at classification submission and used downstream and have notes about internal store structure in a separate issue to keep it straight?

eatyourgreens commented 4 years ago

Off the top of my head, I think projects that would be affected by this change are PRN, Penguin Watch and NfN. Maybe we should run the proposed classification format past them to make sure they're ok with it?

eatyourgreens commented 4 years ago

1376 assigns task keys to subtasks of the form `T0.0.0` so `TASK_KEY.TOOL_INDEX.DETAIL_INDEX`.

srallen commented 4 years ago

I think projects that would be affected by this change are PRN, Penguin Watch and NfN. Maybe we should run the proposed classification format past them to make sure they're ok with it?

I'd rather not open this up to committee. I'd prefer we decide first what we'd like to do internally, then write a proposal ADR and we can invite specific research experienced individuals on our team to give input from a research perspective as was done with the ADR for the transcription task.

eatyourgreens commented 4 years ago

Sounds good to me.

eatyourgreens commented 4 years ago

I opted for a named index value so the structure does not rely on the order of the points within the annotation list (e.g. in case a volunteer deletes some of the points along the way and change the order of the annotations).

By the way, drawing annotations aren't generated until a volunteer presses Next or Done to complete the task, so deleting marks can't change the order of an annotation value (see ADR-15).

EDIT: marks have IDs, mark.id, in the Mark model so those could also be used to link a task annotation back to its parent mark.

eatyourgreens commented 4 years ago

Can this be closed in favour of #1390?

srallen commented 4 years ago

No because there are still comments in this thread about the sub-task and I'd like to write an ADR for that.

zooniverse / front-end-monorepo

[RFC] Drawing tools mark models #500

Overview of the current drawing tools.

Definitions

Notes about coordinate systems