zooniverse / front-end-monorepo

A rebuild of the front-end for zooniverse.org
https://www.zooniverse.org
Apache License 2.0

Transcription task: previously transcribed lines can be transcribed multiple times #2082

Closed: eatyourgreens closed this 9 months ago

eatyourgreens commented 3 years ago

Package

lib-classifier

Describe the bug

After creating a green transcription line from a magenta, previously transcribed line, I can still interact with the magenta line and create new transcriptions.

To Reproduce

There are a couple of different ways to exploit this bug.

First: Create a green line from a magenta line, then drag the green line away from the magenta line (see #1836). You can now click on the magenta line again to create a second green line.

Second: Create a green line from a magenta line. Without moving the green line, tab or shift-tab back to the original magenta line. With the magenta line focussed, press Enter or Space to create a new green line.

Expected behavior

I'd expect magenta lines to either be replaced by green lines, or to remain but be disabled so that I can no longer interact with them (no onClick handler and tabindex reset to -1.)
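
A minimal sketch of the second option (keep the magenta line but disable it), for illustration only. The names `LineInteractionProps` and `lineInteractionProps` are hypothetical and are not taken from lib-classifier; the sketch only shows the idea of dropping the handlers and resetting `tabIndex` to -1 once a line has been transcribed.

```typescript
// Hypothetical props for a previously transcribed (magenta) line.
interface LineInteractionProps {
  tabIndex: number;
  "aria-disabled": boolean;
  onClick?: () => void;
  onKeyDown?: (event: { key: string }) => void;
}

// Returns the props to spread onto the line element.
// Once the line has been transcribed in this classification, it gets
// no handlers and is removed from the tab order.
function lineInteractionProps(
  alreadyTranscribed: boolean,
  createLine: () => void
): LineInteractionProps {
  if (alreadyTranscribed) {
    return { tabIndex: -1, "aria-disabled": true };
  }
  return {
    tabIndex: 0,
    "aria-disabled": false,
    onClick: createLine,
    onKeyDown: (event) => {
      // Enter and Space activate the line, matching the keyboard steps above.
      if (event.key === "Enter" || event.key === " ") {
        createLine();
      }
    },
  };
}
```

With something like this, both reproduction paths (clicking again, or tabbing back and pressing Enter or Space) would find no handler on a line that has already produced a green line.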

eatyourgreens commented 2 years ago

Re-opening this because I can create multiple new lines for each previously transcribed line on Davy Notebooks: https://www.zooniverse.org/projects/humphrydavy/davy-notebooks-project/classify/workflow/21734

eatyourgreens commented 2 years ago

Opening this again, because it still affects Poets & Lovers.

eatyourgreens commented 2 years ago

The latest version of this bug happens because the response from Caesar can contain more than one pink line for each previous transcription. For example, here there are three copies of each line, rather than one line with three text options to choose from:

Screenshot of the consensusLines array, showing multiple lines with the same text and coordinates.

eatyourgreens commented 2 years ago

This seems to be a bug in Caesar (or Tove?). The response from Caesar can contain multiple lines with the same coordinates and consensus text. The transcription task simply takes each line from the Caesar response and renders it on the page, without checking if there's an existing line with the same coordinates and text.

eatyourgreens commented 2 years ago

Pinging @CKrawczyk because this seems to be something odd in the Caesar response: reductions with the same coordinates and consensus text but different user IDs. The frontend code is expecting those to be clustered into single lines with multiple user IDs.
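
As a rough illustration of the shape the frontend expects, here is a sketch that merges reductions with identical coordinates and consensus text into single lines with combined user IDs. The field names (`clusters_x`, `clusters_y`, `consensus_text`, `user_ids`) come from the reduction JSON in this thread; `ConsensusLine`, `lineKey` and `mergeDuplicateLines` are hypothetical names, and treating `null` user IDs as distinct anonymous users is an assumption.

```typescript
// Subset of the fields in one reduced line, as seen in the Caesar JSON.
interface ConsensusLine {
  clusters_x: number[];
  clusters_y: number[];
  consensus_text: string;
  user_ids: (number | null)[];
}

// Two lines count as duplicates when endpoints and consensus text match exactly.
function lineKey(line: ConsensusLine): string {
  return JSON.stringify([line.clusters_x, line.clusters_y, line.consensus_text]);
}

// Collapse duplicate lines into one, keeping first-seen order and
// combining their user IDs. Input lines are not mutated.
function mergeDuplicateLines(lines: ConsensusLine[]): ConsensusLine[] {
  const merged = new Map<string, ConsensusLine>();
  for (const line of lines) {
    const key = lineKey(line);
    const existing = merged.get(key);
    if (existing === undefined) {
      merged.set(key, { ...line, user_ids: [...line.user_ids] });
      continue;
    }
    for (const id of line.user_ids) {
      // Assumption: null means an anonymous user, so nulls are never collapsed.
      if (id === null || !existing.user_ids.includes(id)) {
        existing.user_ids.push(id);
      }
    }
  }
  return Array.from(merged.values());
}
```

This only papers over the symptom on the client. If the reducer is meant to cluster these lines, the fix probably belongs there.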

CKrawczyk commented 2 years ago

@eatyourgreens can you provide the workflow and subject ID for an example? I will take a closer look at what the reducer is doing. If the line endpoints are the same, it should be combining them into a single reduced line.

eatyourgreens commented 2 years ago

@CKrawczyk I just found this on Poets & Lovers: a grey line (completed line) and purple line (incomplete line) on top of each other. The grey line appears after I've edited the pink line, which disables it and removes the event handlers.

https://frontend.preview.zooniverse.org/projects/pmlogan/poets-and-lovers/classify/workflow/21362/subject-set/104805

The subject ID is 76124471, from the browser console.

Screenshot of the 'Previous Transcriptions' popup being shown for a line because a pink line and a grey line are being rendered in the same position, with the same text.

eatyourgreens commented 2 years ago

I was able to reproduce this on almost any page from Poets & Lovers, by the way, so it should be present on almost any subject linked to that set or workflow.

CKrawczyk commented 2 years ago

I grabbed the reduction for this subject from Caesar using https://caesar.zooniverse.org/workflows/21362/subjects/76124471 and there are no overlaps. This matches the GraphiQL query for the subject:

{
  workflow(id: 21362) {
    reductions(subjectId: 76124471, reducerKey: "alice") {
      data
    }
  }
}

Is there any kind of postprocessing that is done on the reduction data before it is displayed? Are the extracts used as well to pull individual lines? It could be that the two are being mangled during that matching process leading to the bug.

If it helps with debugging, here is the JSON for that subject's reductions in an easier to copy/paste format:

[{"flagged":false, "user_ids":[304838, 2408244, 1959934, null], "clusters_x":[35.11451506614685, 661.8904113173485], "clusters_y":[80.277707695961, 73.25500857830048], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["of", "of", "of", "of"], ["the", "the", "the", "the"], ["figure", "figure", "figure", "figure"], ["snit", "snit", "suit", "snit"], ["our", "our", "our", "our"], ["mood.", "mood.", "mood.", "mood."], ["We", "We", "We", "We"], ["grew", "grew", "grow", "grew"], ["very", "very", "very", "very"], ["sad,", "sad,", "sad,", "sad,"], ["very", "very", "very", "very"]], "extract_index":[0, 1, 0, 0], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"of the figure snit our mood. We grew very sad, very", "consensus_score":3.8181818181818183},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[42.13721418380737, 651.3563626408577], "clusters_y":[105.4423810839653, 102.51625645160675], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["forsaken", "forsaken", "forsaken", "forsaken"], ["among", "among", "among", "among"], ["the", "the", "the", "the"], ["lovely", "lovely", "lovely", "lovely"], ["Waltean", "Waltean", "Waltean", "Waltean"], ["drawings", "drawings", "drawings", "drawings"], ["-", "-", "-", "-"]], "extract_index":[1, 2, 2, 0], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"forsaken among the lovely Waltean drawings -", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[43.30766403675079, 646.674563229084], "clusters_y":[137.04452866315842, 128.26615476608276], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["their", "their", "their", "their"], ["fragile", "fragile", "fragile", "fragile"], ["mirth", "mirth", "mirth", "mirth"], ["makes", "makes", "makes", "makes"], ["us", "us", "us", "us"], ["sick", "sick", "sick", "sick"], ["at", "at", "at", "at"], ["heart", "heart", "heart", "heart"]], "extract_index":[2, 3, 3, 1], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"their fragile mirth makes us sick at heart", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[36.87018984556198, 644.9188884496689], "clusters_y":[165.13531905412674, 157.5273950099945], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["We", "We", "We", "We"], ["face", "face", "face", "face"], ["the", "the", "the", "the"], ["sun", "sun", "sun", "sun"], ["again,", "again,", "again,", "again,"], ["take", "take", "take", "take"], ["a", "a", "a", "a"], ["carriage", "carriage", "carriage", "carriage"], ["&", "&", "&", "&"], ["drive", "drive", "drive", "drive"]], "extract_index":[3, 4, 4, 2], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"We face the sun again, take a carriage & drive", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, null, 2114378, 1742989, 1914031], "clusters_x":[45.063338816165924, 656.0381620526314], "clusters_y":[200.24881619215012, 205.5158405303955], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":7, "clusters_text":[["to", "to", "to", "to", "to", "to", "to"], ["avoid", "avoid", "avoid", "avoid", "avoid", "avoid", "avoid"], ["[unclear][/unclear].", "[unclear][/unclear].", "sunstroke.", "sunstroke.", "sunstroke.", "sunstroke.", "sunstroke."], ["Only", "Only", "Only", "Only", "Only", "Only", "Only"], ["Bernhard", "Bernhard", "Bernhard", "Bernhard", "Bernhard", "Bernhard", "Bernhard"], ["is", "is", "is", "is", "is", "is", "is"], ["in", "in", "in", "in", "in", "in", "in"], ["the", "the", "the", "the", "the", "the", "the"], ["Salon", "Salon", "Salon", "Salon", "Salon", "Salon", "Salon"]], "extract_index":[4, 5, 5, 1, 0, 11, 0], "gold_standard":[false, false, false, false, false, false, false], "low_consensus":false, "consensus_text":"to avoid sunstroke. Only Bernhard is in the Salon", "consensus_score":6.777777777777778},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[47.989463448524475, 654.2824872732162], "clusters_y":[229.5100640654564, 231.26573884487152], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["reading", "reading", "reading", "reading"], ["profoundly;", "profoundly;", "profoundly;", "profoundly;"], ["Mary", "Mary", "Mary", "Mary"], ["is", "is", "is", "is"], ["lying", "lying", "lying", "lying"], ["down", "down", "down", "down"]], "extract_index":[5, 6, 6, 3], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"reading profoundly; Mary is lying down", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, null, 2114378, 1742989, 1914031], "clusters_x":[49.159913301467896, 458.23213690519333], "clusters_y":[259.3565368652344, 265.2087861299515], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":7, "clusters_text":[["in", "in", "in", "in", "in", "in", "in"], ["her", "her", "her", "her", "her", "her", "her"], ["room", "room", "room", "room", "room", "room", "room"], ["-", "-", "-", "-", "-", "-", "-"], ["[unclear][/unclear]", "[unclear][/unclear]", "sans", "same", "sans", "sans", "sans"], ["anything!", "anything!", "anything!", "anything!", "anything!", "anything!", "anything!"]], "extract_index":[6, 7, 7, 2, 2, 10, 1], "gold_standard":[false, false, false, false, false, false, false], "low_consensus":false, "consensus_text":"in her room - sans anything!", "consensus_score":6.5},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[35.69973999261856, 292.02825778722763], "clusters_y":[292.7143592238426, 291.5439093708992], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["[underline]Wednesday", "[underline]Wednesday", "[underline]Wednesday", "[underline]Wednesday"], ["29", "29", "29", "29"], ["June", "June", "June", "June"], ["29.[/underline]", "29.[/underline]", "29.[/underline]", "29.[/underline]"]], "extract_index":[7, 8, 8, 4], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"[underline]Wednesday 29 June 29.[/underline]", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, null, 2114378, 1742989], "clusters_x":[43.30766403675079, 653.6972623467445], "clusters_y":[321.97559946775436, 317.2938000559807], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":6, "clusters_text":[["Again", "Again", "Again", "Again", "Again", "Again"], ["the", "the", "we", "we", "we", "the"], ["story", "story", "stay", "stay", "stay", "story"], ["", "", "with", "with", "with", ""], ["d'Esclare.", "d'Esclare.", "d'Esclare.", "d'Esclare.", "d'Esclare.", "d'Esclare."], ["Mary", "Mary", "Mary", "Mary", "Mary", "Mary"], ["says", "says", "says", "says", "says", "says"], ["he", "he", "he", "he", "he", "he"]], "extract_index":[8, 9, 9, 3, 3, 9], "gold_standard":[false, false, false, false, false, false], "low_consensus":false, "consensus_text":"Again the story with d'Esclare. Mary says he", "consensus_score":4.875},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[43.30766403675079, 648.4302380084991], "clusters_y":[354.7481968998909, 350.0663974881172], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["is", "is", "is", "is"], ["to", "to", "to", "to"], ["her", "her", "her", "her"], ["the", "the", "the", "the"], ["type", "type", "type", "type"], ["of", "of", "of", "of"], ["Womanhood,", "Womanhood,", "Womanhood,", "Womanhood,"], ["whelmed", "whelmed", "whelmed", "whelmed"]], "extract_index":[9, 10, 10, 5], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"is to her the type of Womanhood, whelmed", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, null, 2114378, 1742989, 1914031], "clusters_x":[42.72243911027908, 636.1405145525932], "clusters_y":[387.520780980587, 378.15718215703964], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":7, "clusters_text":[["with", "with", "with", "with", "with", "with", "with"], ["a", "a", "a", "a", "a", "a", "a"], ["deeper", "deeper", "deeper", "deeper", "deeper", "deeper", "deeper"], ["[unclear]pomerteroners[/unclear]", "[unclear]pomerteroners[/unclear]", "powerlessness", "powerlessness", "powerlessness", "powerlessness", "powerlessness"], ["than", "than", "than", "than", "than", "than", "than"], ["Michael", "Michael", "Michael", "Michael", "Michael", "Michael", "Michael"]], "extract_index":[10, 11, 11, 4, 4, 8, 2], "gold_standard":[false, false, false, false, false, false, false], "low_consensus":false, "consensus_text":"with a deeper powerlessness than Michael", "consensus_score":6.666666666666667},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, null], "clusters_x":[43.30766403675079, 633.7996148467064], "clusters_y":[418.53771126270294, 411.5150121450424], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["Angelo's", "Angelo's", "Angelo's", "Angelo's"], ["Italy", "Italy", "Italy", "Italy"], ["-", "-", "-", "-"], ["unable", "unable", "unable", "unable"], ["to", "to", "to", "to"], ["[unclear][/unclear]", "[unclear]rouse[/unclear]", "raise", "rouse"], ["her", "her", "her", "her"], ["will", "will", "will", "will"], ["&", "&", "&", "&"]], "extract_index":[11, 12, 12, 5], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"Angelo's Italy - unable to [unclear][/unclear] her will &", "consensus_score":3.6666666666666665},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378], "clusters_x":[48.574688374996185, 629.7030403614044], "clusters_y":[448.9694013595581, 444.2876019477844], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["say", "say", "say", "say"], ["[underline]I", "[underline]I", "[underline]I", "[underline]I"], ["will", "will", "will", "will"], ["[unclear][/unclear].[/underline]", "[unclear][/unclear].[/underline]", "hie.[/underline]", "[unclear][/unclear].[/underline]"], ["The", "The", "The", "The"], ["beauty", "beauty", "beauty", "beauty"], ["of", "of", "of", "of"], ["the", "the", "the", "the"], ["lines", "lines", "lines", "lines"], ["&", "&", "&", "&"]], "extract_index":[12, 13, 13, 5], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"say [underline]I will [unclear][/unclear].[/underline] The beauty of the lines &", "consensus_score":3.9},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[54.42693763971329, 640.2370890378952], "clusters_y":[473.54884219169617, 475.3045169711113], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["their", "their", "their", "their"], ["inherent", "inherent", "inherent", "inherent"], ["helplessness", "helplessness", "helplessness", "helplessness"], ["haunt", "haunt", "haunt", "haunt"], ["one.", "one.", "one.", "one."]], "extract_index":[13, 14, 14, 6], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"their inherent helplessness haunt one.", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, null], "clusters_x":[47.989463448524475, 636.7257394790649], "clusters_y":[513.3441494703293, 506.32145035266876], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["Then", "Then", "Then", "Then"], ["we", "we", "we", "we"], ["wander", "wander", "wander", "wander"], ["in", "in", "in", "in"], ["the", "the", "the", "the"], ["Louvre", "Louvre", "Louvre", "Louvre-"], ["-", "-", "-", ""], ["desperate,", "desperate,", "desperate,", "desperate"]], "extract_index":[14, 15, 15, 0], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"Then we wander in the Louvre - desperate,", "consensus_score":3.625},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[45.063338816165924, 650.771137714386], "clusters_y":[544.9462894201279, 537.9235903024673], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["ignorant", "ignorant", "ignorant", "ignorant"], ["wanderings.", "wanderings.", "wanderings.", "wanderings."], ["We", "We", "We", "We"], ["had", "had", "had", "had"], ["been", "been", "been", "been"], ["decoyed", "decoyed", "decoyed", "decoyed"]], "extract_index":[15, 16, 16, 7], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"ignorant wanderings. We had been decoyed", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[50.330363154411316, 630.8734902143478], "clusters_y":[575.377979516983, 568.9405053257942], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["to", "to", "to", "to"], ["Paris", "Paris", "Paris", "Paris"], ["by", "by", "by", "by"], ["Mary's", "Mary's", "Mary's", "Mary's"], ["promises", "promises", "promises", "promises"], ["of", "of", "of", "of"], ["Morellian", "Morellian", "Morellian", "Morellian"]], "extract_index":[16, 17, 17, 8], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"to Paris by Mary's promises of Morellian", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[49.745138227939606, 605.7088183760643], "clusters_y":[605.2244599461555, 601.1278854608536], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["teaching", "teaching", "teaching", "teaching"], ["from", "from", "from", "from"], ["Bernhard,", "Bernhard,", "Bernhard,", "Bernhard,"], ["&", "&", "&", "&"], ["we", "we", "we", "we"], ["are", "are", "are", "are"], ["left", "left", "left", "left"]], "extract_index":[17, 18, 18, 9], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"teaching from Bernhard, & we are left", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[52.67126286029816, 645.5041133761406], "clusters_y":[635.6561500430107, 630.3891257047653], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["with", "with", "with", "with"], ["no", "no", "no", "no"], ["shepherd", "shepherd", "shepherd", "shepherd"], ["among", "among", "among", "among"], ["the", "the", "the", "the"], ["tangles", "tangles", "tangles", "tangles"], ["of", "of", "of", "of"]], "extract_index":[18, 19, 19, 10], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"with no shepherd among the tangles of", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, null, 2114378, 1742989, 1914031], "clusters_x":[52.08603793382645, 643.7484385967255], "clusters_y":[668.4287551045418, 664.3321806192398], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":7, "clusters_text":[["attribution", "attribution", "attribution", "attribution", "attribution", "attribution", "attribution"], ["[unclear][/unclear][unclear][/unclear]", "[unclear][/unclear][unclear][/unclear]", "&c.", "&c.", "[unclear][/unclear].", "[unclear][/unclear].", "&c."], ["", "", "Aesthetically", "Aesthetically", "[unclear]Aesthetically[/unclear]", "[unclear]Aesthetically[/unclear]", "Aesthetically"], ["we", "we", "we", "we", "we", "we", "we"], ["know", "know", "know", "know", "know", "know", "know"], ["the", "the", "the", "the", "the", "the", "the"]], "extract_index":[19, 20, 20, 6, 6, 7, 3], "gold_standard":[false, false, false, false, false, false, false], "low_consensus":false, "consensus_text":"attribution &c. Aesthetically we know the", "consensus_score":5.666666666666667},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[57.93828719854355, 641.9927638173103], "clusters_y":[697.1047704219818, 692.4229710102081], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["Louvre", "Louvre", "Louvre", "Louvre"], ["very", "very", "very", "very"], ["well.", "well.", "well.", "well."], ["Historically,", "Historically,", "Historically,", "Historically,"], ["critically", "critically", "critically", "critically"]], "extract_index":[20, 21, 21, 11], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"Louvre very well. Historically, critically", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, null], "clusters_x":[60.27918690443039, 644.3336635231972], "clusters_y":[725.7807698845863, 719.3432956933975], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["", "", "not", ""], ["what", "what", "at", "what"], ["all.", "all.", "all.", "all."], ["I", "I", "I", "I"], ["am", "am", "am", "am"], ["as", "as", "as", "as"], ["deep", "deep", "deep", "deep"], ["in", "in", "in", "in"], ["despair", "despair", "despair", "despair"], ["as", "as", "as", "as"]], "extract_index":[21, 22, 22, 7], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"not what all. I am as deep in despair as", "consensus_score":3.6},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[53.84171271324158, 622.0951163172722], "clusters_y":[758.5533749461174, 752.1159007549286], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["d'Esclare", "d'Esclare", "d'Esclare", "d'Esclare"], ["himself,", "himself,", "himself,", "himself,"], ["numb", "numb", "numb", "numb"], ["beneath", "beneath", "beneath", "beneath"], ["my", "my", "my", "my"]], "extract_index":[22, 23, 23, 12], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"d'Esclare himself, numb beneath my", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[53.84171271324158, 667.7426605820656], "clusters_y":[789.5702747106552, 793.0816242694855], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["ignorance.", "ignorance.", "ignorance.", "ignorance."], ["When", "When", "When", "When"], ["we", "we", "we", "we"], ["return", "return", "return", "return"], ["in", "in", "in", "in"], ["the", "the", "the", "the"], ["afternoon", "afternoon", "afternoon", "afternoon"]], "extract_index":[23, 24, 24, 13], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"ignorance. When we return in the afternoon", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378], "clusters_x":[55.59738749265671, 620.339441537857], "clusters_y":[822.3428590297699, 813.5644851326942], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["Bernhard", "Bernhard", "Bernhard", "Bernhard"], ["is", "is", "is", "is"], ["[unclear]away[/unclear];", "[unclear]away[/unclear];", "away;", "away;"], ["I", "I", "I", "I"], ["retire", "retire", "retire", "retire"], ["to", "to", "to", "to"], ["my", "my", "my", "my"], ["bed", "bed", "bed", "bed"], ["&", "&", "&", "&"]], "extract_index":[24, 25, 25, 7], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"Bernhard is [unclear]away[/unclear]; I retire to my bed &", "consensus_score":3.7777777777777777},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031], "clusters_x":[59.69396197795868, 629.1178154349327], "clusters_y":[854.5302696824074, 851.0189201235771], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":6, "clusters_text":[["from", "from", "from", "from", "from", "from"], ["my", "my", "my", "my", "my", "my"], ["", "", "resting", "resting", "resting", "resting"], ["[unclear][/unclear]", "[unclear][/unclear]", "", "", "[deletion]ty[/deletion]", ""], ["place", "place", "place", "place", "place", "place"], ["I", "I", "I", "I", "I", "I"], ["hear", "hear", "hear", "hear", "hear", "hear"], ["Sim", "Sim", "Sim's", "Sim's", "Sim's", "Sim's"], ["'O", "'O", "", "", "", ""]], "extract_index":[25, 26, 26, 8, 0, 4], "gold_standard":[false, false, false, false, false, false], "low_consensus":false, "consensus_text":"from my resting [unclear][/unclear] place I hear Sim's 'O", "consensus_score":4.666666666666667},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031], "clusters_x":[64.96098631620407, 652.5268124938011], "clusters_y":[884.9619445204735, 876.1835706233978], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":6, "clusters_text":[["[unclear][/unclear]", "[unclear][/unclear]", "frank", "frank", "frank", "frank"], ["voice", "voice", "voice", "voice", "voice", "voice"], ["questioning", "questioning", "questioning", "questioning", "questioning", "questioning"], ["Mary", "Mary", "Mary", "Mary", "Mary", "Mary"], ["as", "as", "as", "as", "as", "as"], ["to", "to", "to", "to", "to", "to"], ["the", "the", "the", "the", "the", "the"]], "extract_index":[26, 27, 27, 9, 1, 5], "gold_standard":[false, false, false, false, false, false], "low_consensus":false, "consensus_text":"frank voice questioning Mary as to the", "consensus_score":5.714285714285714},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[62.62008661031723, 642.577988743782], "clusters_y":[921.2458686232567, 911.8822697997093], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["reason", "reason", "reason", "reason"], ["why", "why", "why", "why"], ["Bernhard", "Bernhard", "Bernhard", "Bernhard"], ["makes", "makes", "makes", "makes"], ["no", "no", "no", "no"], ["time", "time", "time", "time"], ["to", "to", "to", "to"]], "extract_index":[27, 28, 28, 14], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"reason why Bernhard makes no time to", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031], "clusters_x":[60.27918690443039, 651.3563626408577], "clusters_y":[951.6775739789009, 940.5583003759384], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":6, "clusters_text":[["help", "help", "help", "help", "help", "help"], ["us", "us", "us", "us", "us", "us"], ["&", "&", "&", "&", "&", "&"], ["offering", "offering", "offering", "offering", "offering", "offering"], ["[unclear]Manly[/unclear]!", "[unclear]Manly[/unclear]!", "Money!", "[underline]Money[/underline]!", "[underline]money[/underline]!", "[unclear][/unclear]"], ["if", "if", "if", "if", "if", "if"], ["he", "he", "he", "he", "he", "he"], ["will", "will", "will", "will", "will", "will"], ["[deletion][/deletion]", "[deletion][/deletion]", "[deletion]help[/deletion]", "[deletion]help[/deletion]", "[deletion]help[/deletion]", ""]], "extract_index":[28, 29, 29, 10, 2, 6], "gold_standard":[false, false, false, false, false, false], "low_consensus":false, "consensus_text":"help us & offering [unclear]Manly[/unclear]! if he will [deletion]help[/deletion]", "consensus_score":5.222222222222222},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031, 2446094], "clusters_x":[58.52351212501526, 632.0439400672913], "clusters_y":[973.9161303639412, 968.6491060256958], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":7, "clusters_text":[["[deletion][/deletion]", "[deletion][/deletion]", "[deletion]us[/deletion]", "[deletion]us[/deletion]", "[deletion]us[/deletion]", "", "[deletion][/deletion]"], ["give", "give", "give", "give", "give", "give", "give"], ["us", "us", "us", "us", "us", "us", "us"], ["instruction.", "instruction.", "instruction.", "instruction.", "instruction.", "instruction.", "instruction."], ["Mary", "Mary", "Mary", "Mary", "Mary", "Mary", "Mary"], ["has", "has", "has", "has", "has", "has", "has"], ["not", "not", "not", "not", "not", "not", "not"]], "extract_index":[29, 30, 30, 11, 3, 7, 1], "gold_standard":[false, false, false, false, false, false, false], "low_consensus":false, "consensus_text":"[deletion][/deletion] give us instruction. Mary has not", "consensus_score":6.428571428571429},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[59.10873705148697, 636.7257394790649], "clusters_y":[1010.2000535726547, 1004.3478043079376], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["much", "much", "much", "much"], ["to", "to", "to", "to"], ["say", "say", "say", "say"], ["for", "for", "for", "for"], ["him,", "him,", "him,", "him,"], ["but", "but", "but", "but"], ["grasps", "grasps", "grasps", "grasps"], ["at", "at", "at", "at"], ["the", "the", "the", "the"]], "extract_index":[30, 31, 31, 15], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"much to say for him, but grasps at the", "consensus_score":4.0},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031], "clusters_x":[54.42693763971329, 654.2824872732162], "clusters_y":[1040.6317482590675, 1042.3874230384827], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":6, "clusters_text":[["idea", "idea", "idea", "idea", "idea", "idea"], ["of", "of", "of", "of", "of", "of"], ["his", "his", "his", "his", "his", "his"], ["earning", "earning", "earning", "earning", "earning", "earning"], ["some", "some", "some", "some", "some", "some"], ["coins.", "coins.", "coins.", "coins.", "coins.", "coins."], ["", "", "[deletion]The", "[deletion]The", "[deletion]The", ""], ["[deletion][/deletion]", "[deletion][/deletion]", "young[/deletion]", "young[/deletion]", "young[/deletion]", "[deletion][/deletion]"]], "extract_index":[31, 32, 32, 12, 4, 8], "gold_standard":[false, false, false, false, false, false], "low_consensus":false, "consensus_text":"idea of his earning some coins. [deletion]The [deletion][/deletion]", "consensus_score":5.25},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031], "clusters_x":[56.18261241912842, 659.5495116114616], "clusters_y":[1068.1373183131218, 1072.2338927984238], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":6, "clusters_text":[["through", "through", "through", "through", "through", "through"], ["the", "the", "the", "the", "the", "the"], ["[unclear]closing[/unclear]", "[unclear]closing[/unclear]", "ceasing", "ceasing", "ceasing", "ceasing"], ["of", "of", "of", "of", "of", "of"], ["a", "a", "a", "a", "a", "a"], ["bounty", "bounty", "bounty", "bounty", "bounty", "bounty"], ["from", "from", "from", "from", "from", "from"], ["a", "a", "a", "a", "a", "a"], ["[unclear][/unclear]", "[unclear][/unclear]", "private", "private", "private", "pirate"]], "extract_index":[32, 33, 33, 13, 5, 9], "gold_standard":[false, false, false, false, false, false], "low_consensus":false, "consensus_text":"through the ceasing of a bounty from a private", "consensus_score":5.444444444444445},
 {"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031, 2446094, null, 1312868, 1590807], "clusters_x":[55.59738749265671, 674.1801347732544], "clusters_y":[1104.4213034510612, 1101.4951788187027], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":10, "clusters_text":[["source", "source", "source", "source", "source", "source", "source", "source", "source", "source"], ["he", "he", "he", "he", "he", "he", "he", "he", "he", "he"], ["is", "is", "is", "is", "is", "is", "is", "is", "is", "is"], ["poor", "poor", "poor", "poor", "poor", "poor", "poor,", "poor,", "poor,", "poor"], ["&", "&", "&", "&", "&", "+", "and", "&", "and", "&"], ["is", "is", "is", "is", "is", "is", "is", "is", "is", "is"], ["seized,", "seized,", "seized,", "seized,", "seized,", "seized", "seized,", "seized,", "seized,", "seized,"], ["like", "like", "like", "like", "like", "like", "like", "like", "like", "like"], ["[unclear][/unclear]", "[unclear][/unclear]", "Midas", "Midas", "Midas", "[unclear][/unclear]", "[unclear][/unclear],", "Midas", "[unclear]Nudas[/unclear],", "Midas,"], ["with", "with", "with", "with", "with", "with", "with", "with", "with", "with"]], "extract_index":[33, 34, 34, 14, 6, 10, 0, 0, 0, 0], "gold_standard":[false, false, false, false, false, false, false, false, false, false], "low_consensus":false, "consensus_text":"source he is poor & is seized, like Midas with", "consensus_score":8.7},
 {"flagged":false, "user_ids":[2408244, 1959934, 2114378, 1914031], "clusters_x":[543.9507215572211, 640.6058283924376], "clusters_y":[31.144423313569746, 37.588097102584186], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":1, "number_views":4, "clusters_text":[["126", "126", "126", "126"], ["[deletion]209[/deletion]", "[deletion]209[/deletion]", "[deletion]207[/deletion]", "[deletion]209[/deletion]"]], "extract_index":[0, 1, 1, 11], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"126 [deletion]209[/deletion]", "consensus_score":3.5}]
eatyourgreens commented 2 years ago

The code that loads and processes reductions is here. I’m fairly sure it was taken from ASM, and generates a pink or grey line for each reduction. https://github.com/zooniverse/front-end-monorepo/blob/master/packages/lib-classifier/src/store/SubjectStore/Subject/TranscriptionReductions/TranscriptionReductions.js

The subjects that had problems had multiple reductions with the same coordinates and consensus text, but different usernames.
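
For anyone checking other subjects, here is a rough sketch of how duplicates like these could be spotted in a Caesar response. It only assumes the `clusters_x`, `clusters_y` and `consensus_text` fields shown in the reductions above. The function name and the rounding `precision` are made up for illustration, and rounding is a crude way to compare coordinates (two values either side of a .5 boundary will not match).

```python
from collections import defaultdict


def find_duplicate_lines(reductions, precision=0):
    """Return groups of indices whose lines share coordinates and consensus text.

    `precision` (hypothetical) is the number of decimal places used when
    comparing coordinates, so that near-identical floats land in one group.
    """
    groups = defaultdict(list)
    for index, line in enumerate(reductions):
        key = (
            tuple(round(x, precision) for x in line["clusters_x"]),
            tuple(round(y, precision) for y in line["clusters_y"]),
            line["consensus_text"],
        )
        groups[key].append(index)
    # Only groups with more than one member are duplicates.
    return [indices for indices in groups.values() if len(indices) > 1]
```

Each group in the result lists the positions, within the reductions array, of lines that would be drawn on top of each other.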

CKrawczyk commented 2 years ago

🤦 I think I know what the issue is. For the OPTICS reducer, the "distance" between classifications is:

This distance is found by summing the Euclidean distance between the start points of each line, the Euclidean distance between the end points of each line, and the Levenshtein distance of the text for each line. The Levenshtein distance is done after stripping text tags and consolidating whitespace.

https://aggregation-caesar.zooniverse.org/reducers.html#panoptes_aggregation.reducers.optics_text_utils.metric

So if the typed text is significantly different, a new cluster will be formed at the same position. This is better for the "consensus text" calculation but not better for displaying in the UI...
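
To make that concrete, here is a minimal sketch of the metric as described in the docs quoted above. It is not the panoptes_aggregation implementation: the function names, the `(start, end, text)` tuple layout and the tag-stripping regex are assumptions for illustration only.

```python
import math
import re


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def clean_text(text):
    """Strip tags such as [unclear] or [/deletion] and collapse whitespace.

    The regex is an assumption about what "text tags" look like.
    """
    without_tags = re.sub(r"\[/?\w+\]", "", text)
    return " ".join(without_tags.split())


def line_distance(line_a, line_b):
    """Start-point distance + end-point distance + text edit distance."""
    start_a, end_a, text_a = line_a
    start_b, end_b, text_b = line_b
    return (math.dist(start_a, start_b)
            + math.dist(end_a, end_b)
            + levenshtein(clean_text(text_a), clean_text(text_b)))
```

With this sketch, two lines drawn at exactly the same position still have a non-zero distance whenever their text differs, and the text term grows with the length of the disagreement, so long lines with different readings can end up in separate clusters.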

If it is happening a lot on a project they can try adjusting min_samples from "auto" to 3 and see if that helps.

For reference, subject 76124479 shows this issue of finding two clusters at the same position, each with different text.

CKrawczyk commented 2 years ago

I can see three ways to move forward:

  1. Change min_samples for the project and see if it helps (it might not help in this case; we would need to run tests to find out)
  2. Adjust the distance metric code so that it only includes the Levenshtein distance when a flag is passed in on the reducer
  3. Adjust the front-end code to look for clusters at the same position and merge them

Not sure what others' opinions are. Is this better handled in the reducer code with a flag, or in the way the front-end displays the results?
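
As a rough illustration of what option 3 might involve, here is a sketch that merges reductions whose start and end points fall within a tolerance of each other and collects their consensus text as options. It is written in Python for brevity (the front-end itself is JavaScript), and the function name, the `tolerance` value and the shape of the merged objects are all hypothetical. Only the `clusters_x`, `clusters_y` and `consensus_text` fields come from the Caesar reductions.

```python
import math


def merge_lines_at_same_position(reductions, tolerance=5.0):
    """Merge lines whose end points are within `tolerance` of an earlier line.

    `tolerance` is assumed to be in the same units as clusters_x/clusters_y.
    """
    merged = []
    for line in reductions:
        start = (line["clusters_x"][0], line["clusters_y"][0])
        end = (line["clusters_x"][1], line["clusters_y"][1])
        for existing in merged:
            if (math.dist(start, existing["start"]) <= tolerance
                    and math.dist(end, existing["end"]) <= tolerance):
                # Same position: add the text as another option, once.
                if line["consensus_text"] not in existing["text_options"]:
                    existing["text_options"].append(line["consensus_text"])
                break
        else:
            # No earlier line at this position, so start a new merged line.
            merged.append({
                "start": start,
                "end": end,
                "text_options": [line["consensus_text"]],
            })
    return merged
```

Each merged line would then be rendered once, with `text_options` feeding the drop-down, instead of one line per cluster.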

goplayoutside3 commented 9 months ago

@snblickhan this discussion about transcribed lines was never concluded. Just wanted to see if this is on your radar - have there been any reports in the last year about complete and incomplete lines displaying on top of each other?

snblickhan commented 9 months ago

@goplayoutside3 I can't find the link ATM, but this was definitely resolved! Very likely a duplicate issue or one that wasn't closed after the actual problem was ID'd & fixed.

goplayoutside3 commented 9 months ago

Thanks!

eatyourgreens commented 2 months ago

Here's an example of this bug from Maria Edgeworth Letters. Lines repeat two or three times, with identical text, in the Caesar reductions.

https://eatyourgreens.github.io/transcription-explorer/mariaedgeworthletters/maria-edgeworth-letters/subjects/87415864/

goplayoutside3 commented 2 months ago

@eatyourgreens is your comment intended to re-open this Issue? Maria Edgeworth Letters is not an active project at the moment as it's out of data. Was the bug reported by that project team?

eatyourgreens commented 2 months ago

No, it's just another example of this bug in the Caesar reductions. Quite a good one, as that subject is mostly duplicated lines on the first page. This issue originally lacked links to examples, which made it hard to diagnose, so I've added that as an example.

For posterity, here's that subject in the classifier, showing this particular issue. Lines like "Edgeworth's Town" are rendered as three lines, on top of each other and each with only one choice in the drop-down menu, rather than one line with a drop-down menu containing all three choices. https://www.zooniverse.org/projects/mariaedgeworthletters/maria-edgeworth-letters/classify/workflow/18542/subject/87415864

The bug's easier to see in the DOM inspector, where you can see that lines transcribed-0, transcribed-1, and transcribed-2 are identical.

Screenshot with the DOM inspector open to show that lines transcribed-0, transcribed-1, and transcribed-2 are identical.

I think the bug was fixed in Caesar, but I’m not sure. I don't think duplicate lines show up in the Caesar reductions for newer projects. The subject viewer code doesn't detect and merge duplicates; it just displays each consensus reduction from Caesar as a previously-transcribed line.