eatyourgreens closed this 9 months ago
Re-opening this because I can create multiple new lines for each previously transcribed line on Davy Notebooks: https://www.zooniverse.org/projects/humphrydavy/davy-notebooks-project/classify/workflow/21734
Opening this again, because it still affects Poets & Lovers.
The latest version of this bug happens because the response from Caesar can contain more than one pink line for each previous transcription, e.g. here there are three copies of each line, rather than one line with three text options to choose from:
This seems to be a bug in Caesar (or Tove?). The response from Caesar can contain multiple lines with the same coordinates and consensus text. The transcription task simply takes each line from the Caesar response and renders it on the page, without checking if there's an existing line with the same coordinates and text.
Pinging @CKrawczyk because this seems to be something odd in the Caesar response: reductions with the same coordinates and consensus text but different user IDs. The frontend code is expecting those to be clustered into single lines with multiple user IDs.
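To make the expected behaviour concrete, here is a minimal sketch of collapsing such duplicates into one line per position and text. It is written in Python for illustration only and is not the front-end code; the function name `merge_duplicate_lines` and the `precision` parameter are hypothetical, and the field names are the ones that appear in the Caesar reduction JSON.

```python
def merge_duplicate_lines(reductions, precision=3):
    """Merge reductions that share endpoints and consensus text.

    Assumes each reduction is a dict with 'clusters_x', 'clusters_y',
    'consensus_text' and 'user_ids', as in the Caesar response.
    """
    merged = {}
    for line in reductions:
        # Round coordinates so tiny float differences do not split a line.
        key = (
            tuple(round(x, precision) for x in line["clusters_x"]),
            tuple(round(y, precision) for y in line["clusters_y"]),
            line["consensus_text"],
        )
        if key not in merged:
            # Copy the dict and the user list so the input is not mutated.
            merged[key] = {**line, "user_ids": list(line["user_ids"])}
        else:
            # Logged-out users appear as null, so IDs are appended as-is
            # rather than de-duplicated.
            merged[key]["user_ids"].extend(line["user_ids"])
    return list(merged.values())
```

Whether this kind of merge belongs in the reducer or in the display code is a separate question; the sketch only shows the shape of the result the front end seems to expect.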
@eatyourgreens can you provide the workflow and subject ID for an example? I will take a closer look at what the reducer is doing, if the line endpoints are the same it should be combining them into a single reduced line.
@CKrawczyk I just found this on Poets & Lovers: a grey line (completed line) and purple line (incomplete line) on top of each other. The grey line appears after I've edited the pink line, which disables it and removes the event handlers.
The subject ID is 76124471, from the browser console.
I was able to reproduce this on almost any page from Poets & Lovers, by the way, so it should be present on almost any subject linked to that set or workflow.
I grabbed the reduction for this subject from Caesar using https://caesar.zooniverse.org/workflows/21362/subjects/76124471 and there are no overlaps. This matches the GraphiQL query for the subject:
{
  workflow(id: 21362) {
    reductions(subjectId: 76124471, reducerKey: "alice") {
      data
    }
  }
}
Is there any kind of postprocessing that is done on the reduction data before it is displayed? Are the extracts used as well to pull individual lines? It could be that the two are being mangled during that matching process leading to the bug.
If it helps with debugging, here is the JSON for that subject's reductions in an easier to copy/paste format:
[{"flagged":false, "user_ids":[304838, 2408244, 1959934, null], "clusters_x":[35.11451506614685, 661.8904113173485], "clusters_y":[80.277707695961, 73.25500857830048], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["of", "of", "of", "of"], ["the", "the", "the", "the"], ["figure", "figure", "figure", "figure"], ["snit", "snit", "suit", "snit"], ["our", "our", "our", "our"], ["mood.", "mood.", "mood.", "mood."], ["We", "We", "We", "We"], ["grew", "grew", "grow", "grew"], ["very", "very", "very", "very"], ["sad,", "sad,", "sad,", "sad,"], ["very", "very", "very", "very"]], "extract_index":[0, 1, 0, 0], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"of the figure snit our mood. We grew very sad, very", "consensus_score":3.8181818181818183},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[42.13721418380737, 651.3563626408577], "clusters_y":[105.4423810839653, 102.51625645160675], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["forsaken", "forsaken", "forsaken", "forsaken"], ["among", "among", "among", "among"], ["the", "the", "the", "the"], ["lovely", "lovely", "lovely", "lovely"], ["Waltean", "Waltean", "Waltean", "Waltean"], ["drawings", "drawings", "drawings", "drawings"], ["-", "-", "-", "-"]], "extract_index":[1, 2, 2, 0], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"forsaken among the lovely Waltean drawings -", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[43.30766403675079, 646.674563229084], "clusters_y":[137.04452866315842, 128.26615476608276], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["their", "their", "their", "their"], ["fragile", "fragile", "fragile", "fragile"], ["mirth", "mirth", "mirth", "mirth"], ["makes", "makes", "makes", "makes"], ["us", "us", "us", "us"], ["sick", "sick", "sick", "sick"], ["at", "at", "at", "at"], ["heart", "heart", "heart", "heart"]], "extract_index":[2, 3, 3, 1], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"their fragile mirth makes us sick at heart", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[36.87018984556198, 644.9188884496689], "clusters_y":[165.13531905412674, 157.5273950099945], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["We", "We", "We", "We"], ["face", "face", "face", "face"], ["the", "the", "the", "the"], ["sun", "sun", "sun", "sun"], ["again,", "again,", "again,", "again,"], ["take", "take", "take", "take"], ["a", "a", "a", "a"], ["carriage", "carriage", "carriage", "carriage"], ["&", "&", "&", "&"], ["drive", "drive", "drive", "drive"]], "extract_index":[3, 4, 4, 2], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"We face the sun again, take a carriage & drive", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, null, 2114378, 1742989, 1914031], "clusters_x":[45.063338816165924, 656.0381620526314], "clusters_y":[200.24881619215012, 205.5158405303955], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":7, "clusters_text":[["to", "to", "to", "to", "to", "to", "to"], ["avoid", "avoid", "avoid", "avoid", "avoid", "avoid", "avoid"], ["[unclear][/unclear].", "[unclear][/unclear].", "sunstroke.", "sunstroke.", "sunstroke.", "sunstroke.", "sunstroke."], ["Only", "Only", "Only", "Only", "Only", "Only", "Only"], ["Bernhard", "Bernhard", "Bernhard", "Bernhard", "Bernhard", "Bernhard", "Bernhard"], ["is", "is", "is", "is", "is", "is", "is"], ["in", "in", "in", "in", "in", "in", "in"], ["the", "the", "the", "the", "the", "the", "the"], ["Salon", "Salon", "Salon", "Salon", "Salon", "Salon", "Salon"]], "extract_index":[4, 5, 5, 1, 0, 11, 0], "gold_standard":[false, false, false, false, false, false, false], "low_consensus":false, "consensus_text":"to avoid sunstroke. Only Bernhard is in the Salon", "consensus_score":6.777777777777778},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[47.989463448524475, 654.2824872732162], "clusters_y":[229.5100640654564, 231.26573884487152], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["reading", "reading", "reading", "reading"], ["profoundly;", "profoundly;", "profoundly;", "profoundly;"], ["Mary", "Mary", "Mary", "Mary"], ["is", "is", "is", "is"], ["lying", "lying", "lying", "lying"], ["down", "down", "down", "down"]], "extract_index":[5, 6, 6, 3], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"reading profoundly; Mary is lying down", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, null, 2114378, 1742989, 1914031], "clusters_x":[49.159913301467896, 458.23213690519333], "clusters_y":[259.3565368652344, 265.2087861299515], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":7, "clusters_text":[["in", "in", "in", "in", "in", "in", "in"], ["her", "her", "her", "her", "her", "her", "her"], ["room", "room", "room", "room", "room", "room", "room"], ["-", "-", "-", "-", "-", "-", "-"], ["[unclear][/unclear]", "[unclear][/unclear]", "sans", "same", "sans", "sans", "sans"], ["anything!", "anything!", "anything!", "anything!", "anything!", "anything!", "anything!"]], "extract_index":[6, 7, 7, 2, 2, 10, 1], "gold_standard":[false, false, false, false, false, false, false], "low_consensus":false, "consensus_text":"in her room - sans anything!", "consensus_score":6.5},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[35.69973999261856, 292.02825778722763], "clusters_y":[292.7143592238426, 291.5439093708992], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["[underline]Wednesday", "[underline]Wednesday", "[underline]Wednesday", "[underline]Wednesday"], ["29", "29", "29", "29"], ["June", "June", "June", "June"], ["29.[/underline]", "29.[/underline]", "29.[/underline]", "29.[/underline]"]], "extract_index":[7, 8, 8, 4], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"[underline]Wednesday 29 June 29.[/underline]", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, null, 2114378, 1742989], "clusters_x":[43.30766403675079, 653.6972623467445], "clusters_y":[321.97559946775436, 317.2938000559807], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":6, "clusters_text":[["Again", "Again", "Again", "Again", "Again", "Again"], ["the", "the", "we", "we", "we", "the"], ["story", "story", "stay", "stay", "stay", "story"], ["", "", "with", "with", "with", ""], ["d'Esclare.", "d'Esclare.", "d'Esclare.", "d'Esclare.", "d'Esclare.", "d'Esclare."], ["Mary", "Mary", "Mary", "Mary", "Mary", "Mary"], ["says", "says", "says", "says", "says", "says"], ["he", "he", "he", "he", "he", "he"]], "extract_index":[8, 9, 9, 3, 3, 9], "gold_standard":[false, false, false, false, false, false], "low_consensus":false, "consensus_text":"Again the story with d'Esclare. Mary says he", "consensus_score":4.875},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[43.30766403675079, 648.4302380084991], "clusters_y":[354.7481968998909, 350.0663974881172], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["is", "is", "is", "is"], ["to", "to", "to", "to"], ["her", "her", "her", "her"], ["the", "the", "the", "the"], ["type", "type", "type", "type"], ["of", "of", "of", "of"], ["Womanhood,", "Womanhood,", "Womanhood,", "Womanhood,"], ["whelmed", "whelmed", "whelmed", "whelmed"]], "extract_index":[9, 10, 10, 5], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"is to her the type of Womanhood, whelmed", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, null, 2114378, 1742989, 1914031], "clusters_x":[42.72243911027908, 636.1405145525932], "clusters_y":[387.520780980587, 378.15718215703964], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":7, "clusters_text":[["with", "with", "with", "with", "with", "with", "with"], ["a", "a", "a", "a", "a", "a", "a"], ["deeper", "deeper", "deeper", "deeper", "deeper", "deeper", "deeper"], ["[unclear]pomerteroners[/unclear]", "[unclear]pomerteroners[/unclear]", "powerlessness", "powerlessness", "powerlessness", "powerlessness", "powerlessness"], ["than", "than", "than", "than", "than", "than", "than"], ["Michael", "Michael", "Michael", "Michael", "Michael", "Michael", "Michael"]], "extract_index":[10, 11, 11, 4, 4, 8, 2], "gold_standard":[false, false, false, false, false, false, false], "low_consensus":false, "consensus_text":"with a deeper powerlessness than Michael", "consensus_score":6.666666666666667},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, null], "clusters_x":[43.30766403675079, 633.7996148467064], "clusters_y":[418.53771126270294, 411.5150121450424], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["Angelo's", "Angelo's", "Angelo's", "Angelo's"], ["Italy", "Italy", "Italy", "Italy"], ["-", "-", "-", "-"], ["unable", "unable", "unable", "unable"], ["to", "to", "to", "to"], ["[unclear][/unclear]", "[unclear]rouse[/unclear]", "raise", "rouse"], ["her", "her", "her", "her"], ["will", "will", "will", "will"], ["&", "&", "&", "&"]], "extract_index":[11, 12, 12, 5], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"Angelo's Italy - unable to [unclear][/unclear] her will &", "consensus_score":3.6666666666666665},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378], "clusters_x":[48.574688374996185, 629.7030403614044], "clusters_y":[448.9694013595581, 444.2876019477844], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["say", "say", "say", "say"], ["[underline]I", "[underline]I", "[underline]I", "[underline]I"], ["will", "will", "will", "will"], ["[unclear][/unclear].[/underline]", "[unclear][/unclear].[/underline]", "hie.[/underline]", "[unclear][/unclear].[/underline]"], ["The", "The", "The", "The"], ["beauty", "beauty", "beauty", "beauty"], ["of", "of", "of", "of"], ["the", "the", "the", "the"], ["lines", "lines", "lines", "lines"], ["&", "&", "&", "&"]], "extract_index":[12, 13, 13, 5], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"say [underline]I will [unclear][/unclear].[/underline] The beauty of the lines &", "consensus_score":3.9},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[54.42693763971329, 640.2370890378952], "clusters_y":[473.54884219169617, 475.3045169711113], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["their", "their", "their", "their"], ["inherent", "inherent", "inherent", "inherent"], ["helplessness", "helplessness", "helplessness", "helplessness"], ["haunt", "haunt", "haunt", "haunt"], ["one.", "one.", "one.", "one."]], "extract_index":[13, 14, 14, 6], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"their inherent helplessness haunt one.", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, null], "clusters_x":[47.989463448524475, 636.7257394790649], "clusters_y":[513.3441494703293, 506.32145035266876], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["Then", "Then", "Then", "Then"], ["we", "we", "we", "we"], ["wander", "wander", "wander", "wander"], ["in", "in", "in", "in"], ["the", "the", "the", "the"], ["Louvre", "Louvre", "Louvre", "Louvre-"], ["-", "-", "-", ""], ["desperate,", "desperate,", "desperate,", "desperate"]], "extract_index":[14, 15, 15, 0], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"Then we wander in the Louvre - desperate,", "consensus_score":3.625},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[45.063338816165924, 650.771137714386], "clusters_y":[544.9462894201279, 537.9235903024673], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["ignorant", "ignorant", "ignorant", "ignorant"], ["wanderings.", "wanderings.", "wanderings.", "wanderings."], ["We", "We", "We", "We"], ["had", "had", "had", "had"], ["been", "been", "been", "been"], ["decoyed", "decoyed", "decoyed", "decoyed"]], "extract_index":[15, 16, 16, 7], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"ignorant wanderings. We had been decoyed", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[50.330363154411316, 630.8734902143478], "clusters_y":[575.377979516983, 568.9405053257942], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["to", "to", "to", "to"], ["Paris", "Paris", "Paris", "Paris"], ["by", "by", "by", "by"], ["Mary's", "Mary's", "Mary's", "Mary's"], ["promises", "promises", "promises", "promises"], ["of", "of", "of", "of"], ["Morellian", "Morellian", "Morellian", "Morellian"]], "extract_index":[16, 17, 17, 8], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"to Paris by Mary's promises of Morellian", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[49.745138227939606, 605.7088183760643], "clusters_y":[605.2244599461555, 601.1278854608536], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["teaching", "teaching", "teaching", "teaching"], ["from", "from", "from", "from"], ["Bernhard,", "Bernhard,", "Bernhard,", "Bernhard,"], ["&", "&", "&", "&"], ["we", "we", "we", "we"], ["are", "are", "are", "are"], ["left", "left", "left", "left"]], "extract_index":[17, 18, 18, 9], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"teaching from Bernhard, & we are left", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[52.67126286029816, 645.5041133761406], "clusters_y":[635.6561500430107, 630.3891257047653], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["with", "with", "with", "with"], ["no", "no", "no", "no"], ["shepherd", "shepherd", "shepherd", "shepherd"], ["among", "among", "among", "among"], ["the", "the", "the", "the"], ["tangles", "tangles", "tangles", "tangles"], ["of", "of", "of", "of"]], "extract_index":[18, 19, 19, 10], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"with no shepherd among the tangles of", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, null, 2114378, 1742989, 1914031], "clusters_x":[52.08603793382645, 643.7484385967255], "clusters_y":[668.4287551045418, 664.3321806192398], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":7, "clusters_text":[["attribution", "attribution", "attribution", "attribution", "attribution", "attribution", "attribution"], ["[unclear][/unclear][unclear][/unclear]", "[unclear][/unclear][unclear][/unclear]", "&c.", "&c.", "[unclear][/unclear].", "[unclear][/unclear].", "&c."], ["", "", "Aesthetically", "Aesthetically", "[unclear]Aesthetically[/unclear]", "[unclear]Aesthetically[/unclear]", "Aesthetically"], ["we", "we", "we", "we", "we", "we", "we"], ["know", "know", "know", "know", "know", "know", "know"], ["the", "the", "the", "the", "the", "the", "the"]], "extract_index":[19, 20, 20, 6, 6, 7, 3], "gold_standard":[false, false, false, false, false, false, false], "low_consensus":false, "consensus_text":"attribution &c. Aesthetically we know the", "consensus_score":5.666666666666667},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[57.93828719854355, 641.9927638173103], "clusters_y":[697.1047704219818, 692.4229710102081], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["Louvre", "Louvre", "Louvre", "Louvre"], ["very", "very", "very", "very"], ["well.", "well.", "well.", "well."], ["Historically,", "Historically,", "Historically,", "Historically,"], ["critically", "critically", "critically", "critically"]], "extract_index":[20, 21, 21, 11], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"Louvre very well. Historically, critically", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, null], "clusters_x":[60.27918690443039, 644.3336635231972], "clusters_y":[725.7807698845863, 719.3432956933975], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["", "", "not", ""], ["what", "what", "at", "what"], ["all.", "all.", "all.", "all."], ["I", "I", "I", "I"], ["am", "am", "am", "am"], ["as", "as", "as", "as"], ["deep", "deep", "deep", "deep"], ["in", "in", "in", "in"], ["despair", "despair", "despair", "despair"], ["as", "as", "as", "as"]], "extract_index":[21, 22, 22, 7], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"not what all. I am as deep in despair as", "consensus_score":3.6},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[53.84171271324158, 622.0951163172722], "clusters_y":[758.5533749461174, 752.1159007549286], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["d'Esclare", "d'Esclare", "d'Esclare", "d'Esclare"], ["himself,", "himself,", "himself,", "himself,"], ["numb", "numb", "numb", "numb"], ["beneath", "beneath", "beneath", "beneath"], ["my", "my", "my", "my"]], "extract_index":[22, 23, 23, 12], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"d'Esclare himself, numb beneath my", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[53.84171271324158, 667.7426605820656], "clusters_y":[789.5702747106552, 793.0816242694855], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["ignorance.", "ignorance.", "ignorance.", "ignorance."], ["When", "When", "When", "When"], ["we", "we", "we", "we"], ["return", "return", "return", "return"], ["in", "in", "in", "in"], ["the", "the", "the", "the"], ["afternoon", "afternoon", "afternoon", "afternoon"]], "extract_index":[23, 24, 24, 13], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"ignorance. When we return in the afternoon", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378], "clusters_x":[55.59738749265671, 620.339441537857], "clusters_y":[822.3428590297699, 813.5644851326942], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["Bernhard", "Bernhard", "Bernhard", "Bernhard"], ["is", "is", "is", "is"], ["[unclear]away[/unclear];", "[unclear]away[/unclear];", "away;", "away;"], ["I", "I", "I", "I"], ["retire", "retire", "retire", "retire"], ["to", "to", "to", "to"], ["my", "my", "my", "my"], ["bed", "bed", "bed", "bed"], ["&", "&", "&", "&"]], "extract_index":[24, 25, 25, 7], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"Bernhard is [unclear]away[/unclear]; I retire to my bed &", "consensus_score":3.7777777777777777},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031], "clusters_x":[59.69396197795868, 629.1178154349327], "clusters_y":[854.5302696824074, 851.0189201235771], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":6, "clusters_text":[["from", "from", "from", "from", "from", "from"], ["my", "my", "my", "my", "my", "my"], ["", "", "resting", "resting", "resting", "resting"], ["[unclear][/unclear]", "[unclear][/unclear]", "", "", "[deletion]ty[/deletion]", ""], ["place", "place", "place", "place", "place", "place"], ["I", "I", "I", "I", "I", "I"], ["hear", "hear", "hear", "hear", "hear", "hear"], ["Sim", "Sim", "Sim's", "Sim's", "Sim's", "Sim's"], ["'O", "'O", "", "", "", ""]], "extract_index":[25, 26, 26, 8, 0, 4], "gold_standard":[false, false, false, false, false, false], "low_consensus":false, "consensus_text":"from my resting [unclear][/unclear] place I hear Sim's 'O", "consensus_score":4.666666666666667},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031], "clusters_x":[64.96098631620407, 652.5268124938011], "clusters_y":[884.9619445204735, 876.1835706233978], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":6, "clusters_text":[["[unclear][/unclear]", "[unclear][/unclear]", "frank", "frank", "frank", "frank"], ["voice", "voice", "voice", "voice", "voice", "voice"], ["questioning", "questioning", "questioning", "questioning", "questioning", "questioning"], ["Mary", "Mary", "Mary", "Mary", "Mary", "Mary"], ["as", "as", "as", "as", "as", "as"], ["to", "to", "to", "to", "to", "to"], ["the", "the", "the", "the", "the", "the"]], "extract_index":[26, 27, 27, 9, 1, 5], "gold_standard":[false, false, false, false, false, false], "low_consensus":false, "consensus_text":"frank voice questioning Mary as to the", "consensus_score":5.714285714285714},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[62.62008661031723, 642.577988743782], "clusters_y":[921.2458686232567, 911.8822697997093], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["reason", "reason", "reason", "reason"], ["why", "why", "why", "why"], ["Bernhard", "Bernhard", "Bernhard", "Bernhard"], ["makes", "makes", "makes", "makes"], ["no", "no", "no", "no"], ["time", "time", "time", "time"], ["to", "to", "to", "to"]], "extract_index":[27, 28, 28, 14], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"reason why Bernhard makes no time to", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031], "clusters_x":[60.27918690443039, 651.3563626408577], "clusters_y":[951.6775739789009, 940.5583003759384], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":6, "clusters_text":[["help", "help", "help", "help", "help", "help"], ["us", "us", "us", "us", "us", "us"], ["&", "&", "&", "&", "&", "&"], ["offering", "offering", "offering", "offering", "offering", "offering"], ["[unclear]Manly[/unclear]!", "[unclear]Manly[/unclear]!", "Money!", "[underline]Money[/underline]!", "[underline]money[/underline]!", "[unclear][/unclear]"], ["if", "if", "if", "if", "if", "if"], ["he", "he", "he", "he", "he", "he"], ["will", "will", "will", "will", "will", "will"], ["[deletion][/deletion]", "[deletion][/deletion]", "[deletion]help[/deletion]", "[deletion]help[/deletion]", "[deletion]help[/deletion]", ""]], "extract_index":[28, 29, 29, 10, 2, 6], "gold_standard":[false, false, false, false, false, false], "low_consensus":false, "consensus_text":"help us & offering [unclear]Manly[/unclear]! if he will [deletion]help[/deletion]", "consensus_score":5.222222222222222},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031, 2446094], "clusters_x":[58.52351212501526, 632.0439400672913], "clusters_y":[973.9161303639412, 968.6491060256958], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":7, "clusters_text":[["[deletion][/deletion]", "[deletion][/deletion]", "[deletion]us[/deletion]", "[deletion]us[/deletion]", "[deletion]us[/deletion]", "", "[deletion][/deletion]"], ["give", "give", "give", "give", "give", "give", "give"], ["us", "us", "us", "us", "us", "us", "us"], ["instruction.", "instruction.", "instruction.", "instruction.", "instruction.", "instruction.", "instruction."], ["Mary", "Mary", "Mary", "Mary", "Mary", "Mary", "Mary"], ["has", "has", "has", "has", "has", "has", "has"], ["not", "not", "not", "not", "not", "not", "not"]], "extract_index":[29, 30, 30, 11, 3, 7, 1], "gold_standard":[false, false, false, false, false, false, false], "low_consensus":false, "consensus_text":"[deletion][/deletion] give us instruction. Mary has not", "consensus_score":6.428571428571429},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 1839711], "clusters_x":[59.10873705148697, 636.7257394790649], "clusters_y":[1010.2000535726547, 1004.3478043079376], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":4, "clusters_text":[["much", "much", "much", "much"], ["to", "to", "to", "to"], ["say", "say", "say", "say"], ["for", "for", "for", "for"], ["him,", "him,", "him,", "him,"], ["but", "but", "but", "but"], ["grasps", "grasps", "grasps", "grasps"], ["at", "at", "at", "at"], ["the", "the", "the", "the"]], "extract_index":[30, 31, 31, 15], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"much to say for him, but grasps at the", "consensus_score":4.0},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031], "clusters_x":[54.42693763971329, 654.2824872732162], "clusters_y":[1040.6317482590675, 1042.3874230384827], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":6, "clusters_text":[["idea", "idea", "idea", "idea", "idea", "idea"], ["of", "of", "of", "of", "of", "of"], ["his", "his", "his", "his", "his", "his"], ["earning", "earning", "earning", "earning", "earning", "earning"], ["some", "some", "some", "some", "some", "some"], ["coins.", "coins.", "coins.", "coins.", "coins.", "coins."], ["", "", "[deletion]The", "[deletion]The", "[deletion]The", ""], ["[deletion][/deletion]", "[deletion][/deletion]", "young[/deletion]", "young[/deletion]", "young[/deletion]", "[deletion][/deletion]"]], "extract_index":[31, 32, 32, 12, 4, 8], "gold_standard":[false, false, false, false, false, false], "low_consensus":false, "consensus_text":"idea of his earning some coins. [deletion]The [deletion][/deletion]", "consensus_score":5.25},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031], "clusters_x":[56.18261241912842, 659.5495116114616], "clusters_y":[1068.1373183131218, 1072.2338927984238], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":6, "clusters_text":[["through", "through", "through", "through", "through", "through"], ["the", "the", "the", "the", "the", "the"], ["[unclear]closing[/unclear]", "[unclear]closing[/unclear]", "ceasing", "ceasing", "ceasing", "ceasing"], ["of", "of", "of", "of", "of", "of"], ["a", "a", "a", "a", "a", "a"], ["bounty", "bounty", "bounty", "bounty", "bounty", "bounty"], ["from", "from", "from", "from", "from", "from"], ["a", "a", "a", "a", "a", "a"], ["[unclear][/unclear]", "[unclear][/unclear]", "private", "private", "private", "pirate"]], "extract_index":[32, 33, 33, 13, 5, 9], "gold_standard":[false, false, false, false, false, false], "low_consensus":false, "consensus_text":"through the ceasing of a bounty from a private", "consensus_score":5.444444444444445},
{"flagged":false, "user_ids":[304838, 2408244, 1959934, 2114378, 1742989, 1914031, 2446094, null, 1312868, 1590807], "clusters_x":[55.59738749265671, 674.1801347732544], "clusters_y":[1104.4213034510612, 1101.4951788187027], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":0, "number_views":10, "clusters_text":[["source", "source", "source", "source", "source", "source", "source", "source", "source", "source"], ["he", "he", "he", "he", "he", "he", "he", "he", "he", "he"], ["is", "is", "is", "is", "is", "is", "is", "is", "is", "is"], ["poor", "poor", "poor", "poor", "poor", "poor", "poor,", "poor,", "poor,", "poor"], ["&", "&", "&", "&", "&", "+", "and", "&", "and", "&"], ["is", "is", "is", "is", "is", "is", "is", "is", "is", "is"], ["seized,", "seized,", "seized,", "seized,", "seized,", "seized", "seized,", "seized,", "seized,", "seized,"], ["like", "like", "like", "like", "like", "like", "like", "like", "like", "like"], ["[unclear][/unclear]", "[unclear][/unclear]", "Midas", "Midas", "Midas", "[unclear][/unclear]", "[unclear][/unclear],", "Midas", "[unclear]Nudas[/unclear],", "Midas,"], ["with", "with", "with", "with", "with", "with", "with", "with", "with", "with"]], "extract_index":[33, 34, 34, 14, 6, 10, 0, 0, 0, 0], "gold_standard":[false, false, false, false, false, false, false, false, false, false], "low_consensus":false, "consensus_text":"source he is poor & is seized, like Midas with", "consensus_score":8.7},
{"flagged":false, "user_ids":[2408244, 1959934, 2114378, 1914031], "clusters_x":[543.9507215572211, 640.6058283924376], "clusters_y":[31.144423313569746, 37.588097102584186], "line_slope":-0.2810000000000059, "slope_label":0, "gutter_label":1, "number_views":4, "clusters_text":[["126", "126", "126", "126"], ["[deletion]209[/deletion]", "[deletion]209[/deletion]", "[deletion]207[/deletion]", "[deletion]209[/deletion]"]], "extract_index":[0, 1, 1, 11], "gold_standard":[false, false, false, false], "low_consensus":false, "consensus_text":"126 [deletion]209[/deletion]", "consensus_score":3.5}]
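For anyone who wants to check a reduction list like this for overlaps themselves, a rough sketch follows. It flags pairs of lines whose start points and end points both lie within a pixel tolerance of each other. The function name `find_overlapping_lines` and the default `tolerance` of 5 pixels are assumptions made for illustration, not values taken from Caesar or the front end.

```python
from itertools import combinations
from math import hypot


def find_overlapping_lines(reductions, tolerance=5.0):
    """Return index pairs of reductions drawn at (nearly) the same position.

    Each reduction is assumed to hold a two-element 'clusters_x' and
    'clusters_y' list: the start and end point of the line.
    """
    overlaps = []
    for (i, a), (j, b) in combinations(enumerate(reductions), 2):
        # Distance between the two start points.
        start_gap = hypot(a["clusters_x"][0] - b["clusters_x"][0],
                          a["clusters_y"][0] - b["clusters_y"][0])
        # Distance between the two end points.
        end_gap = hypot(a["clusters_x"][1] - b["clusters_x"][1],
                        a["clusters_y"][1] - b["clusters_y"][1])
        if start_gap <= tolerance and end_gap <= tolerance:
            overlaps.append((i, j))
    return overlaps
```

Passing the parsed JSON array to `find_overlapping_lines` returns an empty list when no two lines share a position.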
The code that loads and processes reductions is here. I’m fairly sure it was taken from ASM, and generates a pink or grey line for each reduction. https://github.com/zooniverse/front-end-monorepo/blob/master/packages/lib-classifier/src/store/SubjectStore/Subject/TranscriptionReductions/TranscriptionReductions.js
The subjects that had problems had multiple reductions with the same coordinates and consensus text, but different usernames.
🤦 I think I know what the issue is. For the OPTICS reducer, the "distance" between classifications is:
This distance is found by summing the Euclidean distance between the start points of each line, the Euclidean distance between the end points of each line, and the Levenshtein distance of the text for each line. The Levenshtein distance is computed after stripping text tags and consolidating whitespace.
So if the typed text is significantly different, a new cluster will be formed at the same position. This is better for the "consensus text" calculation but not better for displaying in the UI...
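As a rough illustration of the distance described above, here is a minimal Python sketch. It is not the reducer's actual code: the function names (strip_tags, levenshtein, line_distance), the tag-stripping regex, and the tuple layout for a line are all assumptions made for this example. It shows how two lines with identical endpoints can still end up a non-zero distance apart when the typed text differs.

```python
import math
import re


def strip_tags(text):
    """Drop [tag] / [/tag] markers and collapse runs of whitespace.

    Assumption: tags look like the [unclear]...[/unclear] markers in the
    reductions quoted in this thread.
    """
    without_tags = re.sub(r"\[/?[^\[\]]+\]", "", text)
    return " ".join(without_tags.split())


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def line_distance(line_a, line_b):
    """Sum of start-point, end-point and text distances for two lines.

    Each line is a hypothetical (start_xy, end_xy, text) tuple.
    """
    start_a, end_a, text_a = line_a
    start_b, end_b, text_b = line_b
    return (
        math.dist(start_a, start_b)
        + math.dist(end_a, end_b)
        + levenshtein(strip_tags(text_a), strip_tags(text_b))
    )


# Same endpoints, different reading of one word: the text term alone
# keeps the two transcriptions apart.
first = ((55.6, 1104.4), (674.2, 1101.5), "like Midas with")
second = ((55.6, 1104.4), (674.2, 1101.5), "like [unclear]Nudas[/unclear] with")
print(line_distance(first, second))
```

Because the text term is added to the pixel distances, a large enough disagreement in the typed text can outweigh a perfect match in position.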
If it is happening a lot on a project, they can try adjusting min_samples from "auto" to 3 and see if that helps.
For reference, subject 76124479 shows this issue of finding two clusters at the same position, each with different text.
I can see three ways to move forward:
Try adjusting min_samples for the project and see if it helps (this also might not help in this case; tests would be needed to find out).
Not sure what others' opinions are: is this better handled in the reducer code with a flag, or in the way the front-end displays the results?
@snblickhan this discussion about transcribed lines was never concluded. Just wanted to see if this is on your radar: have there been any reports in the last year about complete and incomplete lines displaying on top of each other?
@goplayoutside3 I can't find the link ATM, but this was definitely resolved! Very likely a duplicate issue or one that wasn't closed after the actual problem was ID'd & fixed.
Thanks!
Here's an example of this bug from Maria Edgeworth Letters. Lines repeat two or three times, with identical text, in the Caesar reductions.
@eatyourgreens is your comment intended to re-open this Issue? Maria Edgeworth Letters is not an active project at the moment as it's out of data. Was the bug reported by that project team?
No, it's just another example of this bug in the Caesar reductions. Quite a good one, as that subject is mostly duplicated lines on the first page. This issue originally lacked links to examples, which made it hard to diagnose, so I've added that as an example.
For posterity, here's that subject in the classifier, showing this particular issue. Lines like "Edgeworth's Town" are rendered as three lines, on top of each other and each with only one choice in the drop-down menu, rather than one line with a drop-down menu containing all three choices. https://www.zooniverse.org/projects/mariaedgeworthletters/maria-edgeworth-letters/classify/workflow/18542/subject/87415864
The bug's easier to see in the DOM inspector, where you can see that lines transcribed-0, transcribed-1, and transcribed-2 are identical.
I think the bug was fixed in Caesar, but I’m not sure. I don't think duplicate lines show up in the Caesar reductions for newer projects. The subject viewer code doesn't detect and merge duplicates, just displays each consensus reduction from Caesar as a previously-transcribed line.
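For illustration only, here is one way duplicates could be detected and merged before rendering. This is a sketch in Python, not the subject viewer's code. The field names (clusters_x, clusters_y, consensus_text, user_ids) come from the Caesar reductions quoted earlier in this thread; the function name merge_duplicate_reductions and the rounding precision are assumptions.

```python
def merge_duplicate_reductions(reductions, precision=1):
    """Collapse reductions that share endpoints and consensus text.

    Endpoints are rounded to `precision` decimal places (an assumed
    tolerance) so that tiny floating-point differences still match.
    """
    merged = {}
    for reduction in reductions:
        key = (
            tuple(round(x, precision) for x in reduction["clusters_x"]),
            tuple(round(y, precision) for y in reduction["clusters_y"]),
            reduction["consensus_text"],
        )
        if key not in merged:
            # Copy the first occurrence so the input is left untouched.
            merged[key] = {**reduction, "user_ids": list(reduction["user_ids"])}
        else:
            # Later duplicates only contribute user IDs not yet seen.
            for user_id in reduction["user_ids"]:
                if user_id not in merged[key]["user_ids"]:
                    merged[key]["user_ids"].append(user_id)
    return list(merged.values())


# Three copies of one line, each from a different user, become one line.
duplicates = [
    {"clusters_x": [10.0, 200.0], "clusters_y": [50.0, 52.0],
     "consensus_text": "Edgeworth's Town", "user_ids": [user_id]}
    for user_id in (101, 102, 103)
]
print(merge_duplicate_reductions(duplicates))
```

Reductions at the same position with different consensus text stay separate in this sketch; combining those into one line with several text options would need extra handling.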
Package
lib-classifier
Describe the bug
After creating a green transcription line from a magenta, previously transcribed line, I can still interact with the magenta line and create new transcriptions.
To Reproduce
There are a couple of different ways to exploit this bug.
First: Create a green line from a magenta line, then drag the green line away from the magenta line (see #1836.) You can now click on the magenta line again, to create a second green line.
Second: Create a green line from a magenta line. Without moving the green line, tab or shift-tab back to the original magenta line. With the magenta line focussed, press Enter or Space to create a new green line.
Expected behavior
I'd expect magenta lines to either be replaced by green lines, or to remain but be disabled so that I can no longer interact with them (no onClick handler and tabindex reset to -1.)