zooniverse / panoptes-cli

A command-line interface for Panoptes
Apache License 2.0

Advice on Panoptes Extractor Parameters for Workflow #258

Open Jordan-Pierce opened 4 months ago

Jordan-Pierce commented 4 months ago

Hi,

Sorry for the sub-par formatting below, but I'm looking for advice on extraction. My commands are below:

# This outputs 4 .yaml files 
panoptes_aggregation config click-a-coral-workflows.csv XXXXX-v XX.XXX

# This outputs 3 .csv files
panoptes_aggregation extract click-a-coral-classifications.csv Extractor_config_workflow_XXX_VXXX.XXX.yaml

However, I noticed that the ...classifications.csv exported from Zooniverse already contains all the information I need. Each classification follows this loop:

  1. T2 - User says Yes
  2. T0 - User creates a single box
  3. T1 - User uses the Survey tool to select a category
  4. Rinse and repeat until the user says No on T2

But when I use panoptes_aggregation extract, I can't get the T1 Survey responses for every iteration the user completes on a single subject; it seems to record only the last entry, even though all the boxes are there. Note the example below, which contains a single survey choice, theseanivea, but many boxes.

# Extracted Survey T1 CSV
classification_id,user_name,user_id,workflow_id,task,created_at,subject_id,extractor,data.choice,data.aggregation_version
554969661,Jordan-Pierce,XXXXXX,25828,T1,2024-04-16 13:41:43 UTC,98330197,survey_extractor,theseanivea,4.0.0

# Extracted Boxes T0
classification_id,user_name,user_id,workflow_id,task,created_at,subject_id,extractor,data.aggregation_version,data.frame0.T0_tool0_x,data.frame0.T0_tool0_y,data.frame0.T0_tool0_width,data.frame0.T0_tool0_height
554969661,Jordan-Pierce,XXXXXX,25828,T0,2024-04-16 13:41:43 UTC,98330197,shape_extractor_rectangle,4.0.0,"[657.6613159179688, -0.8562019467353821, 431.474853515625, 472.9900817871094, 477.2847595214844, 229.6249237060547, 1762.825439453125, 1035.593017578125, 925.3629760742188, 34.932796478271484]","[556.5188598632812, 454.87811279296875, 663.8858642578125, 655.2965087890625, 305.99591064453125, 414.79443359375, 504.9826965332031, 297.40655517578125, 159.976806640625, 326.0377197265625]","[141.724365234375, 144.58754044771194, 93.0513916015625, 166.06094360351562, 161.76626586914062, 97.34605407714844, 120.2509765625, 120.2510986328125, 67.2833251953125, 133.13505172729492]","[160.334716796875, 253.3861083984375, 111.66162109375, 151.74530029296875, 130.27194213867188, 94.48294067382812, 108.79855346679688, 153.1768798828125, 94.48294067382812, 135.9981689453125]"

I'm also unsure why the box coordinates come out as one long list per column (x, y, width, height) rather than one row per box. But I'm not too worried, because the data in the format I need is in the original ...classifications.csv exported from Zooniverse, which I can parse myself if I need to (see below):


"[{""task"":""T2"",""task_label"":""See any coral needing labels?"",""value"":""Yes""}, 
{""task"":""T0"",""task_label"":""Create a bounding box around the individual coral"",""value"": 
[{""x"":555.0583801269531,""y"":138.4544219970703,""tool"":0,""frame"":0,""width"":851.5688171386719,""height"":559.7725067138672,""details"":[],""tool_label"":""Bounding Box""}]},
{""task"":""T1"",""value"":[{""choice"":""MURICEAPENDULA"",""answers"":{},""filters"":{}}]}, 

{""task"":""T2"",""task_label"":""See any coral needing labels?"",""value"":""Yes""}, 
{""task"":""T0"",""task_label"":""Create a bounding box around the individual coral"",""value"": 
[{""x"":1573.3675537109375,""y"":192.04953002929688,""tool"":0,""frame"":0,""width"":279.8865966796875,""height"":232.24615478515625,""details"":[],""tool_label"":""Bounding Box""}]},
{""task"":""T1"",""value"":[{""choice"":""MURICEAPENDULA"",""answers"":{},""filters"":{}}]}, 

{""task"":""T2"",""task_label"":""See any coral needing labels?"",""value"":""No""}]"
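
For reference, the manual parse mentioned above might look roughly like the Python sketch below. It assumes the annotations column has already been read through a CSV parser (so the doubled quotes collapse to ordinary JSON) and that each T0 entry is followed by the T1 entry that labels it. pair_boxes_with_choices is a hypothetical helper name, not part of panoptes_aggregation.

```python
import json


def pair_boxes_with_choices(annotations_json):
    """Pair each T0 box with the T1 survey choice that follows it.

    Assumption: annotations appear in the order T2, T0, T1, T2, T0, T1, ...
    as in the excerpt above.
    """
    pairs = []
    pending_boxes = []
    for entry in json.loads(annotations_json):
        task = entry.get("task")
        value = entry.get("value")
        if task == "T0":
            # Remember the box(es) drawn in this iteration.
            pending_boxes = list(value or [])
        elif task == "T1" and value:
            # Attach the survey choice to the boxes drawn just before it.
            choice = value[0].get("choice")
            for box in pending_boxes:
                pairs.append({
                    "choice": choice,
                    "x": box["x"],
                    "y": box["y"],
                    "width": box["width"],
                    "height": box["height"],
                })
            pending_boxes = []
    return pairs


# The excerpt above, after CSV unescaping ("" becomes ").
sample_annotations = """[
 {"task":"T2","task_label":"See any coral needing labels?","value":"Yes"},
 {"task":"T0","task_label":"Create a bounding box around the individual coral","value":
  [{"x":555.0583801269531,"y":138.4544219970703,"tool":0,"frame":0,"width":851.5688171386719,"height":559.7725067138672,"details":[],"tool_label":"Bounding Box"}]},
 {"task":"T1","value":[{"choice":"MURICEAPENDULA","answers":{},"filters":{}}]},
 {"task":"T2","task_label":"See any coral needing labels?","value":"Yes"},
 {"task":"T0","task_label":"Create a bounding box around the individual coral","value":
  [{"x":1573.3675537109375,"y":192.04953002929688,"tool":0,"frame":0,"width":279.8865966796875,"height":232.24615478515625,"details":[],"tool_label":"Bounding Box"}]},
 {"task":"T1","value":[{"choice":"MURICEAPENDULA","answers":{},"filters":{}}]},
 {"task":"T2","task_label":"See any coral needing labels?","value":"No"}
]"""

pairs = pair_boxes_with_choices(sample_annotations)
for pair in pairs:
    print(pair)
```

Running this over every row of the export (for example with csv.DictReader and the annotations column) would give one record per box with its survey choice, independent of what the extractor keeps.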

Any suggestions on manual edits to my .yaml file so the extractor parses this as needed? Thanks.
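
On the box lists in the extracted T0 row: they look like parallel columns, where the i-th entries of x, y, width, and height together describe the i-th box. That reading is an assumption worth checking against the panoptes_aggregation documentation. Under it, a minimal Python sketch for turning the four columns back into one record per box could be the following (unpack_boxes is a hypothetical helper name; the sample values are the first two entries of each list in the row above):

```python
import json


def unpack_boxes(xs, ys, widths, heights):
    """Zip four JSON-encoded coordinate lists into one dict per box.

    Assumption: the lists are parallel, so index i in each list
    belongs to the same box.
    """
    columns = [json.loads(col) for col in (xs, ys, widths, heights)]
    if len({len(col) for col in columns}) != 1:
        raise ValueError("coordinate lists have different lengths")
    keys = ("x", "y", "width", "height")
    return [dict(zip(keys, row)) for row in zip(*columns)]


boxes = unpack_boxes(
    "[657.6613159179688, -0.8562019467353821]",
    "[556.5188598632812, 454.87811279296875]",
    "[141.724365234375, 144.58754044771194]",
    "[160.334716796875, 253.3861083984375]",
)
for box in boxes:
    print(box)
```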