ome / omero-scripts

Core OMERO Scripts
https://pypi.org/project/omero-scripts/
12 stars 32 forks source link

Hcs keyvals from csv #195

Closed will-moore closed 2 years ago

will-moore commented 2 years ago

This PR contains all commits from https://github.com/mpievolbio-scicomp/omero-scripts/pull/1.

To test: a CSV of this format should behave according to the comment after each row:

plate           well    image               Cell line
Index.idx.xml                               S2R+        # add KV to plate
Index.idx.xml   B2                          S2R+        # add KV to well only
Index.idx.xml   B2      Well 1 Field 1      S2R+        # add KV to image only
Index.idx.xml           Well 1 Field 2      S2R+        # add KV to image only

A couple of additional points addressed in this PR:

Screenshot 2022-01-23 at 22 44 19

cc @abhamacher Does this look OK? If you could give this a final test and 👍 if looking good, that would be great, thanks.

To test (OME)

Using Plate at https://merge-ci.openmicroscopy.org/web/webclient/?show=plate-16105 (user-3). This has a test .csv file attached, using the format above. Sample of it pasted below. The target column says what should be annotated by each row. Download, inspect and edit the file, then use to run the annotations_scripts/KeyVal_from_csv.py script. Rows which have an image name will annotate that Image (Well column is ignored) Rows without an image will annotate the Well. NB: Can use the Remove_KeyVal.py script to remove all Well and Image annotations on the Plate if needed.

plate,well,image,target,Time,cell_count,Mitotic fraction,
to-plate_1,A1,A1 Field 1,Image A1 Field 1,10,1,0.1,
to-plate_1,A1,A1 Field 2,Image A1 Field 2,11,2,0.11,
to-plate_1,A2,A2 Field 1,Image A2 Field 1,20,3,0.2,
to-plate_1,A2,,Well A2,21,4,0.21,
to-plate_1,A3,A3 Field 1,Image A3 Field 1,30,5,0.3,
to-plate_1,A4,,Well A4,40,6,0.4,
to-plate_1,A5,,Well A5,50,7,0.5,
to-plate_1,A6,A6 Field 1,A6 Field 1,60,8,0.6,
to-plate_1,B10,B1 Field 1,Image B1 Field 1 (Well ignored),110,9,0.1,
to-plate_1,B20,B2 Field 1,Image B2 Field 1 (Well ignored),120,10,0.2,
to-plate_1,,,Plate (No Well or Image),130,11,0.3,
invalid,B4,B4 Field 1,NONE (plate name invalid),140,12,0.4,
to-plate_1,B5,B5 Field 1,Image B5 Field 1,160,13,0.5,
to-plate_1,B6,B6 Field 1,Image B6 Field 1,160,14,0.6,
abhamacher commented 2 years ago

Hi Will,

sorry, that my response time is so slow at the moment. This is still an important topic for me, it's just the lab work, that is keeping me busy at the moment.

I did a test right now and also really, really like the CSV delimiter sniffing. Thanks @JensWendt for the idea. Worked well for my data, which saves some effort and helps to avoid errors due to the commas in the default image names.

I only encountered one issue, when creating an Excel file with the annotations and then saving it to csv, which is the most likely scenario for our biologists. When I tried to import this csv file, the script struggled with the BOM at the beginning. The info message is green, but the log stucks at the column headers:

kv-script

script params File_Annotation 1603 IDs [1] Data_Type Plate set ann id 1603 Original File 22636 Copy of plate_test.csv Using delimiter: ; header ['\ufeffPlate', 'Well', 'Image', 'Test1', 'Compound', 'Description']

Maybe this is something you could capture and remove the BOM before processing?

Thank you very much for your effort! Looking forward to enroll the script on our production system once it's finally released.

Anna

JensWendt commented 2 years ago

Hi Anna,

I also encountered this problem. My VERY basic solution to this was to add another line after header = data[0] --> header[0]=str(header[0]).replace("\ufeff","") From my understanding this is the BOM for UTF-8 encoding. There are other possible BOMs but I only encountered this one for now. I am sure there are other more elegant versions out there. But for now this little workaround is helping.

imagesc-bot commented 2 years ago

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/multiple-key-value-pair-map-annoation-questions/62976/5

will-moore commented 2 years ago

Hi @JensWendt @abhamacher, I've tried handling the BOM in the same way that we do for Populate_Metadata script. I've tested this with a sample BOM script I have and it works for that - Hopefully it'll work for yours too?

JensWendt commented 2 years ago

Hi Will,

I tried it out with two different .csv files and it worked. Seems good to me. Thanks!

will-moore commented 2 years ago

See https://forum.image.sc/t/uploading-key-value-pairs-from-csv-files-into-omero-web-for-plates/60202/37 for issues with delimiter sniffing...

imagesc-bot commented 2 years ago

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/uploading-key-value-pairs-from-csv-files-into-omero-web-for-plates/60202/58

imagesc-bot commented 2 years ago

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/uploading-key-value-pairs-from-csv-files-into-omero-web-for-plates/60202/61

JensWendt commented 2 years ago

Hello @will-moore ,

I checked with two plates in one screen. Both with multiple wells and multiple images per well. There was one combined .csv. I have three remarks:

  1. I had to declare

    header=[]
    nimg_updated = 0
    missing_names = 0
    images_by_name = {}

    in line 137+ at the beginning of def keyval_from_csv(), because otherwise the script would complain that I referenced local variables without declaring them first when the return message was created in the end of this method.

  2. The mouseover tooltip for the IDs script parameter is described as "Plate or Screen ID.". That is misleading, as I can only select Dataset or Plate as data type. I tried to put the Screen ID in that my two plates were under and of course it did not work. We either change it to also allowing Screen as data type and then getting the objects in the code via listChildren() - assuming I want to annotate all plates in it - or we have to change the tooltip to accurately represent the wanted input. Maybe something like "ID(s) of your dataset(s) or plate(s)."

  3. The actual out message in the end told me "Added 5 kv pairs to 97/64 files. 396 image names not found." although the script did its job perfectly. There are multiple things to unpack here. First the number of kv pairs is determined by len(header)-1 this might be better represented by:

    num_kv_pairs=len(header)-1
    if well_index > -1:
     num_kv_pairs-=1
    if plate_index > -1:
     num_kv_pairs-=1

    Second the value after "/". Determined by len(images_by_name). Might be more accurate with an image_counter variable comprised by len(images_by_name) + len(wells_by_name) for each obj iteration plus len(ids) in the end.

Third, we iterate over the objects (plates) first and then over all the rows. With multiple plates worth of image and well names of course we only match with a fraction of the correct names. Every other name is labeled "not found". This leads to an irritating amount of false negatives with "396 image names not found". To be fair I have no good solution for this from the top of my head. Maybe just skip this error message altogether, but this also is a very sub-par solution.

Well, that concludes my inputs. If I have a super-smart idea how to solve the "images not found" issue I will let you know. If point 1, 2, 3.1 and 3.2 are addressed I feel we are green.

will-moore commented 2 years ago

@JensWendt Thanks for that - I've pushed a fix (but not had time to test it yet)... I tried to improve the output message when you select multiple Plates/Datasets. I removed the report on number of columns (since this could be different for each Object) and simply summed up the number of images updated/total.

pwalczysko commented 2 years ago

Worked fine for me. Tested also a file with BOM. Ready to merge fmpov.