jbloomlab / seqneut-pipeline

Pipeline for analyzing sequencing-based neutralization assays
MIT License

plate-level manual drops with 'wells' causes assertion error #45

Closed ckikawa closed 6 months ago

ckikawa commented 6 months ago

I have specified a manual_drop in my config for a specific plate, which looks like this:

plate17:
    group: SCH
    date: 2024-04-11
    viral_library: H3N2_library
    neut_standard_set: loes2023
    samples_csv: data/plates/2024-04-11_neut_SCH_plate1.csv
    manual_drops:
      wells: 
        - [A9]
    qc_thresholds:
      <<: *default_process_plate_qc_thresholds
    curvefit_params:
      <<: *default_process_plate_curvefit_params
    curvefit_qc:
      <<: *default_process_plate_curvefit_qc

This causes an assertion error in the following block of code in the process_plate17.ipynb notebook. I'm not sure, but I believe this is because the manual_drops dictionary requires the key wells (plural!), but the actual column in the counts file is named well (singular!).

for filter_type, filter_drops in manual_drops.items():
    print(f"\nDropping {len(filter_drops)} {filter_type} specified in manual_drops")
    assert filter_type in qc_drops
    qc_drops[filter_type].update(
        {w: "manual_drop" for w in filter_drops if not isinstance(w, list)}
    )
    if filter_type == "barcode_wells":
        counts = counts[
            ~counts.assign(
                barcode_well=lambda x: x.apply(
                    lambda r: (r["barcode"], r["well"]), axis=1
                )
            )["barcode_well"].isin(qc_drops[filter_type])
        ]
    elif filter_type == "barcode_serum_replicates":
        counts = counts[
            ~counts.assign(
                barcode_serum_replicate=lambda x: x.apply(
                    lambda r: (r["barcode"], r["serum_replicate"]), axis=1
                )
            )["barcode_serum_replicate"].isin(qc_drops[filter_type])
        ]
    else:
        assert filter_type in set(counts.columns)
        counts = counts[~counts[filter_type].isin(qc_drops[filter_type])]
------------------

----- stdout -----

Dropping 1 wells specified in manual_drops
------------------

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[7], line 24
     16     counts = counts[
     17         ~counts.assign(
     18             barcode_serum_replicate=lambda x: x.apply(
   (...)
     21         )["barcode_serum_replicate"].isin(qc_drops[filter_type])
     22     ]
     23 else:
---> 24     assert filter_type in set(counts.columns)
     25     counts = counts[~counts[filter_type].isin(qc_drops[filter_type])]
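The mismatch can be reproduced outside the notebook with a toy table. This is only a minimal sketch: the `counts` frame below is made-up data with the singular column names from the notebook, and it assumes the dropped well is listed as a plain string.

```python
import pandas as pd

# Toy stand-in for the notebook's counts table (hypothetical data);
# note the singular column names `well` and `barcode`.
counts = pd.DataFrame(
    {
        "well": ["A9", "A10", "B1"],
        "barcode": ["AAAC", "AAAG", "AAAT"],
        "count": [10, 20, 30],
    }
)

# Plate-level manual drop keyed by the plural name, as in the config.
filter_type = "wells"
qc_drops = {"wells": {"A9": "manual_drop"}}

# Same membership check as the final `else` branch of the notebook cell:
# it fails for "wells" because the column is actually called "well".
plural_key_is_column = filter_type in set(counts.columns)
singular_key_is_column = "well" in set(counts.columns)
print(plural_key_is_column, singular_key_is_column)  # False True
```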

In this case, this issue could be resolved by adding another elif statement to the same cell, something like:

    elif filter_type == "wells":
        counts = counts[~counts["well"].isin(qc_drops[filter_type])]

Happy to try to tackle this @jbloom, unless you'd prefer to look at it!

jbloom commented 6 months ago

@ckikawa, your diagnosis of the problem appears correct, but wouldn't a better solution be to just redefine the relevant key under manual_drops in the config to be well rather than wells?

That solution would not require changing code (just documentation), and would make it cleaner.

Also, this would not really be a backward-incompatible change: given the problem you identified, no one can be using wells right now, or they would be getting the same error.

So I would suggest testing if you can fix this by just updating wells to well in the YAML configuration, and if that works just do a pull request fixing the documentation here rather than changing the code.

ckikawa commented 6 months ago

@jbloom, unfortunately just changing at the level of the config will not work because of this block of code, also in process_plate.ipynb. When I run with manual_drops key well instead of wells, an error is given:

qc_drops = {
    "wells": {},
    "barcodes": {},
    "barcode_wells": {},
    "barcode_serum_replicates": {},
    "serum_replicates": {},
}

assert set(manual_drops).issubset(
    qc_drops
), f"{manual_drops.keys()=}, {qc_drops.keys()}"
------------------

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[4], line 9
      1 qc_drops = {
      2     "wells": {},
      3     "barcodes": {},
   (...)
      6     "serum_replicates": {},
      7 }
----> 9 assert set(manual_drops).issubset(
     10     qc_drops
     11 ), f"{manual_drops.keys()=}, {qc_drops.keys()}"

AssertionError: manual_drops.keys()=dict_keys(['well']), dict_keys(['wells', 'barcodes', 'barcode_wells', 'barcode_serum_replicates', 'serum_replicates'])

The dictionary qc_drops is also used throughout the notebook to add wells, barcodes etc. failing default QC parameters.

So, I think the solution could be either:

  1. Adding an elif statement similar to the one I wrote above
  2. Changing the qc_drops dictionary key from wells to well, which would also require small changes in other cells of process_plate.ipynb where wells is used as a key

jbloom commented 6 months ago

OK, in that case I would say go ahead and make the change you suggest as a pull request. Does a similar change need to be made for the barcodes drop?

ckikawa commented 6 months ago

Yes, I think so, since qc_drops specifies barcodes but the counts dataframe column is named barcode. I'll do a similar fix for that.
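
One way to cover wells and barcodes with a single code path is a lookup from the plural qc_drops keys to the singular counts columns, in place of one elif per key. The sketch below is not the fix that was merged: `DROP_KEY_TO_COLUMN` and `apply_simple_drops` are hypothetical names, the `serum_replicates` entry is an assumption based on the `serum_replicate` column used in the notebook cell, and the data is made up.

```python
import pandas as pd

# Hypothetical mapping from plural qc_drops keys to singular counts columns.
DROP_KEY_TO_COLUMN = {
    "wells": "well",
    "barcodes": "barcode",
    "serum_replicates": "serum_replicate",  # assumed by analogy
}


def apply_simple_drops(counts, qc_drops):
    """Drop rows whose value in the mapped column is a key of qc_drops."""
    for filter_type, column in DROP_KEY_TO_COLUMN.items():
        to_drop = qc_drops.get(filter_type, {})
        if not to_drop:
            continue
        assert column in counts.columns, f"{column=} not in counts"
        counts = counts[~counts[column].isin(list(to_drop))]
    return counts


# Toy counts table and drops (made-up values).
counts = pd.DataFrame(
    {
        "well": ["A9", "A9", "A10", "B1"],
        "barcode": ["AAAC", "AAAG", "AAAC", "AAAT"],
        "serum_replicate": ["s1", "s1", "s1", "s2"],
    }
)
qc_drops = {
    "wells": {"A9": "manual_drop"},
    "barcodes": {"AAAT": "manual_drop"},
    "barcode_wells": {},
    "barcode_serum_replicates": {},
    "serum_replicates": {},
}

filtered = apply_simple_drops(counts, qc_drops)
print(filtered["well"].tolist())
```

The tuple-valued keys (barcode_wells, barcode_serum_replicates) are left out on purpose, since the notebook already handles them in their own branches.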