kircherlab / MPRAsnakeflow

New implementation of MPRAsnakeflow, a fork of MPRAflow
MIT License

Errors in merge_replicates_barcode_counts.py when BC threshold too high #139

Open · pi-zz-a opened this issue 1 week ago

pi-zz-a commented 1 week ago

I set the BC threshold to a minimum of 10, which turned out to be too high for my dataset. Filtering returned an empty dataframe, which in turn causes a missing-key error (KeyError) for the keys "dna_count_1", "rna_count_1", etc. To make this more intuitive, it would be good to either raise an error with an explanation, write an empty filtered file, or both.
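A minimal sketch of the "both" option suggested above, assuming the script writes a tab-separated table with pandas: if nothing survives filtering, emit a warning that explains the likely cause and still write a header-only file. The function name `write_filtered` and the column `oligo_name` are hypothetical and not taken from merge_replicates_barcode_counts.py.

```python
import io
import warnings

import pandas as pd


def write_filtered(filtered: pd.DataFrame, header: list, out) -> None:
    """Write the filtered table; if it has no rows, still write the header line.

    Hypothetical helper: the real script's names and output format may differ.
    """
    if filtered.empty:
        # Explain the situation up front instead of failing later with a KeyError.
        warnings.warn(
            "No rows left after barcode filtering; the BC threshold may be too "
            "high for this dataset. Writing an empty table with headers only."
        )
        filtered = pd.DataFrame(columns=header)
    filtered.to_csv(out, sep="\t", index=False)


# Usage: an empty filtered result produces a header-only file plus a warning.
buf = io.StringIO()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    write_filtered(pd.DataFrame(), ["oligo_name", "dna_count_1", "rna_count_1"], buf)
```

Writing a header-only file keeps downstream Snakemake rules from failing on a missing output, while the warning tells the user why the table is empty; whether a hard error is preferable instead is a design decision for the pipeline.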

visze commented 4 days ago

It seems that the code after the comment `# order columns to have dna then rna count of each replicate` expects a certain number of columns, but

`df = df.pivot_table(`

does not create them.
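The mechanism can be sketched in isolation: `pivot_table` only creates columns for the `(type, replicate)` combinations that actually occur in the data, so a hard-coded column order fails with a KeyError as soon as a replicate is missing. The issue reports a completely empty dataframe; this sketch uses a partially filtered table to show the same effect deterministically. The long-format layout and the column names `oligo`, `replicate`, `type` and `count` are assumptions, since the script's actual `pivot_table` arguments are not shown in the issue.

```python
import pandas as pd

# Hypothetical long-format counts after barcode filtering (column names are assumptions).
# Replicate 2 lost all of its rows to the threshold, so it never appears here.
long_counts = pd.DataFrame(
    {
        "oligo": ["A", "A", "B", "B"],
        "replicate": [1, 1, 1, 1],
        "type": ["dna", "rna", "dna", "rna"],
        "count": [5, 7, 3, 4],
    }
)

# pivot_table creates one column per (type, replicate) pair that actually occurs.
wide = long_counts.pivot_table(
    index="oligo", columns=["type", "replicate"], values="count", aggfunc="sum"
)
# Flatten the (type, replicate) column MultiIndex into names like "dna_count_1".
wide.columns = [f"{kind}_count_{rep}" for kind, rep in wide.columns]

# The ordering step assumes both replicates produced a DNA and an RNA column.
expected_order = ["dna_count_1", "rna_count_1", "dna_count_2", "rna_count_2"]
missing = [col for col in expected_order if col not in wide.columns]
print("missing columns:", missing)

try:
    wide = wide[expected_order]
    raised = False
except KeyError as err:
    # Selecting absent labels raises, which mirrors the error reported above.
    raised = True
    print("KeyError:", err)
```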

So the script is buggy. Possible solutions are:

  1. Check the input files and raise an exception when one of them is empty.
  2. Pivot the table, add any column headers that do not exist, and then order the columns.
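Both proposed fixes can be sketched as small pandas helpers. This is a minimal sketch, not the project's actual patch: the function names `check_not_empty` and `order_replicate_columns` are hypothetical, and the `dna_count_<rep>` / `rna_count_<rep>` naming is inferred from the keys in the error message. Option 2 relies on `DataFrame.reindex(columns=...)`, which adds absent columns (filled with NaN by default) and drops columns not in the list.

```python
import pandas as pd


def check_not_empty(df: pd.DataFrame, name: str) -> None:
    """Option 1 (hypothetical helper): fail early with an explanation."""
    if df.empty:
        raise ValueError(
            f"{name} is empty after barcode filtering; "
            "the BC threshold may be too high for this dataset."
        )


def order_replicate_columns(wide: pd.DataFrame, replicates: list) -> pd.DataFrame:
    """Option 2 (hypothetical helper): add missing dna/rna columns, then order them."""
    expected = []
    for rep in replicates:
        expected += [f"dna_count_{rep}", f"rna_count_{rep}"]
    # reindex creates absent columns (NaN-filled) and drops any unexpected ones.
    return wide.reindex(columns=expected)


# Usage: replicate 2 is absent from the pivoted table but still gets its columns.
wide = pd.DataFrame({"rna_count_1": [7], "dna_count_1": [5]}, index=["A"])
ordered = order_replicate_columns(wide, [1, 2])
print(list(ordered.columns))
# ['dna_count_1', 'rna_count_1', 'dna_count_2', 'rna_count_2']
```

Filling missing counts with NaN rather than 0 keeps "no data" distinguishable from "zero reads"; which one the downstream steps expect is an assumption to check against the pipeline.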