sheltzer-lab / crispr-screening

A Nextflow script which conducts the computational analysis associated with CRISPR screening as done within the Sheltzer Lab.
MIT License

inner -> outer join with imputation #5

Open rhagenson opened 1 year ago

rhagenson commented 1 year ago

Right now, the paired counts are inner joined (the default behavior of `merge`), so only sgRNAs present in both samples are kept.

https://github.com/sheltzer-lab/crispr-screening/blob/1a6f8c1cbe94433e4abfc02d47247ba92c21ade4/bin/extract-reads.py#L75-L85
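To make the effect of the inner join concrete, here is a minimal pure-Python sketch. It is not the code in `extract-reads.py`. The per-sample dictionaries, the sgRNA IDs, and the helper `inner_join_counts` are all hypothetical. The sketch shows that an sgRNA seen in only one of the two samples disappears from the joined result.

```python
# Hypothetical per-sample sgRNA counts (sgRNA ID -> read count); illustrative only.
sample_a = {"sg1": 10, "sg2": 5, "sg3": 7}
sample_b = {"sg1": 8, "sg3": 2, "sg4": 4}


def inner_join_counts(a, b):
    """Keep only sgRNAs present in both samples, mirroring an inner join."""
    return {sg: (a[sg], b[sg]) for sg in a if sg in b}


joined = inner_join_counts(sample_a, sample_b)
print(sorted(joined))  # ['sg1', 'sg3']: sg2 and sg4 are silently dropped
```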

However, the downstream analysis (i.e., MAGeCK mle) should be able to handle the merged counts properly if we do a full (outer) join and then impute missing counts with 1s.
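The proposed change can be sketched in plain Python as a full (outer) join over any number of samples, with every missing count filled in as 1. The sample names, sgRNA IDs, and the function `outer_join_with_imputation` are hypothetical and do not come from the repository. In pandas, the equivalent would be `merge(..., how="outer")` followed by `fillna(1)`. Because the missing values appear as `NaN` after the outer join, the count columns usually become floats, so an `astype(int)` may be needed afterwards.

```python
from typing import Dict, List

# Hypothetical type alias: sgRNA ID -> read count for one sample.
Counts = Dict[str, int]


def outer_join_with_imputation(
    samples: Dict[str, Counts], fill: int = 1
) -> Dict[str, List[int]]:
    """Full (outer) join of per-sample counts; missing entries become `fill`.

    Returns a mapping sgRNA -> list of counts, ordered like `samples`.
    """
    names = list(samples)
    # Union of every sgRNA seen in any sample (the outer-join key set).
    all_sgrnas = sorted(set().union(*(samples[n].keys() for n in names)))
    # Impute `fill` wherever a sample has no count for an sgRNA.
    return {sg: [samples[n].get(sg, fill) for n in names] for sg in all_sgrnas}


counts = {
    "sample_a": {"sg1": 10, "sg2": 5, "sg3": 7},
    "sample_b": {"sg1": 8, "sg3": 2, "sg4": 4},
}
table = outer_join_with_imputation(counts)
for sg, row in table.items():
    print(sg, row)
# sg1 [10, 8]
# sg2 [5, 1]
# sg3 [7, 2]
# sg4 [1, 4]
```

Unlike the inner join, no sgRNA is lost. The imputed value is a parameter, so the pseudocount of 1 proposed here could be changed in one place.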

rhagenson commented 1 year ago

This is not strictly necessary right now: all samples are usually run at once, so any sgRNA found in only a single sample already gets a count of 1 in the samples where it was not found. However, this implicit behavior will not hold if #4 is implemented in certain ways. It is best to make the switch described here as a bit of defensive programming (#2).