Open rhagenson opened 1 year ago
This is not strictly necessary right now, since usually all samples are run at once, so any sgRNA found in only a single sample gets a count of 1 for the samples where it was not found. However, this implicit behavior will not hold if #4 is implemented in certain ways. It is best to make the switch described here as a bit of defensive programming (#2).
Right now the paired counts are inner joined (the default behavior of `merge`): https://github.com/sheltzer-lab/crispr-screening/blob/1a6f8c1cbe94433e4abfc02d47247ba92c21ade4/bin/extract-reads.py#L75-L85
However, the downstream analysis (i.e., MAGeCK mle) should be able to handle the merged counts properly if we do a full (outer) join and then impute the missing counts with 1s.
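
A minimal sketch of the proposed change, assuming the per-sample counts are pandas DataFrames keyed on an sgRNA column. The frame names, column names, and values below are hypothetical and are not taken from `extract-reads.py`:

```python
import pandas as pd

# Hypothetical per-sample count tables (illustrative names and values only).
sample_a = pd.DataFrame({"sgRNA": ["sg1", "sg2", "sg3"], "sample_a": [10, 5, 7]})
sample_b = pd.DataFrame({"sgRNA": ["sg2", "sg3", "sg4"], "sample_b": [3, 8, 12]})

# Current behavior: merge defaults to how="inner", so sgRNAs seen in only
# one sample (sg1, sg4) are dropped.
inner = sample_a.merge(sample_b, on="sgRNA")

# Proposed behavior: a full (outer) join keeps every sgRNA; counts missing
# from a sample come back as NaN and are then imputed with 1.
outer = sample_a.merge(sample_b, on="sgRNA", how="outer")
count_cols = ["sample_a", "sample_b"]
outer[count_cols] = outer[count_cols].fillna(1).astype(int)

print(inner)
print(outer)
```

The `astype(int)` call is there because the NaN values introduced by the outer join turn the count columns into floats; casting back keeps the table as integer counts.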