mckennalab / SingleCellLineage

Updated scripts and pipelines for processing GESTALT data at single-cell resolution

run without docker #2

Closed by qinqian 3 years ago

qinqian commented 3 years ago

Dear author,

Happy Thanksgiving! Thank you for sharing such nice software. We are wondering whether we could run the pipeline without Docker. Could you please give us some hints on how to deploy it that way?

Thanks, Best, Qian

aaronmck commented 3 years ago

Hi Qian,

Happy Thanksgiving! Sure, the Docker script is essentially just a setup script to install all the right parts-and-pieces on your machine. You'd have to adjust some paths; I'll try to include some documentation in the near future on getting this set up if you have issues. Best,

Aaron

qinqian commented 3 years ago

Hi Aaron,

Thank you for your tips! I'll explore how to adjust these paths and let you know if issues arise.

Just another quick question: we have true negative control samples that are not edited by Cas9. However, we still observe some indel genotypes, reported with their ratios. It seems that only editing events are preserved in the final outputs. Is it possible to output the number of unedited control cells as well?

Best, Qian

aaronmck commented 3 years ago

Hi Qian,

Huh, you should be seeing all captured sequences, edited or not. What output files are you looking at? The best place to check things is the ".stats" file, which should have an entry for each sequence captured. Sometimes sequences will have UNKNOWN, where the sequencing wasn't long enough to cover all target regions. Some output files will filter these events.
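For anyone who wants to check how many entries carry UNKNOWN, here is a minimal Python sketch. It is not part of the pipeline, and it rests on assumptions the thread does not confirm: that the ".stats" file is tab-delimited, that its first line is a header, and that an unresolved target appears as the exact field value UNKNOWN. The function name `count_unknown_rows` and the example columns are hypothetical.

```python
import csv
import io
from typing import Iterable, Tuple


def count_unknown_rows(lines: Iterable[str], delimiter: str = "\t") -> Tuple[int, int]:
    """Return (rows_with_unknown, total_rows), skipping the first (header) line.

    Assumes a delimited text file in which an unresolved target is written
    as the exact field value "UNKNOWN"; the real .stats layout may differ.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # skip the assumed header row
    total = 0
    with_unknown = 0
    for row in reader:
        if not row:
            continue  # ignore blank lines
        total += 1
        if any(field.strip() == "UNKNOWN" for field in row):
            with_unknown += 1
    return with_unknown, total


# Hypothetical example data; real column names and event strings may differ.
example = io.StringIO(
    "read\tkeep\ttarget1\ttarget2\n"
    "r1\tPASS\tNONE\tNONE\n"
    "r2\tPASS\t5D+30\tUNKNOWN\n"
    "r3\tPASS\tUNKNOWN\tUNKNOWN\n"
)
unknown_rows, total_rows = count_unknown_rows(example)
print(f"{unknown_rows}/{total_rows} rows contain UNKNOWN")
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` object, for example `count_unknown_rows(open(path, newline=""))`, where `path` points at your own ".stats" file.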

qinqian commented 3 years ago

Hi Aaron,

Before, I was looking at the ".topReadCounts" output files, which are used to generate the HTML.

Thank you very much for the instructions. I checked the ".stats" file, and it contains much richer information. In our case, 48 to 99 percent of rows have UNKNOWN records. The ratio seems high, possibly because we are using 358 bp amplicons sequenced with 150 bp paired-end reads. Is it reasonable to say that the UNKNOWN "PASS" records are from single cells with wild-type genotypes?

Best, Qian
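The arithmetic behind that guess can be sketched in a few lines of Python. This assumes each mate starts at one end of the amplicon and is used at full length with no trimming; the function name `uncovered_gap` is hypothetical. Whether any target actually falls inside the gap depends on the target positions, which the thread does not give.

```python
def uncovered_gap(amplicon_len: int, read_len: int) -> int:
    """Bases in the middle of the amplicon that neither mate reaches.

    Returns 0 when the two mates meet or overlap.
    Assumes untrimmed reads that begin exactly at the amplicon ends.
    """
    return max(0, amplicon_len - 2 * read_len)


# Numbers from the comment above: 358 bp amplicon, 150 bp paired-end reads.
gap = uncovered_gap(358, 150)
print(gap)  # 58
```

Under these assumptions, up to 58 bp in the middle of the amplicon is never sequenced, so any target in that stretch cannot be called.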

aaronmck commented 3 years ago

Certainly it's best to sequence over all the targets to know for sure, but it's up to you if you'd like to call them unedited (wild-type). The stats file will always stay the same, but you can change UNKNOWNs to NONEs in the downstream files (like topReadCounts, allReadCounts, etc) by providing the '--unknownToNone' parameter to the main script.

qinqian commented 3 years ago

Sounds good, this answer will be a great reference in the future. Thanks again for sharing all these tips!