mckennalab / SingleCellLineage

Updated scripts and pipelines for processing GESTALT data at single-cell resolution

Pipeline clarification #5

Open castaway1990 opened 3 years ago

castaway1990 commented 3 years ago

Hi! Thank you, the analytical workflow you provided is amazing and the most streamlined I could find for GESTALT-like analysis. I was wondering at which point in the pipeline script the merged and aligned reads are used for tree reconstruction. I saw some scripts for this purpose, but nothing that calls them from the main CRISPR_analysis_PE_V2.scala. Furthermore, I couldn't find any files that summarise the cell (barcode) lineages at the end of my test_run.sh.

test_run.sh

  java -Xmx4g \
    -jar /app/queue.jar \
    -S $WD/SingleCellLineage/pipelines/CRISPR_analysis_PE_V2.scala \
    -i $WD/data/tol2_simulated_data_tear_sheet.txt \
    --aggLocation $WD/data/pipeline_output/ \
    --expName my_test_data \
    --eda /app/EDNAFULL.Ns_are_zero \
    -run \
    --dontTrim \
    --primersToUse FORWARD \
    --umiIndex 10X \
    -s $WD/SingleCellLineage/scripts/ \
    -b /app/bin/ \
    --web $WD/webout/ \
    --scala "/usr/bin/scala -nocompdaemon" \
    --minimumUMIReads 4 \
    --minimumSurvivingUMIReads 3 \
    --umiLength 28 \
    --umiMemLimit 4

The pipeline finishes with only this warning:

  WARN 14:12:16,086 RScriptExecutor - Skipping: Rscript (resource)org/broadinstitute/gatk/queue/util/queueJobReport.R $WD/CRISPR_analysis_PE_V2.jobreport.txt $WD/CRISPR_analysis_PE_V2.jobreport.pdf

I am using your Docker environment for dependencies and binaries, but pointing at the latest version of this repo's pipeline and helper scripts, as you can see from test_run.sh. In the end, the .stats and .umiCounts files are generated correctly. I am surely missing something.

Thanks for the support! :) Davide

aaronmck commented 3 years ago

Hi Davide,

Thanks for the kind words. Right now lineage barcode processing and tree generation are split, and I've moved the tree generation stuff out to its own repository: https://github.com/mckennalab/TreeUtils

All of the lineage barcode stuff is (relatively) new, so I didn't want people to see this as an end-to-end solution. Rather, you should look at the stats from your run, get a sense for what worked or didn't, and then move on to tree generation. That said, I should add a lot more documentation to that git repository to help with the final step. I'll add an issue over there and try to get an example or two in this week. Thanks!

-Aaron

castaway1990 commented 3 years ago

Hi Aaron,

Thank you for the prompt answer; I understand your point. Being able to pre-process and align to a custom array is actually a noticeable speedup in itself, and perhaps a proxy for testing different tree reconstructions. Looking forward to your updates.

Thank you, Davide
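
For the step Aaron describes above (looking at the stats from a run before moving on to tree generation), a minimal Python sketch could look like the following. It is not part of the pipeline. It assumes the .stats file is tab-delimited with a header row, which this thread does not confirm, and the function name `summarize_column` is hypothetical. The column to inspect is passed in by the caller because the actual column names are not given here.

```python
import csv
from collections import Counter


def summarize_column(stats_path, column):
    """Count how often each value occurs in one column of a stats file.

    Assumption (not confirmed in this thread): the file is tab-delimited
    and its first line is a header naming the columns.
    """
    with open(stats_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        if column not in header:
            # Show the real header so the caller can pick a valid column.
            raise KeyError(f"{column!r} not found; columns are: {header}")
        return Counter(row[column] for row in reader)
```

Calling `summarize_column(path, name)` with the path to a .stats file and one of its header names returns a `Counter` of the values in that column. Asking for a name that is not in the header raises a `KeyError` listing the columns that are actually present, which is a quick way to discover the file's layout.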