simslab / dna10x

Pipeline for processing scDNA-seq data obtained using the 10x Genomics scATAC-seq kit.

Sufficient information to run the pipeline #1

Open Laolga opened 4 months ago

Laolga commented 4 months ago

Dear authors,

Please provide the information needed to execute your pipeline:
1) What is the format of the samplesheet?
2) How can one know the barcodes before running any analysis?
3) What is BARCODE_START_CYCLE?
4) What is rc?
5) What is ad?

pas2182 commented 4 months ago
  1. When you sequence on an Illumina sequencer, a standard file called a "sample sheet" is required for demultiplexing; it is typically created with the Illumina Experiment Manager. A sample sheet is also required for demultiplexing with Cell Ranger, whose documentation describes how to format it: https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-mkfastq.
  2. 10x Genomics uses a pre-defined set of barcode sequences for each kit. For example, for the 10x Multiome kit, you can find the details of the barcode sequences here: https://kb.10xgenomics.com/hc/en-us/articles/4412343032205-Where-can-I-find-the-barcode-whitelist-s-for-Single-Cell-Multiome-ATAC-GEX-product.
  3. BARCODE_START_CYCLE is the sequencing cycle, within the read that contains the cell-identifying barcode, at which the first base of that barcode is read.
  4. rc = reverse complement. Use this option if your cell-identifying barcode list contains the reverse complement of the barcodes read by the sequencer.
  5. ad = adapter. This is the adapter sequence to be trimmed from the 3' end of reads when the sequenced fragment is shorter than the read length.
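
The three options in answers 3 to 5 can be illustrated with a short Python sketch of per-read processing. This is not the dna10x code: the function names, the example sequences and the exact-match logic are assumptions for illustration, and the adapter trimming is a simplified stand-in for tools such as cutadapt. The adapter string in the demo is the one passed with -ad in the example runs later in this thread.

```python
from typing import Optional, Set

# Complement table for A/C/G/T plus N (unknown base).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence: what the rc option refers to."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_barcode(read_seq: str, start_cycle: int, length: int = 16) -> str:
    """Cut the cell barcode out of the barcode-containing read.

    start_cycle plays the role of BARCODE_START_CYCLE and is 1-based: cycle 1 is
    the first base of the read. 16 bp is the length of 10x cell barcodes.
    """
    start = start_cycle - 1
    return read_seq[start:start + length]


def match_whitelist(barcode: str, whitelist: Set[str], rc: bool = False) -> Optional[str]:
    """Return the whitelist entry for barcode, or None if there is no exact match.

    With rc=True the barcode is reverse-complemented before the lookup, for
    whitelists written in the opposite orientation to the sequenced read.
    """
    query = reverse_complement(barcode) if rc else barcode
    return query if query in whitelist else None


def trim_3prime_adapter(read_seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter (the sequence given with ad) from a read.

    A read from a fragment shorter than the read length runs into the adapter.
    Cut at the first full adapter occurrence; otherwise cut the longest adapter
    prefix (at least min_overlap bases) found at the very end of the read.
    Exact matching only, unlike cutadapt, which also tolerates errors.
    """
    pos = read_seq.find(adapter)
    if pos != -1:
        return read_seq[:pos]
    lo = max(min_overlap, 1)  # never test an empty prefix
    for k in range(min(len(adapter) - 1, len(read_seq)), lo - 1, -1):
        if read_seq.endswith(adapter[:k]):
            return read_seq[:-k]
    return read_seq


if __name__ == "__main__":
    whitelist = {"AAACGGGCAGGATTCT"}  # hypothetical one-entry whitelist
    read = "TTAGAATCCTGCCCGTTTGG"     # hypothetical read; barcode starts at cycle 3
    bc = extract_barcode(read, start_cycle=3)
    print(bc)                                       # AGAATCCTGCCCGTTT
    print(match_whitelist(bc, whitelist))           # None
    print(match_whitelist(bc, whitelist, rc=True))  # AAACGGGCAGGATTCT
    print(trim_3prime_adapter("ACGTACGTCTGTCT", "CTGTCTCTTATACACATCT"))  # ACGTACGT
```

In this toy case the forward lookup fails and the reverse-complement lookup succeeds, which is exactly the situation in which you would pass -rc.
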
mariaZig commented 1 month ago

Hello,

Thanks for the custom pipelines and for this nice protocol!

On a similar note, I'm having trouble understanding the exact input I should use to run the DNA-based pipeline.

Would it be possible to give me a specific example?

If possible, please provide an example samplesheet.csv file and a specific value for the "--directory" parameter, assuming that my FASTQ files are already prepared, so I don't need to run cellranger to produce them from the BCL files.

Thanks in advance, Maria

tro2104 commented 1 month ago

Hello Maria,

Here is an example of a few runs and how to set up the software. It assumes the FASTQ files are in the directory that bcl2fastq would create, so if that path doesn't already exist, create it and put your FASTQs in it.

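Create a conda environment, then activate it:

    conda create -n cutadapt -c bioconda -c conda-forge cutadapt python=3.9 bwa pysam samtools numpy
    conda activate cutadapt

Download the dna10x pipeline from GitHub.

Create a sample sheet in the directory with the pipeline. In vim, open the file, press i to enter insert mode, type the two lines below, then save and quit with :wq

    vim ss.csv

    Lane,Sample,Index
    *,PTO035,SI-NA-F1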
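
If you prefer not to edit the file in vim, the same two-line sample sheet can be written (and checked) from Python. A minimal sketch: write_samplesheet and read_samplesheet are hypothetical helpers, not part of dna10x, and the header check assumes the simple Lane,Sample,Index layout used in this example, where * in the Lane column means all lanes.

```python
import csv
from typing import Dict, Iterable, List, Sequence

HEADER = ["Lane", "Sample", "Index"]


def write_samplesheet(path: str, rows: Iterable[Sequence[str]]) -> None:
    """Write a simple Lane,Sample,Index sample sheet (hypothetical helper)."""
    # lineterminator="\n" gives Unix line endings; the csv default is "\r\n".
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(rows)


def read_samplesheet(path: str) -> List[Dict[str, str]]:
    """Read the sample sheet back as one dict per row, checking the header."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        if reader.fieldnames != HEADER:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        return list(reader)


if __name__ == "__main__":
    # Same content as the ss.csv created in vim: one sample, all lanes.
    write_samplesheet("ss.csv", [("*", "PTO035", "SI-NA-F1")])
    print(read_samplesheet("ss.csv"))
```
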

Download a reference genome, or use one of the 10x Cell Ranger references.

Make directories: within the dna10x directory (the one with all the associated .py files), create the following path:

    mkdir -p PTO035/outs/fastq_path/PTO035/PTO035/
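
If you prepare the folder from a script (for example, one that also copies the FASTQs into place), the same step can be sketched with pathlib. The helper names make_fastq_dir and list_fastqs are hypothetical, not part of dna10x; the sketch assumes, as in the PTO035 example, that the same name is used for the -d directory and both sub-levels, and that the FASTQs are gzipped with a .fastq.gz suffix.

```python
from pathlib import Path
from typing import List


def make_fastq_dir(name: str, base: str = ".") -> Path:
    """Create <base>/<name>/outs/fastq_path/<name>/<name>/, like `mkdir -p`.

    Assumption: as in the PTO035 example, the same name is used at every level.
    """
    fastq_dir = Path(base) / name / "outs" / "fastq_path" / name / name
    fastq_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return fastq_dir


def list_fastqs(fastq_dir: Path) -> List[str]:
    """Return the gzipped FASTQ file names in fastq_dir (assumes *.fastq.gz naming)."""
    return sorted(p.name for p in fastq_dir.glob("*.fastq.gz"))


if __name__ == "__main__":
    d = make_fastq_dir("PTO035")
    print(d)
    print(list_fastqs(d) or "no FASTQs yet: copy your files here")
```
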

Run the pipeline.

With existing FASTQs (assumed to be located in dna10x/PTO035/outs/fastq_path/PTO035/PTO035/):

    nohup python dna10x.py --samplesheet ss.csv -d PTO035 -b /opt/cellranger-atac-2.0.0/lib/python/atac/barcodes/737K-arc-v1.txt -t 16 -r /opt/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/fasta/genome.fa -i 1000 -m 0.9 -c -sf -p 1 -rc -ad CTGTCTCTTATACACATCT &

Starting from BCL files, which are converted to FASTQ first (you may need to install bcl2fastq):

    nohup python dna10x.py --bcl ~/230407_NB551203_0654_AH5LM2BGXT --samplesheet ss.csv -d PTO035 -b /opt/cellranger-atac-2.0.0/lib/python/atac/barcodes/737K-arc-v1.txt -t 16 -r /opt/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/fasta/genome.fa -i 1000 -m 0.9 -p 1 -rc -ad CTGTCTCTTATACACATCT -c &

Tim