morispi / LEVIATHAN

Linked-reads based structural variant caller with barcode indexing
GNU Affero General Public License v3.0

Workflow to get bam file as input for LEVIATHAN #10

Open jtweir opened 1 year ago

jtweir commented 1 year ago

I would like to request a clear example of the commands needed to generate the required input for LEVIATHAN. By reading the preprint and user comments, I have pieced together that I need to use longranger basic to process 10x linked-read data and then run bwa mem with the -Cp flags to generate the BAM file that LEVIATHAN requires as input. Still, example commands that are known to work would be greatly appreciated. For example, I suspect longranger basic must be run separately on the forward and reverse read outputs, but I am not sure.

jtweir commented 1 year ago

Here is a pipeline I eventually got to work after much trial and error. I am providing it here to help others who may be struggling to get input data that works. The main issue for me was that the LRez included with LEVIATHAN generated a stoi error on my input data (but not on the example file that comes with LEVIATHAN). The eventual trick was to download LRez from its dedicated GitHub page and use that instead. Here is my pipeline:

First, make a bwa index of the reference genome:

/home/0_PROGRAMS/bwa/bwa index REF_GENOME.fasta

Run longranger basic to produce the file named barcoded.fastq.gz:

/home/0_PROGRAMS/longranger-2.2.2/longranger basic --id SAMPLE1 --fastqs PATH_to_FASTQ_DIRECTORY --sample SPECIES1 --jobmode=local --localcores=20 --localmem=220
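Before aligning, it can be worth checking that the barcodes made it into the read headers, since bwa mem -C copies the FASTQ header comment into the alignment records. The following is a minimal sketch, assuming the barcode appears as a BX:Z: tag in the header comment of barcoded.fastq.gz. `count_bx_headers` is a hypothetical helper name, not part of longranger or LEVIATHAN:

```shell
# count_bx_headers FILE.fastq.gz
# Prints how many FASTQ header lines (lines 1, 5, 9, ...) contain a BX:Z: tag.
# Assumption: barcodes are stored as "BX:Z:<barcode>" in the header comment.
count_bx_headers() {
    gzip -dc "$1" | awk 'NR % 4 == 1 && /BX:Z:/ { n++ } END { print n + 0 }'
}

# Example call (point it at wherever longranger wrote the file):
# count_bx_headers barcoded.fastq.gz
```

A count of zero would suggest the barcodes were not attached, in which case the downstream LRez index step has nothing to work with.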

run bwa

/home/0_PROGRAMS/bwa/bwa mem -Cp -R "@RG\tID:id\tSM:sample\tLB:lib" -t 4 REF_GENOME.fasta barcoded.fastq.gz \
| samblaster --excludeDups --addMateTags --maxSplitCount 2 --minNonOverlap 20 \
| samtools view -S -b -@ 10 -h \
> SPECIES1_bwa.bam

Sort and index the BAM alignment:

samtools sort --threads 5 SPECIES1_bwa.bam -o SPECIES1__SORTED.bam
samtools index -b SPECIES1__SORTED.bam

Now install LRez from its own GitHub page (or, if you wish to use the LRez bundled with LEVIATHAN, see these modified instructions for installation), then build the barcode index:

/home/0_PROGRAMS/LRez/bin/LRez index bam -p -b SPECIES1__SORTED.bam -o barcodeIndex.bci

Now run LEVIATHAN:

/home/0_PROGRAMS/LEVIATHAN/bin/LEVIATHAN \
-b SPECIES1__SORTED.bam \
-i barcodeIndex.bci \
-g REF_GENOME.fasta \
-o SV.vcf
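Since the LRez and LEVIATHAN steps can run for many hours on a large genome, it may help to confirm that every input file exists and is non-empty before launching. This is only a sketch; `require_nonempty` is a hypothetical helper name, not something provided by LEVIATHAN or LRez:

```shell
# require_nonempty FILE...
# Returns non-zero (and reports on stderr) if any listed file is missing or empty.
require_nonempty() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call with the file names used in this pipeline
# (.bai is the index written by "samtools index -b"):
# require_nonempty SPECIES1__SORTED.bam SPECIES1__SORTED.bam.bai \
#     barcodeIndex.bci REF_GENOME.fasta || exit 1
```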

jtweir commented 1 year ago

The above (LRez and LEVIATHAN) took about 2 days using 20 threads with a non-model reference genome with a scaffold N50 of about 20 (1.1 Gb genome with 29,000 scaffolds in the fasta file). When I used only the scaffolds > 10,000 bp (n=700) in the reference genome, these steps took about 12 hours.
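The scaffold filtering mentioned above could be done in several ways. One possible sketch uses awk to keep only sequences longer than a threshold; `filter_fasta_by_length` and the output file name are hypothetical, and this is not necessarily how the reduced reference was actually produced:

```shell
# filter_fasta_by_length MIN_LEN IN.fasta
# Writes to stdout only the records whose sequence is longer than MIN_LEN bp.
# Each kept sequence is written on a single line (line wrapping is removed).
filter_fasta_by_length() {
    awk -v min="$1" '
        function emit() {
            if (name != "" && length(seq) > min) { print name; print seq }
        }
        /^>/ { emit(); name = $0; seq = ""; next }   # new record: flush the previous one
        { seq = seq $0 }                             # accumulate wrapped sequence lines
        END { emit() }                               # flush the last record
    ' "$2"
}

# Example call (REF_GENOME.min10k.fasta is a hypothetical output name):
# filter_fasta_by_length 10000 REF_GENOME.fasta > REF_GENOME.min10k.fasta
```

Note that the filtered fasta is a different reference, so the bwa index, alignment, sorting, and LRez index steps would all need to be rerun against it.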