sheynkman-lab / Long-Read-Proteogenomics

A workflow for enhanced protein isoform detection through integration of long-read RNA-seq and mass spectrometry-based proteomics.
MIT License

Iso-Seq command code #97

Closed gsheynkman closed 3 years ago

gsheynkman commented 3 years ago

For @adeslatt

Iso-Seq commands I ran in an interactive session on the UVA cluster:

The input is a CCS BAM file, jurkat.ccs.bam.

# create a PacBio index (.pbi) for the input BAM
pbindex jurkat.ccs.bam

module load isoseqenv
lima --isoseq --dump-clips --peek-guess -j 40 jurkat.ccs.bam NEB_primers.fasta jurkat.demult.bam
isoseq3 refine --require-polya jurkat.demult.NEB_5p--NEB_3p.subreadset.xml NEB_primers.fasta jurkat.flnc.bam

# cluster the reads; this step cannot be parallelized, so the only way to speed it up is to run on a machine with more cores
isoseq3 cluster jurkat.flnc.bam jurkat.polished.bam --verbose --use-qvs

# align reads to the genome; takes a few minutes on a 40-core machine
pbmm2 align hg38.fa jurkat.polished.transcriptset.xml jurkat.aligned.bam --preset ISOSEQ --sort -j 40 --log-level INFO

# collapse redundant aligned reads into unique isoforms
isoseq3 collapse jurkat.aligned.bam jurkat.collapsed.gff
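
For reuse on other samples, the same sequence could be parameterized by file prefix and thread count. The sketch below is a dry run: it only prints the commands listed above and does not execute them. The function name `print_isoseq_commands` and its two arguments are hypothetical and not part of the original post. The primer-pair suffix `NEB_5p--NEB_3p`, the reference `hg38.fa`, and the primer file `NEB_primers.fasta` are copied from the commands above and would need adjusting for a different setup.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the Iso-Seq commands for one sample instead of running them.
# print_isoseq_commands and its arguments are hypothetical names for illustration.
set -euo pipefail

print_isoseq_commands() {
    local sample="$1"         # file prefix, e.g. jurkat
    local threads="${2:-40}"  # value passed to -j; defaults to 40 as in the post

    cat <<EOF
pbindex ${sample}.ccs.bam
lima --isoseq --dump-clips --peek-guess -j ${threads} ${sample}.ccs.bam NEB_primers.fasta ${sample}.demult.bam
isoseq3 refine --require-polya ${sample}.demult.NEB_5p--NEB_3p.subreadset.xml NEB_primers.fasta ${sample}.flnc.bam
isoseq3 cluster ${sample}.flnc.bam ${sample}.polished.bam --verbose --use-qvs
pbmm2 align hg38.fa ${sample}.polished.transcriptset.xml ${sample}.aligned.bam --preset ISOSEQ --sort -j ${threads} --log-level INFO
isoseq3 collapse ${sample}.aligned.bam ${sample}.collapsed.gff
EOF
}

# Print the command list for the jurkat sample with 40 threads.
print_isoseq_commands jurkat 40
```

Reviewing the printed list before running it makes it easy to check that each step's output file name matches the next step's input.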