salvadorlab / MinION_Transcriptome


Transcriptomic data analyses #1

Open rx32940 opened 3 years ago

rx32940 commented 3 years ago

Copenhageni_Basecalled_Aug_16_2019_Direct-cDNA_NoPolyATail
Copenhageni_Basecalled_Aug_16_2019_Direct-cDNA_NoPolyATail_Qiagen
Copenhageni_Basecalled_Aug_16_2019_Direct-cDNA_PolyATail
Icterohaemorrhagiae_Basecalled_Aug_16_2019_Direct-cDNA_NoPolyATail
Icterohaemorrhagiae_Basecalled_Aug_16_2019_Direct-cDNA_PolyATail
Mankarso_Basecalled_Aug_16_2019_Direct-cDNA_NoPolyATail
Mankarso_Basecalled_Aug_16_2019_Direct-cDNA_PolyATail
Patoc_Basecalled_Aug_16_2019_Direct-cDNA-NoPolyATail
Patoc_Basecalled_Aug_16_2019_Direct-cDNA_PolyATail
Q29_Copenhageni_Basecalled-June_11_2020_Repeat_Direct-RNA
Q29_Copenhageni_Basecalled_May_22_2020_Direct-RNA
Q36_Copenhageni_Basecalled_June_9_2020-Repeat_Direct-RNA
Q36_Copenhageni_Basecalled_May_31_2020_Direct-RNA

rx32940 commented 3 years ago

QC data

MinIONQC

https://github.com/roblanf/minion_qc

FastQC

1) FASTQ files for each sample are merged with merge_fastq.sh
2) FastQC is run on each sample's merged file with run_fastQC.sh
3) MultiQC combines the FastQC results of all samples into one report
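merge_fastq.sh itself is not shown in this thread. As a rough sketch of step 1 only, assuming each sample's basecalled reads sit as several *.fastq or *.fastq.gz files in one per-sample directory (that layout and the helper name merge_fastq are hypothetical), the merge is a plain concatenation of records:

```python
import gzip
from pathlib import Path
from typing import IO


def _open_text(path: Path) -> IO[str]:
    """Open a FASTQ file for reading as text, decompressing .gz files on the fly."""
    if path.name.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def merge_fastq(sample_dir: str, out_path: str) -> int:
    """Concatenate all *.fastq / *.fastq.gz files in sample_dir into one plain FASTQ.

    Hypothetical helper (merge_fastq.sh is not shown). Inputs are merged in sorted
    file-name order so reruns give identical output; out_path should live outside
    sample_dir so a previous merged file is never picked up as an input.
    Returns the number of files merged.
    """
    inputs = sorted(
        p for p in Path(sample_dir).iterdir()
        if p.name.endswith((".fastq", ".fastq.gz"))
    )
    with open(out_path, "w") as out:
        for path in inputs:
            with _open_text(path) as handle:
                for line in handle:
                    # Guard against a file without a final newline, which would
                    # otherwise fuse its last record with the next file's first one.
                    out.write(line if line.endswith("\n") else line + "\n")
    return len(inputs)
```

The merged file per sample is then what FastQC is pointed at in step 2.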

rx32940 commented 3 years ago

Map Data

Reference used: Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130 (bacteria) GCF_000007685.1_ASM768v1_genomic.fna

minimap2

Reads were mapped with run_minimap2.sh.

1) cDNA command:

minimap2 -ax splice GCF_000007685.1_ASM768v1_genomic.fna $file > minimap2/sam/$sample.sam

2) Direct RNA command:

minimap2 -ax splice -uf -k14 ../GCF_000007685.1_ASM768v1_genomic.fna $file > minimap2/sam/$sample.sam

3) Mapping statistics were evaluated with samtools stats and qualimap bamqc (on the sorted alignments).
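run_minimap2.sh is not shown, but the two commands above differ only in the Direct RNA flags. A minimal sketch of how the per-sample choice could be made from the sample name (Direct-RNA vs Direct-cDNA, as in the sample list); writing with minimap2's -o instead of shell redirection, the minimap2/bam/ directory, and the helper names are assumptions:

```python
from typing import List


def is_direct_rna(sample: str) -> bool:
    """True for Direct-RNA runs, judged from the sample name (cDNA runs say 'Direct-cDNA')."""
    return "Direct-RNA" in sample


def mapping_commands(sample: str, fastq: str, ref: str) -> List[List[str]]:
    """Return argv lists that map one sample with minimap2, then sort and index it.

    Sketch only: the minimap2 flags mirror the two commands above; the output
    layout (minimap2/sam/, minimap2/bam/) is assumed, and the directories must exist.
    """
    minimap2 = ["minimap2", "-ax", "splice"]
    if is_direct_rna(sample):
        # minimap2's suggested settings for Nanopore direct RNA reads:
        # -uf  look for GT-AG splice signals on the transcript strand only,
        # -k14 shorter k-mers for more sensitivity on the noisier reads.
        minimap2 += ["-uf", "-k14"]
    sam = f"minimap2/sam/{sample}.sam"
    bam = f"minimap2/bam/{sample}.sorted.bam"
    return [
        minimap2 + ["-o", sam, ref, fastq],
        ["samtools", "sort", "-o", bam, sam],  # coordinate-sorted BAM for qualimap and IGV
        ["samtools", "index", bam],            # writes the .bai index next to the BAM
    ]
```

Each returned command can be run with subprocess.run(cmd, check=True); the sorted BAM is then the input for samtools stats and qualimap bamqc -bam.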

rx32940 commented 3 years ago

Count Transcript

Transcriptome: GCF_000007685.1_ASM768v1_cds_from_genomic.fna
Genome: GCF_000007685.1_ASM768v1_genomic.fna

Salmon

https://github.com/COMBINE-lab/salmon

1) create the salmon env

conda create --name salmon
conda activate salmon

2) install salmon

conda install --channel bioconda salmon

3) build the decoy list and the decoy-aware transcript FASTA (gentrome) for the index

grep "^>" < GCF_000007685.1_ASM768v1_genomic.fna | cut -d " " -f 1 > decoys.txt
sed -i.bak -e 's/>//g' decoys.txt
cat GCF_000007685.1_ASM768v1_cds_from_genomic.fna GCF_000007685.1_ASM768v1_genomic.fna > gentrome.fa

4) run script: run_salmon.sh
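For readers without the shell tools at hand, a minimal Python sketch of the same logic as step 3 (the function names are hypothetical): decoys.txt holds the first word of every genome header without the leading ">", and the gentrome is the CDS FASTA followed by the genome FASTA, because salmon expects decoy sequences to come after the transcripts.

```python
from pathlib import Path
from typing import Iterable, List


def decoy_names(genome_lines: Iterable[str]) -> List[str]:
    """Equivalent of grep "^>" | cut -d " " -f 1 | sed 's/>//g' on a genome FASTA."""
    names = []
    for line in genome_lines:
        if line.startswith(">"):
            # First space-delimited word of the header, without the '>'
            names.append(line[1:].rstrip("\n").split(" ")[0])
    return names


def build_gentrome(cds_fa: str, genome_fa: str, decoys_txt: str, gentrome_fa: str) -> List[str]:
    """Write decoys.txt and gentrome.fa; returns the decoy names."""
    with open(genome_fa) as fh:
        names = decoy_names(fh)
    Path(decoys_txt).write_text("".join(name + "\n" for name in names))
    with open(gentrome_fa, "w") as out:
        for path in (cds_fa, genome_fa):  # order matters: transcripts first, decoys last
            with open(path) as fh:
                for line in fh:
                    # Unlike plain cat, add a newline if a file lacks one so records never fuse
                    out.write(line if line.endswith("\n") else line + "\n")
    return names
```

The two outputs correspond to what salmon index takes for a decoy-aware index (-t gentrome.fa -d decoys.txt -i <index_dir>); the exact options in run_salmon.sh are not shown here.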

flair

  1. build flair env: conda env create -f environmental.yaml (build_env.sh)
  2. run script: run_flair.sh
  3. Because of an rpy2 version change, bin/runDE.py and bin/runDU.py were edited:
    1. add from rpy2.robjects.conversion import localconverter to both scripts
    2. replace every line of the form df = pandas2ri.py2ri(quantDF) with the two lines below; any other line that calls pandas2ri.py2ri needs the same change, in both bin/runDE.py and bin/runDU.py
          with localconverter(robjects.default_converter + pandas2ri.converter):
              df = robjects.conversion.py2ri(quantDF)

stringtie2

  1. StringTie failed with "Error: no valid ID found for GFF record", so the "gene" feature lines were removed from the GTF:
          awk '$3 != "gene"' GCF_000007685.1_ASM768v1_genomic.gtf > GCF_000007685.1_ASM768v1_genomic_edited.gtf

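A minimal Python sketch of the same filter, for cases where awk is not convenient (the function names are hypothetical). It drops lines whose third tab-separated column is "gene" and passes header and comment lines through, which matches the awk one-liner on a normal GTF:

```python
from typing import Iterable, Iterator


def drop_gene_features(lines: Iterable[str]) -> Iterator[str]:
    """Yield GTF lines whose third tab-separated column (feature type) is not "gene"."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] == "gene":
            continue  # skip the gene record
        yield line  # CDS/exon/... records and '#' header lines are kept


def filter_gtf(gtf_in: str, gtf_out: str) -> None:
    """File-to-file wrapper, e.g. *_genomic.gtf -> *_genomic_edited.gtf."""
    with open(gtf_in) as src, open(gtf_out, "w") as dst:
        dst.writelines(drop_gene_features(src))
```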
rx32940 commented 3 years ago

IGV view comparing the three Copenhageni cDNA libraries

REF:

GCF_000007685.1_ASM768v1_cds_from_genomic.fna(.fai)

Track:

1) Copenhageni_Basecalled_Aug_16_2019_Direct-cDNA_NoPolyATail.sorted.bam(.bai)
2) Copenhageni_Basecalled_Aug_16_2019_Direct-cDNA_Qiagen_NoPolyATail.sorted.bam(.bai)
3) Copenhageni_Basecalled_Aug_16_2019_Direct-cDNA_PolyATail.sorted.bam(.bai)

[Screenshot, 2021-04-12: IGV view of the three Copenhageni tracks]
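To make a view like this reproducible, the reference and tracks could be loaded through an IGV batch script rather than by hand. A minimal sketch using the IGV batch commands new, genome, load, goto and snapshot; the helper name is hypothetical, and the locus is left optional because the region shown in the screenshot is not recorded:

```python
from typing import Optional, Sequence


def igv_batch_script(genome: str, tracks: Sequence[str],
                     locus: Optional[str] = None,
                     snapshot: Optional[str] = None) -> str:
    """Build IGV batch commands that load a reference FASTA and a set of BAM tracks."""
    lines = ["new", f"genome {genome}"]         # fresh session, then the (indexed) FASTA
    lines += [f"load {track}" for track in tracks]  # IGV picks up each .bai next to its BAM
    if locus:
        lines.append(f"goto {locus}")
    if snapshot:
        lines.append(f"snapshot {snapshot}")
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file (e.g. with the CDS FASTA and the three .sorted.bam files above) and opening it via Tools > Run Batch Script in IGV should restore the same set of tracks.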