hagarelsayed / Ngs_2nd_Abstract

Second Abstract approved by Dr Tamer for RNA_Seq Differential Expression Analysis

troubleshooting the latest error for long chromosome names #5

Open hagarelsayed opened 4 years ago

hagarelsayed commented 4 years ago

Data Subsetting to an even smaller size

Preparing Environment

conda activate ngs1
conda install seqtk

Subset a small number of reads (for testing only):

for file in ./*.fastq.gz; do
    echo "$file"
    seqtk sample -s100 "$file" 500 > "${file/.fastq.gz/.fastq}"
done
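To confirm that each subset really holds 500 reads, and that both mates of a pair were sampled to the same size, the reads in a FASTQ file can be counted as lines divided by four. A minimal sketch, assuming uncompressed four-line FASTQ records; `fastq_reads` is a hypothetical helper name, not part of seqtk:

```shell
# Hypothetical helper: number of reads in an uncompressed FASTQ file.
# Assumes each record spans exactly four lines (no wrapped sequences).
fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Example use on the subset files (left commented so nothing runs by accident):
# for f in ./*.fastq; do echo "$f: $(fastq_reads "$f") reads"; done
```

Because every file is sampled with the same seed (-s100), seqtk picks the same read positions in the pass_1 and pass_2 files, which keeps the pairs in sync.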

Alignment

Choose ERCC to work on

INDEX=chr22_with_ERCC92
RUNLOG=runlog.txt
READS_DIR=~/workdir/sample_data/renamed/
mkdir -p bam

for SAMPLE in UNT;
do
    for REPLICATE in 12 16 20;
    do
        R1=$READS_DIR/${SAMPLE}_Rep${REPLICATE}*pass_1.fastq
        R2=$READS_DIR/${SAMPLE}_Rep${REPLICATE}*pass_2.fastq
        BAM=bam/${SAMPLE}_${REPLICATE}.bam

        hisat2 $INDEX -1 $R1 -2 $R2 | samtools sort > $BAM
        samtools index $BAM
    done
done

Results of the first set:

500 reads; of these:
  500 (100.00%) were paired; of these:
    478 (95.60%) aligned concordantly 0 times
    22 (4.40%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times

478 pairs aligned concordantly 0 times; of these:
  0 (0.00%) aligned discordantly 1 time
----
478 pairs aligned 0 times concordantly or discordantly; of these:
  956 mates make up the pairs; of these:
    951 (99.48%) aligned 0 times
    5 (0.52%) aligned exactly 1 time
    0 (0.00%) aligned >1 times

4.90% overall alignment rate

500 reads; of these:
  500 (100.00%) were paired; of these:
    481 (96.20%) aligned concordantly 0 times
    19 (3.80%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times

481 pairs aligned concordantly 0 times; of these:
  0 (0.00%) aligned discordantly 1 time
----
481 pairs aligned 0 times concordantly or discordantly; of these:
  962 mates make up the pairs; of these:
    957 (99.48%) aligned 0 times
    5 (0.52%) aligned exactly 1 time
    0 (0.00%) aligned >1 times

4.30% overall alignment rate

500 reads; of these:
  500 (100.00%) were paired; of these:
    487 (97.40%) aligned concordantly 0 times
    13 (2.60%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times

487 pairs aligned concordantly 0 times; of these:
  0 (0.00%) aligned discordantly 1 time
----
487 pairs aligned 0 times concordantly or discordantly; of these:
  974 mates make up the pairs; of these:
    971 (99.69%) aligned 0 times
    3 (0.31%) aligned exactly 1 time
    0 (0.00%) aligned >1 times

2.90% overall alignment rate
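The overall alignment rate in each summary can be recomputed from the counts printed above it: every concordantly or discordantly aligned pair contributes two mates, the individually aligned mates are added on top, and the sum is divided by the total number of mates. A small sketch of that arithmetic; `overall_rate` is a hypothetical helper, not a HISAT2 command, and the formula is inferred from the numbers in these logs:

```shell
# Hypothetical helper: recompute the "overall alignment rate" of a paired-end summary.
#   $1 = total pairs
#   $2 = pairs aligned concordantly (exactly 1 time + >1 times)
#   $3 = pairs aligned discordantly
#   $4 = mates aligned individually (exactly 1 time + >1 times)
overall_rate() {
    awk -v n="$1" -v c="$2" -v d="$3" -v m="$4" \
        'BEGIN { printf "%.2f%%\n", 100 * (2 * (c + d) + m) / (2 * n) }'
}

# First UNT replicate: 22 concordant pairs, 0 discordant, 5 single mates.
overall_rate 500 22 0 5   # prints 4.90%
```

With only 500 sampled pairs per replicate, each percentage here rests on a few dozen aligned mates, so small differences between replicates are not meaningful.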


Code for the second set:

for SAMPLE in TTT;
do
    for REPLICATE in 13 17 21;
    do
        R1=$READS_DIR/${SAMPLE}_Rep${REPLICATE}*pass_1.fastq
        R2=$READS_DIR/${SAMPLE}_Rep${REPLICATE}*pass_2.fastq
        BAM=bam/${SAMPLE}_${REPLICATE}.bam

        hisat2 $INDEX -1 $R1 -2 $R2 | samtools sort > $BAM
        samtools index $BAM
    done
done

Results of the second set:

500 reads; of these:
  500 (100.00%) were paired; of these:
    487 (97.40%) aligned concordantly 0 times
    12 (2.40%) aligned concordantly exactly 1 time
    1 (0.20%) aligned concordantly >1 times

487 pairs aligned concordantly 0 times; of these:
  0 (0.00%) aligned discordantly 1 time
----
487 pairs aligned 0 times concordantly or discordantly; of these:
  974 mates make up the pairs; of these:
    971 (99.69%) aligned 0 times
    1 (0.10%) aligned exactly 1 time
    2 (0.21%) aligned >1 times

2.90% overall alignment rate

500 reads; of these:
  500 (100.00%) were paired; of these:
    486 (97.20%) aligned concordantly 0 times
    14 (2.80%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times

486 pairs aligned concordantly 0 times; of these:
  0 (0.00%) aligned discordantly 1 time
----
486 pairs aligned 0 times concordantly or discordantly; of these:
  972 mates make up the pairs; of these:
    967 (99.49%) aligned 0 times
    5 (0.51%) aligned exactly 1 time
    0 (0.00%) aligned >1 times

3.30% overall alignment rate

500 reads; of these:
  500 (100.00%) were paired; of these:
    485 (97.00%) aligned concordantly 0 times
    14 (2.80%) aligned concordantly exactly 1 time
    1 (0.20%) aligned concordantly >1 times

485 pairs aligned concordantly 0 times; of these:
  0 (0.00%) aligned discordantly 1 time
----
485 pairs aligned 0 times concordantly or discordantly; of these:
  970 mates make up the pairs; of these:
    968 (99.79%) aligned 0 times
    2 (0.21%) aligned exactly 1 time
    0 (0.00%) aligned >1 times

3.20% overall alignment rate

Quantification

GTF=~/workdir/diff_exp/ref/ERCC92.gtf 
featureCounts -a $GTF -g gene_name -o counts.txt  bam/UNT*.bam  bam/TTT*.bam

The following error came up:

Failed to open the annotation file /home/ngs/workdir/diff_exp/ref/ERCC92.gtf, or its format is incorrect, or it contains no 'exon' features
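The message names three possible causes: the file cannot be opened, its format is wrong, or it has no 'exon' rows. Each can be checked directly before rerunning featureCounts. A minimal sketch, assuming a standard nine-column, tab-separated GTF; `check_gtf` is a hypothetical helper. It also reports how many exon rows carry a gene_name attribute, since the command uses -g gene_name and the thread does not show whether ERCC92.gtf has that key:

```shell
# Hypothetical helper: summarise why featureCounts might reject a GTF file.
check_gtf() {
    gtf="$1"
    # Cause 1: the path does not exist or is not readable.
    if [ ! -r "$gtf" ]; then
        echo "missing or unreadable: $gtf"
        return 1
    fi
    # Causes 2 and 3: column 3 must contain 'exon' rows; column 9 holds the attributes.
    awk -F'\t' '
        /^#/ { next }
        $3 == "exon" { exons++; if ($9 ~ /gene_name "/) named++ }
        END { printf "exon rows: %d, exon rows with gene_name: %d\n", exons, named }
    ' "$gtf"
}

# Example use (left commented so nothing runs by accident):
# check_gtf ~/workdir/diff_exp/ref/ERCC92.gtf
```

If the file is readable but reports zero exon rows, the feature type or the tab separation of the file is the first thing to inspect.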


The annotation file was changed to:

GTF=~/workdir/sample_data/gencode.v29.annotation.gtf

featureCounts then ran without errors and wrote its output to counts.txt.

cat counts.txt | cut -f 1,7-12 > simple_counts.txt 
less simple_counts.txt

Results of Quantification:

The results could not be uploaded to git but can be found at this link: Results of Quantification

hagarelsayed commented 4 years ago

Differential Expression

cat counts.txt | cut -f 1,7-12 > simple_counts.txt

This produced simple_counts.txt.

All the values in the table are zero, possibly because of the low alignment rate against the small reference.
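Whether the table is entirely zero can be checked by counting the rows that have at least one non-zero count. A minimal sketch, assuming the layout produced by `cut -f 1,7-12` on featureCounts output (a '#' comment line, a 'Geneid' header line, then tab-separated counts); `nonzero_rows` is a hypothetical helper:

```shell
# Hypothetical helper: count feature rows with at least one non-zero count.
# Column 1 is the gene identifier; columns 2 onward are per-sample counts.
nonzero_rows() {
    awk -F'\t' '
        /^#/ || $1 == "Geneid" { next }
        { total++; for (i = 2; i <= NF; i++) if ($i > 0) { nz++; break } }
        END { printf "%d of %d rows have non-zero counts\n", nz, total }
    ' "$1"
}

# Example use (left commented so nothing runs by accident):
# nonzero_rows simple_counts.txt
```

If only a handful of rows are non-zero, the dispersion fit in the next step has very little data to work with.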

While trying to run the differential expression analysis with DESeq:

cat simple_counts.txt | Rscript deseq1.r 3x3 > results_deseq1.tsv

the following error came up:

Error in parametricDispersionFit(means, disps) : Parametric dispersion fit failed. Try a local fit and/or a pooled estimation.


The error may be because the values in the count matrix are zero, which was a result of aligning to only a small portion of the reference genome. The next step is to go back to the alignment step and index another genome.