Bioconductor package - GitHub issues

Data Set:

As the data was quite large (and I didn't know about subsetting at the time), there was an option to use a much smaller subset of the data to check the workflow of the other steps. A small subset is included in a Bioconductor package called airway, installed from R with these commands:

> if (!requireNamespace("BiocManager", quietly = TRUE))
+   install.packages("BiocManager")
> BiocManager::install("airway")

### Then, on the command line:
tar xvzf airway_1.6.0.tar.gz

RStudio was installed with:

sudo apt-get install libopenblas-base r-base
sudo apt-get install gdebi
wget https://download1.rstudio.org/rstudio-xenial-1.1.419-amd64.deb
sudo gdebi rstudio-xenial-1.1.419-amd64.deb

The files were already in BAM format, so they were converted back to SAM with:

  conda activate ngs1
  for file in ./*.bam; do
      echo "$file"
      samtools view -h "$file" > "${file/.bam/.sam}"
  done
  less SRR1039508_subset.sam
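The output names in these loops come from bash parameter expansion, so it may help to see what the expansions produce before any file is written. The following is a minimal sketch, assuming bash; the variable names are illustrative, and the file name is the one inspected above.

```shell
# File name as it appears in the loop (with the leading ./ from the glob).
file=./SRR1039508_subset.bam

# Pattern substitution (bash): replace the first ".bam" with ".sam".
sam_name=${file/.bam/.sam}

# Suffix removal: strip a trailing ".bam", then append the new extension.
sam_name_suffix=${file%.bam}.sam

# A pattern that does not occur leaves the name unchanged, so a redirect
# built this way would point back at the input file itself.
unchanged=${file/.sam/.FASTQ}

echo "$sam_name"
echo "$sam_name_suffix"
echo "$unchanged"
```

Echoing the computed name like this before running the real conversion is a cheap check that the output will not overwrite the input.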

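The data was then converted back to FASTQ, all this to find out why featureCounts worked on this data but gave an error on ours.

# The input names end in .bam, so that is the suffix to replace;
# replacing .sam would leave the name unchanged and overwrite the input.
for file in ./*.bam; do     echo "$file";     samtools bam2fq "$file" > "${file/.bam/.FASTQ}"; done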

# Split each interleaved FASTQ into read 1 and read 2 by the /1 and /2 name suffix.
for s in SRR508 SRR509 SRR512 SRR513 SRR516 SRR517 SRR520 SRR521; do
    grep '^@.*/1$' -A 3 --no-group-separator "$s.FASTQ" > "${s}_r1.fastq"
    grep '^@.*/2$' -A 3 --no-group-separator "$s.FASTQ" > "${s}_r2.fastq"
done
less SRR508_r2.fastq
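The split into read 1 and read 2 above can be tried on a tiny made-up file first. This is only a sketch: it assumes GNU grep (for `--no-group-separator`) and read names ending in `/1` and `/2`, and the read names and sequences are invented for illustration. One caveat is that a quality line can itself start with `@`, so a pattern-based split like this may occasionally pick up a wrong line on real data.

```shell
tmp=$(mktemp -d)

# Made-up interleaved FASTQ holding a single read pair.
printf '%s\n' \
  '@read1/1' 'ACGT' '+' 'IIII' \
  '@read1/2' 'TGCA' '+' 'IIII' > "$tmp/demo.FASTQ"

# Keep each header ending in /1 (or /2) plus the 3 lines that follow it.
grep '^@.*/1$' -A 3 --no-group-separator "$tmp/demo.FASTQ" > "$tmp/demo_r1.fastq"
grep '^@.*/2$' -A 3 --no-group-separator "$tmp/demo.FASTQ" > "$tmp/demo_r2.fastq"

# Each output should hold 4 lines per read.
wc -l < "$tmp/demo_r1.fastq"
wc -l < "$tmp/demo_r2.fastq"
```

Comparing the line counts of the two outputs is a quick way to see whether every read kept its mate.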

Quantification:

GTF=~/Diff_proj/Homo_sapiens.GRCh37.75_subset.gtf 
featureCounts -a $GTF -g gene_name -o counts.txt  Bam_org/UNT*.bam  Bam_org/TTT*.bam

Simplify the file to keep only the count columns.
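As a sketch of what this column selection does, here is the same `cut` call on a made-up table. It assumes the usual featureCounts layout of six annotation columns followed by one count column per BAM file, so columns 7-12 correspond to six samples; the gene name, sample labels and counts are invented.

```shell
tmp=$(mktemp -d)

# Made-up table: Geneid, Chr, Start, End, Strand, Length, then six samples.
printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\ts1\ts2\ts3\ts4\ts5\ts6\n' > "$tmp/counts.txt"
printf 'GENE_A\t1\t100\t900\t+\t801\t10\t12\t9\t40\t38\t45\n' >> "$tmp/counts.txt"

# Column 1 is the gene name; columns 7-12 are the six count columns.
cut -f 1,7-12 "$tmp/counts.txt" > "$tmp/simple_counts.txt"
cat "$tmp/simple_counts.txt"
```

With a different number of BAM files the range after `7-` would need to change accordingly.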

cat counts.txt | cut -f 1,7-12 > simple_counts.txt
less simple_counts.txt

Analyze the counts with DESeq1.


cat simple_counts.txt | Rscript deseq1.r 3x3 > results_deseq1.tsv
head results_deseq1.tsv
cat results_deseq1.tsv | awk ' $8 < 0.05 { print $0 }' > filtered_results_deseq1.tsv
cat filtered_results_deseq1.tsv | Rscript draw-heatmap.r > hisat_output.pdf

DESeq1 output header description

View only rows with pval < 0.05:

cat results_deseq1.tsv | awk ' $8 < 0.05 { print $0 }' > filtered_results_deseq1.tsv
cat filtered_results_deseq1.tsv | Rscript draw-heatmap.r > hisat_output.pdf
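The p-value filter can be checked on a few made-up rows before trusting it on the real table. This sketch assumes, as the command above does, that the p-value sits in column 8; the gene names and the other column values are placeholders.

```shell
tmp=$(mktemp -d)

# Made-up 8-column rows; only the value in column 8 matters here.
printf 'GENE_A\t1\t2\t3\t4\t5\t6\t0.001\n' >  "$tmp/results.tsv"
printf 'GENE_B\t1\t2\t3\t4\t5\t6\t0.2\n'   >> "$tmp/results.tsv"
printf 'GENE_C\t1\t2\t3\t4\t5\t6\t0.049\n' >> "$tmp/results.tsv"

# Same filter as in the workflow: keep rows whose 8th field is below 0.05.
awk '$8 < 0.05 { print $0 }' "$tmp/results.tsv" > "$tmp/filtered.tsv"
cat "$tmp/filtered.tsv"
```

If the real results file has a different column order, the field number in the awk condition is the only part that needs adjusting.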

The resulting files are as follows: counts.txt, simple_counts.txt, counts.txt.summary, filtered_results_deseq1.tsv, norm-matrix-deseq1.txt, results_deseq1.tsv

hagarelsayed / Ngs_2nd_Abstract

Bioconductor package #3
