Because the full data set was quite large (and I did not know about subsetting at the time), I used a much smaller subset of the data to check the workflow of the other steps. A small subset is included in a Bioconductor package called airway, which can be installed from R with these commands:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("airway")
### Then, on the command line:
tar xvzf airway_1.6.0.tar.gz
Data Set:
RStudio was installed by:
The files were already in BAM format, so they were converted back to SAM via:
The resulting SAM files were then converted back to FASTQ. All of this was done to find out why featureCounts worked on this data but failed with an error on ours.
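The exact conversion commands are not recorded in these notes. As a rough sketch only, assuming samtools is installed and on the PATH, the two conversions could look like the following. The function names `bam_to_sam` and `sam_to_fastq` and all file names are hypothetical, chosen for illustration.

```shell
# Hypothetical helpers; assumes samtools is available on the PATH.

bam_to_sam() {
    # $1 = input BAM, $2 = output SAM
    # -h keeps the header lines in the SAM output
    samtools view -h -o "$2" "$1"
}

sam_to_fastq() {
    # $1 = input SAM, $2 = output FASTQ (all reads written to one file)
    samtools fastq "$1" > "$2"
}
```

Usage would be along the lines of `bam_to_sam sample.bam sample.sam` followed by `sam_to_fastq sample.sam sample.fastq`. For paired-end reads, the input would likely need to be grouped by read name first (for example with `samtools collate`) and split into two files with the `-1` and `-2` options of `samtools fastq`.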
Quantification:
Simplify the file to keep only the count columns.
cat counts.txt | cut -f 1,7-12 > simple_counts.txt
less simple_counts.txt
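The column selection can be wrapped in a small helper. This is a minimal sketch, assuming counts.txt is a tab-separated featureCounts-style table whose first line may be a comment starting with `#`, with the gene ID in column 1 and the six sample counts in columns 7 to 12. The function name `simplify_counts` is hypothetical.

```shell
# Hypothetical helper; assumes a tab-separated count table with
# the gene ID in column 1 and sample counts in columns 7-12.

simplify_counts() {
    # $1 = count table; prints the simplified table to stdout
    # grep -v '^#' drops comment lines, which cut would otherwise
    # pass through unchanged because they contain no tab character
    grep -v '^#' "$1" | cut -f 1,7-12
}
```

It would be called as `simplify_counts counts.txt > simple_counts.txt`. Dropping the comment line is a choice made for this sketch; the original command leaves it in place.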
Analyze the counts with DESeq1.
DESeq1 output header description:
View only rows with pval < 0.05:
cat results_deseq1.tsv | awk ' $8 < 0.05 { print $0 }' > filtered_results_deseq1.tsv
cat filtered_results_deseq1.tsv | Rscript draw-heatmap.r > hisat_output.pdf
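The filter above depends on the tested value sitting in column 8. As a minimal sketch, assuming the results file is tab-separated with a header row, the column can instead be looked up by name so the filter does not depend on column order. `filter_by_column` is a hypothetical helper, not part of DESeq or of the scripts used here.

```shell
# Hypothetical helper; assumes tab-separated input with a header row on stdin.

filter_by_column() {
    # $1 = column name to test, $2 = upper threshold (exclusive)
    awk -F'\t' -v col="$1" -v max="$2" '
        NR == 1 {
            # find the index of the requested column in the header
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            print          # keep the header row
            next
        }
        # skip missing values, then compare numerically
        $idx != "NA" && $idx != "" && ($idx + 0) < (max + 0) { print }
    '
}
```

For example, `filter_by_column pval 0.05 < results_deseq1.tsv > filtered_results_deseq1.tsv`. This sketch always prints the header row and skips rows whose value is `NA`, so a downstream script that does not expect a header would need adjusting.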
The resulting files are as follows:
counts.txt
simple_counts.txt
counts.txt.summary
filtered_results_deseq1.tsv
norm-matrix-deseq1.txt
results_deseq1.tsv