Saraelattar closed this issue 4 years ago.
We use bioinformatic tools to investigate the outbreak by assembling the genome of the deadly E. coli X strain. Specifically, we will provide you with Illumina reads from the TY2482 sample, which were generated at the Beijing Genomics Institute and deposited in the Short Read Archive (SRA) for public access.
[Data Retrieval](url)
NCBI's fastq-dump from the SRA Toolkit was used to download the short reads from the NCBI Short Read Archive (SRA).
Using SRA-toolkit
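The following is a minimal sketch of the download step. It assumes that the SRA Toolkit's `fastq-dump` is on your PATH and that the run accession is SRR292678, which is the accession in the read file names used later. The helper name `download_reads` and its default output directory are hypothetical. The flags themselves are standard `fastq-dump` options. We assume that `--read-filter pass` combined with `--split-3` is what produces file names like `SRR292678_pass_1.fastq.gz`.

```shell
# Hypothetical helper (not part of the SRA Toolkit): download the paired-end
# reads of one SRA run as gzipped FASTQ files.
# Usage: download_reads <accession> [outdir]
download_reads() {
    local acc="$1"
    local outdir="${2:-$HOME/workdir/sample_data}"   # assumed default location
    mkdir -p "$outdir"
    # --split-3           mates 1 and 2 go to _1/_2 files; unpaired reads to a third file
    # --read-filter pass  keep only reads flagged as passing filter
    # --skip-technical    dump only biological reads (no barcodes/adapters reads)
    # --clip              apply the left/right clip points stored with the run
    # --gzip              compress the FASTQ output
    fastq-dump --outdir "$outdir" --gzip --skip-technical \
        --read-filter pass --split-3 --clip "$acc"
}
```

For example, `download_reads SRR292678` should leave the paired files in `~/workdir/sample_data`.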
Prepare the reference data
Indexing
mkdir -p ~/workdir/hisat_align/hisatIndex
cd ~/workdir/hisat_align/hisatIndex
ln -s ~/workdir/sample_data/GCF_000221885.1_E.coli_0104_H4_Illumina_1.0/GCF_000221885.1_E.coli_0104_H4_Illumina_1.0_genomic.fa
hisat2_extract_splice_sites.py ~/workdir/sample_data/GCF_000221885.1_E.coli_0104_H4_Illumina_1.0/GCF_000221885.1_E.coli_0104_H4_Illumina_1.0_genomic.gtf
hisat2_extract_splice_sites.py ~/workdir/sample_data/GCF_000221885.1_E.coli_0104_H4_Illumina_1.0/GCF_000221885.1_E.coli_0104_H4_Illumina_1.0_genomic.gtf > splicesites.tsv
ln -s ~/workdir/sample_data/GCF_000221885.1_E.coli_0104_H4_Illumina_1.0_genomic.fna .
less GCF_000221885.1_E.coli_0104_H4_Illumina_1.0_genomic.fna
sudo apt-get install hisat2
mkdir -p ~/workdir/hisat_align/hisatIndex && cd ~/workdir/hisat_align/hisatIndex
hisat2_extract_splice_sites.py ~/workdir/sample_data/GCF_000221885.1_E.coli_0104_H4_Illumina_1.0/GCF_000221885.1_E.coli_0104_H4_Illumina_1.0_genomic.gtf > splicesites.tsv
hisat2 --help
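These notes stop before the HISAT2 index is actually built. Below is a minimal sketch of that step. It assumes `hisat2-build` is installed and that you pass it the reference FASTA linked above.

HISAT2 is designed as a splice-aware aligner for RNA-seq. Bacterial genomic DNA has no introns, so this sketch builds the index without splice sites. As we read the HISAT2 manual, `--ss` is meant to be used together with `--exon` in any case.

The helper name `build_hisat2_index` and its single-thread default are hypothetical.

```shell
# Hypothetical helper: build a HISAT2 index from a reference FASTA.
# Usage: build_hisat2_index <reference.fna> <index_basename> [threads]
build_hisat2_index() {
    local ref="$1" base="$2" threads="${3:-1}"
    # Fail early if the reference is missing or empty (e.g. a broken symlink).
    if [ ! -s "$ref" ]; then
        echo "reference not found or empty: $ref" >&2
        return 1
    fi
    # No --ss/--exon files: genomic DNA reads from a bacterium are not spliced.
    hisat2-build -p "$threads" "$ref" "$base"
}
```

For example, `build_hisat2_index GCF_000221885.1_E.coli_0104_H4_Illumina_1.0_genomic.fna ecoli_index 4` would write the `ecoli_index.*.ht2` files into the current directory.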
(Troubleshooting)
sudo apt-get install bwa
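When a command is reported as "not found", first check which tools are actually on the PATH before reinstalling anything. On Debian/Ubuntu, the package manager commands are `apt-get` and `apt`.

Below is a small sketch of such a check. The function name `check_tools` is hypothetical. Listing `samtools` in the usage line is our own assumption for the BAM sorting and indexing steps, since the notes do not name a tool for them.

```shell
# Hypothetical helper: report which required command-line tools are missing.
# Prints "missing: <tool>" to stderr for each one; returns 0 only if all exist.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `check_tools fastq-dump hisat2 hisat2-build bwa samtools || sudo apt-get install sra-toolkit hisat2 bwa samtools`. These are the Ubuntu package names as we know them; other distributions may differ.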
mkdir -p ~/workdir/bwa_align/bwaIndex
cd ~/workdir/bwa_align/bwaIndex
ln -s ~/workdir/sample_data/GCF_000221885.1_E.coli_0104_H4_Illumina_1.0_genomic.fna .
bwa index -a bwtsw GCF_000221885.1_E.coli_0104_H4_Illumina_1.0_genomic.fna
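Before indexing, it helps to confirm that the symlink points at the reference you expect, by checking the sequence count and total length rather than paging through the file. Below is a sketch with the hypothetical helpers `fasta_summary` and `build_bwa_index`.

The sketch omits `-a`. In the BWA 0.7.x releases we are familiar with, `bwa index` then picks the construction algorithm automatically. The `bwtsw` algorithm is aimed at large genomes such as human, so the default should suit a bacterial genome of a few megabases.

```shell
# Hypothetical helper: print the number of sequences and total bases in a FASTA.
fasta_summary() {
    awk '/^>/ { n++; next }
         { gsub(/[ \t\r]/, ""); bases += length($0) }
         END { printf "sequences=%d bases=%d\n", n, bases }' "$1"
}

# Hypothetical helper: sanity-check a reference FASTA, then build a BWA index
# next to it (bwa writes .amb, .ann, .bwt, .pac and .sa files).
build_bwa_index() {
    local ref="$1"
    if [ ! -s "$ref" ]; then
        echo "reference not found or empty: $ref" >&2
        return 1
    fi
    fasta_summary "$ref"
    bwa index "$ref"
}
```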
ln -s ~/workdir/sample_data/GCF_000221885.1_E.coli_0104_H4_Illumina_1.0_genomic.fa .
bwa index -a bwtsw GCF_000221885.1_E.coli_0104_H4_Illumina_1.0_genomic.fa
cd ~/workdir/bwa_align
R1="$HOME/workdir/sample_data/SRR292678_pass_1.fastq.gz"
R2="$HOME/workdir/sample_data/SRR292678_pass_2.fastq.gz"
zcat "$R1" | head -n 8
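Once R1 and R2 point at the paired files, the reads can be aligned to the indexed reference with BWA-MEM. BWA-MEM takes the reference path followed by the two mate files. The files are gzipped, so they are read with `gzip -dc`; plain `cat` would print compressed bytes.

This is a sketch under stated assumptions:
- The helpers `count_reads` and `align_pair`, the default of 4 threads, and the SAM output name are hypothetical.
- The read-count comparison is only a cheap check that both mate files belong to the same run.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ (4 lines per record).
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Hypothetical helper: align one read pair with BWA-MEM and write SAM.
# Usage: align_pair <indexed_ref.fa> <R1.fastq.gz> <R2.fastq.gz> <out.sam> [threads]
align_pair() {
    local ref="$1" r1="$2" r2="$3" out="$4" threads="${5:-4}"
    # Paired files must contain the same reads in the same order; a count
    # mismatch is an early sign of a truncated or mismatched download.
    if [ "$(count_reads "$r1")" -ne "$(count_reads "$r2")" ]; then
        echo "R1 and R2 have different read counts" >&2
        return 1
    fi
    bwa mem -t "$threads" "$ref" "$r1" "$r2" > "$out"
}
```

For example, `align_pair "$REF" "$R1" "$R2" SRR292678.sam 4`, where `REF` is a hypothetical variable holding the path of the reference FASTA you indexed.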
# Sorting the BAM file
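The alignments must be sorted by coordinate before they can be indexed. The notes do not say which tool was planned for this step; a common choice is `samtools sort`, which we assume here. `sort_alignments` is a hypothetical helper, and it accepts either the SAM or a BAM produced by BWA-MEM.

```shell
# Hypothetical helper (assumes samtools is installed): coordinate-sort
# alignments into a BAM file.
# Usage: sort_alignments <input.sam|input.bam> <output.sorted.bam> [threads]
sort_alignments() {
    local input="$1" output="$2" threads="${3:-4}"
    # -@ sets sorting/compression threads; with -o ending in .bam,
    # samtools infers BAM as the output format.
    samtools sort -@ "$threads" -o "$output" "$input"
}
```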
# Indexing the BAM file
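Indexing a coordinate-sorted BAM lets viewers and downstream tools jump straight to a genomic region. This again assumes samtools.
- `samtools index` writes a `.bai` file next to the BAM.
- `samtools flagstat` gives a quick summary of how many reads mapped and how many were properly paired.

`index_and_summarize` is a hypothetical helper.

```shell
# Hypothetical helper (assumes samtools is installed): index a sorted BAM
# (creates <bam>.bai) and print basic mapping statistics.
index_and_summarize() {
    local bam="$1"
    samtools index "$bam" || return 1
    samtools flagstat "$bam"
}
```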
At this point I stopped working on this project because I joined another group.