Open ndreey opened 1 year ago
With CONCOCT I was able to bin using contig length cutoffs of 1000 and 700, which produced different numbers of bins. To get a perfect bin removal, I removed all host reads using the gsb. Below is how many "Chr" reads remain after filtering, starting with 1000c, then 700c, and lastly the perfect bin removal. Maybe we can see a trend of improvement.
[gfr152@mjolnirhead01fl bin_filter]$ cat bin_refs/1000c_bin/reads/06_filt_R1.fastq | grep "Chr" | wc -l
2897406
[gfr152@mjolnirhead01fl bin_filter]$ cat bin_refs/700c_bin/06_filt_R1.fastq | grep "Chr" | wc -l
1927245
[gfr152@mjolnirhead01fl bin_filter]$ cat bin_refs/max_filt/reads/06_filt_R1.fastq | grep "Chr" | wc -l
1149
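Because grep "Chr" scans every line of the FASTQ file, a slightly stricter count looks only at the header line of each record. A minimal sketch, assuming unwrapped four-line FASTQ records and that the reads of interest carry "Chr" in their read names; the function name count_header_matches is made up for illustration:

```shell
#!/usr/bin/env bash
# Count FASTQ records whose header line (line 1 of every 4) matches a pattern.
# Assumes unwrapped, four-line FASTQ records.
count_header_matches() {
    local pattern="$1" fastq="$2"
    awk -v pat="$pattern" 'NR % 4 == 1 && $0 ~ pat { n++ } END { print n + 0 }' "$fastq"
}

# Example (path taken from the commands above):
# count_header_matches "Chr" bin_refs/max_filt/reads/06_filt_R1.fastq
```

Restricting the match to header lines avoids counting a sequence or quality line that happens to contain the same characters.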
Bin removal and evaluation
Assembly
The bins to be removed are concatenated and then used as a reference genome.
cat fasta_bins/*.fa > all_bins.fa
bowtie2-build --threads 6 all_bins.fa bin_index
samtools view -@ 6 -b -S ${hc_prefix}_map.sam > ${hc_prefix}_map.bam
samtools view -b -f 4 ${hc_prefix}_map.bam > ${hc_prefix}_filtered_map.bam
bedtools is then used to separate the R1 and R2 reads into separate FASTQ files.
bedtools bamtofastq -i ${hc_prefix}_filtered_map.bam -fq output_R1.fastq -fq2 output_R2.fastq
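The steps above can be chained into one function. This is only a sketch under assumptions, not the exact commands that were run: the bowtie2 mapping call is not shown in this post, so the one below is a guess at a typical paired-end invocation, and the function name deplete_bins, the run helper and the output file names are hypothetical. It also uses -f 12 (both mates unmapped) rather than -f 4, and name-sorts before bamtofastq so that mates sit next to each other.

```shell
#!/usr/bin/env bash
# Sketch of the bin depletion steps. Set DRY_RUN=1 to print the commands
# instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# deplete_bins <index_prefix> <R1.fastq> <R2.fastq> <out_prefix> [threads]
deplete_bins() {
    local index="$1" r1="$2" r2="$3" out="$4" threads="${5:-6}"
    # Map the read pairs against the concatenated bins (assumed invocation).
    run bowtie2 -p "$threads" -x "$index" -1 "$r1" -2 "$r2" -S "${out}_map.sam"
    # Keep pairs where both mates are unmapped (flag 12), written as BAM (-b).
    run samtools view -@ "$threads" -b -f 12 -F 256 -o "${out}_unmapped.bam" "${out}_map.sam"
    # Name-sort so that mates are adjacent, as bamtofastq expects with -fq2.
    run samtools sort -n -@ "$threads" -o "${out}_unmapped.nsort.bam" "${out}_unmapped.bam"
    # Write the remaining pairs back to paired FASTQ files.
    run bedtools bamtofastq -i "${out}_unmapped.nsort.bam" -fq "${out}_R1.fastq" -fq2 "${out}_R2.fastq"
}

# Example: preview the commands without running anything.
# DRY_RUN=1 deplete_bins bin_index reads_R1.fastq reads_R2.fastq sample 6
```

Selecting with -f 4 keeps every read that is itself unmapped, which can leave reads whose mate mapped to a bin; -f 12 drops those pairs entirely. Which of the two is preferable depends on how strict the depletion should be.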
I can then assemble and re-run the MABQ using the host-depleted reads. Originally, 095_trim_R1.fq.gz had 4521544 reads. After host bin filtering, we now have 2995189 reads. We can clearly see that the reads from two different microbes are retained (although some of their reads have been removed).
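Read counts like these can be reproduced and compared with a couple of small helpers. A minimal sketch, assuming four-line FASTQ records; the function names count_reads and reads_removed are hypothetical:

```shell
#!/usr/bin/env bash
# Count reads in a FASTQ file (plain or gzip-compressed), assuming 4 lines per record.
count_reads() {
    local fastq="$1" lines
    if [[ "$fastq" == *.gz ]]; then
        lines=$(gzip -dc "$fastq" | wc -l)
    else
        lines=$(wc -l < "$fastq")
    fi
    echo $(( lines / 4 ))
}

# Print how many reads were removed and what share of the original that is.
reads_removed() {
    local before="$1" after="$2"
    awk -v b="$before" -v a="$after" \
        'BEGIN { printf "%d removed (%.2f%%)\n", b - a, 100 * (b - a) / b }'
}

# Example with the file names used in this post:
# reads_removed "$(count_reads 095_trim_R1.fq.gz)" "$(count_reads output_R1.fastq)"
```

Dividing the line count by four is only valid when sequence and quality strings are not wrapped across lines, which is the usual case for Illumina-style FASTQ.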
Binning
I cannot use the new assembly, as the contigs won't be able to map to the gsb. Therefore, I created a Python script to remove the bins from the gsa using SeqIO from the Biopython package. With the filtered gsa, bin_filtered_gsa.fasta.gz, I can run concoct_run.sh using the filtered reads output_R1.fastq and output_R2.fastq together with the bin_filtered_gsa.fasta.gz. This will show if the removal of bins improved the binning. However, because only 7259 reads were removed... I think that multiple bin filtration steps will be needed to see a significant change.
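The Python/Biopython script itself is not shown here. As a stand-in, the same kind of filtering (dropping the contigs that belong to the removed bins from the gsa) can be sketched with awk. This is not the script that was used; the function name filter_fasta and the file names bin_contigs.txt and gsa.fasta.gz are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Remove FASTA records whose ID (first word after ">") is listed in an ID file.
# filter_fasta <ids_to_remove.txt> <input.fasta> writes the kept records to stdout.
filter_fasta() {
    local ids="$1" fasta="$2"
    awk '
        FILENAME == ARGV[1] { drop[$1] = 1; next }   # first file: IDs to remove
        /^>/ { id = substr($1, 2); keep = !(id in drop) }
        keep                                         # print lines of kept records
    ' "$ids" "$fasta"
}

# Example (file names are hypothetical):
# grep '^>' all_bins.fa | sed 's/^>//; s/ .*//' > bin_contigs.txt
# filter_fasta bin_contigs.txt <(gzip -dc gsa.fasta.gz) | gzip > bin_filtered_gsa.fasta.gz
```

Matching on the first word of the header means the contig names in the bins must be identical to those in the gsa, which holds when the bins were cut from that same assembly.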