Bringing it together - Githubissues

sr320 commented 5 years ago

How could you use alignment data (and samtools) as well as bedtools in your own project?

Provide a least one example of what said code might look like.

yaaminiv commented 5 years ago

My entire project is based around alignment data and bedtools. I am using BAM files (MBD-BSseq data I aligned to the C. virginica genome) to identify DML and DMR. The DML and DMR are then written out to a BEDfile which I can intersect with genomic feature files (ex. exons, introns, mRNA coding regions, transposable elements) to characterize the location of DML and DMR:

! {bedtoolsDirectory}intersectBed \ #Path to betools. {bedtoolsDirectory} was set as a variable path earlier in the script
-wb \ #Write the original entry for b in for each overlap
-a {DMRlist} \ #Variable path set earlier in the script
-b {exonList} \ #Variable path set earlier in the script
> 2018-11-07-DMR-Exon.txt #Redirect output to a new file

hgloiselle commented 5 years ago

I would use SAMtools sort and mpileup with alignment data in order to call SNPs. Using this SNP information, I could then use bedtools intersect in order to see if there was any overlap between the SNPs and genes. Very, very rough example of what code down the line may look like: bedtools intersect -a {sample1snps} -b {gene coordinates}

Jeremyfishb commented 5 years ago

Let's say I have a couple SAM files that I want to look for overlaps in. I could first convert them to BAM files, which bedtools prefers, using samtools like this:

samtools view -S -b somefile.sam > somefile.bam

where * could indicate 1 and 2

then use bedtools like this:

bedtools intersect -a somefile1.bam -b somefile2.bam

kcribari commented 5 years ago

I am not using alignment to a reference genome or BedTools for my project, however, if I had BAM alignment files, I would use the following code to view them.

samtools sort somefile_unsorted.bam somefile_sorted

this will sort my alignments by their position on the genome bedtools bamtobed -i <somefile_sorted.bam>
this will convert my bam file to a bed file bedtools groupby -i somefile_sorted.bed g-1-4 -c 9 -o count
this will count the number of lines for each group in the sorted bed file

wsano16 commented 5 years ago

When working with microbiome data, it is important to remove host sequences as contamination before data analysis. While this is not an issue with 16s data (because only bacteria and archaea have a 16s ribosomal subunit), it could be an issue when working with other marker genes or WGS data. To remove contaminant sequences, I could align my sequences to the genome of the host. All those sequences that fail to align would necessarily be non-host sequences.

To accomplish this, I would first establish an index file using a host reference genome using bowtie bowtie2-build host_genome.fna host_index Then I would map my reads to index file bowtie2 -x host_index -1 R1.fastq -2 R2.fastq -S results.sam

Convert to bam in samtools samtools view -bS results.sam > results.bam Extract unmapped sequences samtools view -b -f 12 results.bam > results.bam (the -f 12 flag removes unmapped sequences) Sort sequences samtools sort -n results.bam results_sorted.bam Export as .fastq with bedtools bedtools bamtofastq -i results_sorted.bam -fq results_sorted_r1.fastq -fq2 results_sorted_r2.fastq

calderatta commented 5 years ago

I have not yet aligned my sequences, but I think examining them in SAM format would be helpful. If they were in BAM, I could convert them using: samtools view -h [filename].bam > [filename].sam Alternatively I could do something with the range files by first converting from BAM: bedtools bamtobed -i [filename].bam Such as see where the ranges overlap between two files: bedtools intersect -a [filename-a].bed -b [filename-b].bed

jgardn92 commented 5 years ago

I won't be aligning sequences to a genome but if I had SAM alignment files I could look at genome coverage as follows: First convert SAM file to BAM samtools view alignment.sam > alignment.bam Use bedtools to summarize coverage (bedtools genomecov can work with BAM files directly) bedtools genomecov -i alignment.bam -g genomesummary.txt

kimh11 commented 5 years ago

I would use samtools and BedTools to find variants from the BAM files and extract the 80 bp sequence around the SNP as fasta file.

Use samtools to index my reference genome: samtools faidx reference.fa Sort the BAM files: samtools sort indv1.bam indv1-sorted.bam Find variants: samtools mpileup -v -f my.fasta indv1-sorted.bam > indv1.vcf.gz bcftools call -v -m indv1.vcf.gz > indv1-calls.vcf.gz

Use bedtools to get flanking sequences: bedtools flank -i indv1-calls.vcf.gz -g reference.fa -b 40 > snp-flank.bed Extract fasta sequences of the SNP flanking regions: bedtools getfasta -fi reference.fa -bed snp-flank.bed

melodysyue commented 5 years ago

I won't align my sequences against the reference genome, so I won't be using samtools and bedtools. But if I need to find SNP variants in the future, I will:

Index reference genome to allow for faster random access of the sequence at specific locations. samtools faidx reference.fa
Sort ( by alignment position) and index a BAM file to allow for fast random access to reads aligned within a certain region
```
samtools sort unsorted.bam sorted
samtools index sorted.bam
```

Variant calling using base alignment quality algorithm

samtools mpileup -v -fasta-ref reference.fasta.fai sorted.bam.fai > vcf.gz
bcftools call -v -m vcf.gz > calls.vcf.gz

laurahspencer commented 5 years ago

I used bowtie to align my TagSeq reads to the the denovo-assembled transcriptome, producing a SAM file for each TagSeq file:

bowtie2 --local -x transcriptome.fasta -U sample1.fastq.trim -S sample1.fastq.trim.sam --no-hd --no-sq --no-unal -k 5

I can then use samtools view -b and samtools sort to convert the SAM files to BAM files, and sort by alignment position. I may also want to extract just the reads that had high quality alignment, which I can do with bitwise flags, but I'll need to assess further. Then I believe I can derive counts using samtools mpileup but I haven't gotten there yet ...

grace-ac commented 5 years ago

There is not a C. bairdi genome, but if there was, I could look at specific regions:

convert SAM to BAM samtools view -b name.sam > name.bam
Sort alignments by position samtools sort name.bam name
Index alignments samtools index name.bam
Look at specific alignments samtools view name.bam [chromosome region of interest] | head -n 3

sr320 / course-fish546-2018

Bringing it together #62