sr320 / course-fish546-2021

1 stars 1 forks source link

Bringing it together #59

Closed sr320 closed 3 years ago

sr320 commented 3 years ago

How could you use alignment data (and samtools) in your own project?

Provide a least one example of what said code might look like.

skreling commented 3 years ago

For 1, I could use samtools to check the quality of the alignment and to visualize the aligned sample next to the reference genome with samtools view.

To check the quality readings of the alignment you can simply use: samtools view samplefile.sam | tr '\t' '\n' | head -n 11 This will give use a simple overview of the entries in the SAM file and allow us to see it within a single view without it stretching off the width of the page. We can then look at the CIGAR string and MAPQ mapping quality sections to understand a bit more of about our alignments.

Secondly, once you get an indexed bam file, you can view it next to the reference genome using: samtools tview sample.bam reference.fasta You can also view a specific region by: samtools view -p 1:regionstart-regionend sample.bam \ reference.fasta

aspencoyle commented 3 years ago

One possible option is once I've filtered my sequences to only include those that originate from Hematodinium, I could use samtools mpileup to look for SNPs. Since to my knowledge we only have a transcriptome for Hematodinium, I might have to limit the examination to synonymous SNPs - without a genome, it seems impossible to differentiate between non-synonymous SNPS and different genes.

This could provide some indication of diversity between Hematodinium strains within infected crabs. It would likely be quite low - all samples come from a small area in SE Alaska, and previous studies found little population structure for Hematodinium on a global scale. But still could be worth investigating! I'm not sure if

The command would look quite similar to the one from the other quiz - I just likely wouldn't specify region

samtools mpileup \
--no-BAQ \
--fasta-ref filename.fasta \
filename.bam
jdduprey commented 3 years ago

For metabarcoding my lab has compared sample ASVs against a database of known cytochrome oxidase I fragments. To supplement this assignment of taxa they have also used a custom database along with bowtie2. I think this pipeline would output alignment data that could be explored with Samtools. For example:

samtools flags UNMAP This would indicate the bit flag for unmapped reads

samtools view -F 4 alignment-file.bam | head -n 3 This would select the unmapped reads from the alignment data file. In this case unmapped reads would be eDNA fragments with no assigned taxa.

laurel-nave-powers commented 3 years ago

I can use alignment data and samtools in my project to find SNPs using samtools mpileup -no-BAQ -region ####-#### -fasta-ref name.fasta name.bam. I can also use this to compare my reference sequence data to it. To be able to view the SNPs I can use samtools tview.

meganewing commented 3 years ago

Since I'm looking at changes in gene expression, I'll need to compare my experimental sequences to a control reference sequence and identify variants (once I get my sequences aligned, that is). But first I want to narrow my search down to sequences with qualities greater than 30:

samtools view -q 30 -b AllQualities.bam > QualitySeqs.bam Now all of my quality sequences are in a file. Then, I can look for variants within specific regions of my quality sequences with samtools mpileup --no-BAQ --region <#:BP###-BP###> --fasta-ref <referenceGenome>.fasta sequenceVariants.bam

Brybrio commented 3 years ago

I will attempt to find SNPs but my datasets are large, so I can subset my data to run initial trial processes more efficiently using samtools index input.bam and then samtools view input.bam 1:###-###. I can also filter my data specifying flags such as "PROPER-PAIR" or "PAIRED" in samtools view -f 66 input.bam | head -n #