sr320 closed 5 years ago
BEDTools allows the user to work directly with data in BED, GTF and GFF file types. The five BEDTools commands most relevant to me:
intersect
allows me to find overlapping regions between two separate range files. I can identify overlaps between DML and other genome features, like exons.
closest
will find the nearest genomic feature, which is not necessarily a non-overlapping feature. I can find the features closest to DML and DMR that could be putative promoter regions.
flank
creates new ranges of a specified number of base pairs flanking the beginning and end of a genomic feature. I can create 100 bp flanks at the beginning and end of my mRNA coding regions and find intersections between those flanks and DML or DMR.
merge
would take overlapping ranges (possibly identified in intersect) and merge them into a single range.
Bedtools allows you to perform genomic analysis on multiple files in a variety of formats (BAM, BED, GFF/GTF, VCF). Some of the most useful commands or subcommands that could be relevant to me:
intersect
would be the most useful. It would work for my fish data, when I'm checking for overlap regions on specific genes.
merge
I could use for the first step in my eDNA analysis pipeline
genomecov
looks like an interesting command that summarizes coverage per chromosome or for entire genome.
getfasta
works to extract specific sequences out of much larger ones, like an entire chromosome.
multicov
For files in the BAM format, it counts the number of alignments in multiple files that overlap the ranges in a specified file.
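As a rough illustration of what intersect reports by default (the overlapping portion of each pair of ranges), here is a minimal pure-Python sketch. It is not how bedtools is implemented; the function name and the toy intervals are made up for illustration, and coordinates are assumed to be BED-style (0-based, half-open).

```python
def intersect(a, b):
    """Return the overlapping portion of every pair of intervals from a and b.

    Intervals are (chrom, start, end) tuples in BED-style coordinates
    (0-based start, end not included).
    """
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue  # ranges on different chromosomes never overlap
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # the two ranges share at least one base
                overlaps.append((chrom_a, start, end))
    return overlaps


# Hypothetical toy data: two genes and three loci of interest.
genes = [("chr1", 100, 200), ("chr1", 500, 600)]
loci = [("chr1", 150, 250), ("chr1", 590, 595), ("chr2", 100, 200)]
print(intersect(genes, loci))
```

The nested loop is fine for a handful of ranges; bedtools itself is built to handle genome-scale files, which this sketch is not.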
Though I won't be using bedtools for my proteomics analysis, here are 5 sub-commands:
intersect tells how much overlap occurs in two ranges
getfasta extracts subsets of a genome
genomecov summarizes coverage over a genome or chromosome
multicov counts how many alignments from a number of BAM files overlap with a number of BED files
merge pieces together overlapping ranges
genomecov calculates the level of coverage over a whole genome
intersect finds overlaps
random creates random intervals in a genome
multicov counts coverage at a certain site in multiple BAM files
getfasta uses a FASTA file to extract part of a genome
I could imagine myself using the BEDtools in the following ways:
genomecov
- to determine whether I have uniform sequencing coverage across a genome during WGS
intersect
- to find antibiotic genes in a reference genome
jaccard
- to compare the genomes of geographically isolated bacterial populations
bamtobed
- to facilitate switching between samtools and bedtools
getfasta
- extract sequence information from overlaps to BLAST against well-annotated bacterial genomes
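For anyone curious about the statistic behind jaccard, it is the number of bases shared by two sets of ranges divided by the number of bases covered by either set. Below is a minimal sketch under that assumption; the helper names and the toy intervals are hypothetical, and the per-base sets only make sense for tiny examples, not real genomes.

```python
def covered_bases(intervals):
    """Return the set of (chrom, position) pairs covered by the intervals."""
    bases = set()
    for chrom, start, end in intervals:
        bases.update((chrom, pos) for pos in range(start, end))
    return bases


def jaccard(a, b):
    """Jaccard statistic: shared bases divided by bases covered by either set."""
    bases_a = covered_bases(a)
    bases_b = covered_bases(b)
    union = bases_a | bases_b
    if not union:
        return 0.0  # nothing covered in either set
    return len(bases_a & bases_b) / len(union)


# Hypothetical feature sets from two populations.
pop_a = [("chr1", 0, 100)]
pop_b = [("chr1", 50, 150)]
print(jaccard(pop_a, pop_b))
```

A value of 0 means the two sets share no bases, and 1 means they cover exactly the same bases.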
intersect
--> finds overlaps between ranges
annotate
--> finds how much coverage each file has over another input file
getfasta
--> pull out sequences for a given set of ranges
multiinter
--> finds overlaps of a given feature between files
merge
--> merges overlapping ranges into one range
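The idea behind merge can be sketched in a few lines of Python: sort the ranges, then extend the previous range whenever the next one overlaps or directly touches it. This is only an illustration under the assumption of BED-style half-open coordinates; the function name and data are made up, and it is not bedtools' own code.

```python
def merge(intervals):
    """Merge overlapping or directly adjacent (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous range: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            # Gap or new chromosome: start a new merged range.
            merged.append((chrom, start, end))
    return merged


# Hypothetical toy ranges, deliberately unsorted.
ranges = [
    ("chr1", 15, 30),
    ("chr1", 10, 20),
    ("chr1", 30, 40),
    ("chr1", 50, 60),
    ("chr2", 5, 10),
]
print(merge(ranges))
```

The sketch sorts for you, whereas bedtools merge expects its input to be sorted by chromosome and start position already.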
I will not be using BedTools, but the ones I found interesting are:
flank - will create a portion of basepairs on each side of an indicated sequence
fisher - a Fisher's exact test to see the similarities and differences between two files
groupby - creates summary statistics based on groups in an indicated column of data
overlap - shows the overlap or distance between features in a file
subtract - searches for features in file 2 that overlap with features in file 1. Overlapping features found in file 2 are removed from file 1
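To make the subtract description concrete, here is a small sketch of the default behaviour as I understand it: only the overlapped portion of a file 1 range is cut out, so one range can come back as two pieces. The function name and toy data are hypothetical, and coordinates are assumed to be BED-style half-open.

```python
def subtract(a, b):
    """Remove from each interval in a the portions overlapped by intervals in b."""
    result = []
    for chrom, start, end in a:
        pieces = [(start, end)]  # what is left of this range so far
        for chrom_b, start_b, end_b in b:
            if chrom_b != chrom:
                continue
            remaining = []
            for s, e in pieces:
                if end_b <= s or start_b >= e:
                    remaining.append((s, e))  # no overlap, keep the piece whole
                    continue
                if s < start_b:
                    remaining.append((s, start_b))  # part left of the overlap
                if end_b < e:
                    remaining.append((end_b, e))  # part right of the overlap
            pieces = remaining
        result.extend((chrom, s, e) for s, e in pieces)
    return result


# Hypothetical toy data: one feature in file 1, one overlapping feature in file 2.
file1 = [("chr1", 100, 200)]
file2 = [("chr1", 120, 150)]
print(subtract(file1, file2))
```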
I will likely not be using BedTools for my current project.
intersect
- finds the overlapping regions between two range files
slop
- adds basepairs to the ends of ranges in a single .bed file
genomecov
- summarizes feature coverage, providing depth, bases covered, chromosome bases, and proportion of bases covered
annotate
- shows how much coverage one file has over another
multicov
- counts alignments from many BAM alignment files that overlap the ranges in a BED file
I would use BedTools for these commands:
genomecov
: get depth information at each genome position using the -d flag
intersect
: find overlap between the sequence alignments and genes
flank
: add a specific number of bases before or after a given range
getfasta
: extract sequences in fasta format for a given range
sort
: sort by chromosome or scaffold size or by score
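The per-position depth that genomecov reports with the -d flag can be pictured with a short sketch: count how many ranges cover each base, then list every position of every chromosome, including those with zero coverage. The function name, the toy reads and the chromosome size are assumptions for illustration; positions are printed 1-based, as the -d output does.

```python
from collections import Counter


def per_base_depth(intervals, chrom_sizes):
    """Return (chrom, 1-based position, depth) for every base of every chromosome."""
    depth = Counter()
    for chrom, start, end in intervals:
        for pos in range(start, end):  # BED-style: start included, end excluded
            depth[(chrom, pos)] += 1
    rows = []
    for chrom, size in chrom_sizes.items():
        for pos in range(size):
            # Counter returns 0 for positions no range covers.
            rows.append((chrom, pos + 1, depth[(chrom, pos)]))
    return rows


# Hypothetical toy data: two overlapping reads on an 8 bp chromosome.
reads = [("chr1", 0, 4), ("chr1", 2, 6)]
sizes = {"chr1": 8}
for row in per_base_depth(reads, sizes):
    print(*row, sep="\t")
```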
I won't be using bedtools in my project but 5 interesting commands are:
intersect
- finds overlaps between ranges (it can also return all non-overlapping ranges with the -v option)
slop
- grows a range by a specified number of basepairs; can expand only the left or right side with the options -l and -r
flank
- finds flanking regions of ranges, particularly useful for finding promoter regions of genes
genomecov
- summarizes genome coverage over the whole genome
merge
- merges overlapping ranges into one range
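Since slop and flank are easy to mix up, here is a small sketch of the difference: slop returns the original range made bigger, while flank returns only the new ranges on either side. This ignores strand and is not bedtools' own code; the function names, the chromosome size and the toy gene are assumptions for illustration.

```python
def slop(interval, left, right, chrom_size):
    """Grow a (chrom, start, end) range, clipped to the chromosome bounds."""
    chrom, start, end = interval
    return (chrom, max(0, start - left), min(chrom_size, end + right))


def flank(interval, left, right, chrom_size):
    """Return only the ranges flanking the interval, not the interval itself."""
    chrom, start, end = interval
    flanks = []
    if left > 0 and start > 0:
        flanks.append((chrom, max(0, start - left), start))
    if right > 0 and end < chrom_size:
        flanks.append((chrom, end, min(chrom_size, end + right)))
    return flanks


# Hypothetical gene on a 1000 bp chromosome.
gene = ("chr1", 100, 200)
print(slop(gene, 50, 50, 1000))
print(flank(gene, 50, 50, 1000))
```

For promoter hunting, the upstream flank is the interesting piece, and which side counts as upstream depends on the strand, which this sketch does not handle.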
I won't be using bedtools for my metabarcoding projects, but the following subcommands are very interesting to me:
intersect
compute the overlaps between two sets of ranges
slop
grow ranges
flank
extract flanking ranges
genomecov
summarize the coverage of features along chromosome sequences
merge
merge overlapping ranges into a single range
intersect
- can be used to extract overlapping regions between .bed files. Adding the -wa and -wb flags returns the entire ranges, not just the overlapping portions. -s specifies that features must be on the same strand.
genomecov
- summarizes coverage of features along chromosome sequences, helpful to see % coverage stats.
merge
- merges overlapping ranges into a single range, basically removing duplicated data.
annotate
- annotates coverage of one track file against another. Could be used to look at
unionbedg
- merges multiple BedGraph files into one file, seems very useful.
What do you consider the five most relevant BedTools commands based on your research interest?
Please list them and indicate what each one does.