Short title: Comparative Genomics with DECIPHER and SynExtend
Workshop URL: https://github.com/ahl27/CompGenomicsBioc2022Workshop
Workshop docker image: ghcr.io/ahl27/compgenomicsbioc2022:latest
Workshop port: 8787
Workshop memory request: 4GB
Workshop description:
In the past decade, the number of sequenced proteins with unknown functions has grown exponentially, while the number of experimentally analyzed proteins has increased at a relatively constant rate. To assign functions to more proteins, many in silico methods have been developed to predict protein function from gene sequence data alone. These methods use ‘guilt-by-association’ analysis to detect genes that are likely involved in common functional pathways.
We have developed two Bioconductor packages, DECIPHER and SynExtend, that provide many functions for comparative genomics. This workshop will highlight a broad spectrum of these tools through the lens of a specific application: predicting functional associations between proteins to formulate hypotheses about the unknown roles of new proteins. Along the way, we will cover several topics:
Finding and importing genomes into a sequence database with the DECIPHER package
Gene calling and annotation using the DECIPHER package
Identification of clusters of orthologous genes using the SynExtend package
Construction of alignments and phylogenetic trees using the new TreeLine function in the DECIPHER package
Prediction of functional associations using ProtWeaver in the SynExtend package
This workshop will focus on new features in DECIPHER and SynExtend: TreeLine() for phylogenetic reconstruction and ProtWeaver for guilt-by-association analysis. All examples will be performed on a real dataset of Mycoplasma genomes.
At the conclusion of this workshop, participants will have gained expertise in a wide variety of tools for bioinformatic analyses, including sequence alignment, phylogenetic reconstruction, ortholog detection, functional annotation, and detection/quantification of selective pressure. These packages are optimized for application at scale, allowing users to conduct analyses on large datasets.
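As a rough illustration of the guilt-by-association idea described above, the following Python sketch scores gene pairs by how similar their presence/absence patterns are across genomes (a simple form of phylogenetic profiling, using Jaccard similarity). It is a toy stand-in and not ProtWeaver's implementation; the gene names, genome names, and the helper functions `jaccard` and `rank_associations` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_associations(profiles):
    """Score every gene pair by profile similarity, highest first.

    `profiles` maps a gene (ortholog group) name to the set of genomes
    in which that gene is present. Genes that are gained and lost
    together across genomes receive high scores.
    """
    pairs = [
        (g1, g2, jaccard(profiles[g1], profiles[g2]))
        for g1, g2 in combinations(sorted(profiles), 2)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical toy data: geneA and geneB always co-occur, geneC does not.
profiles = {
    "geneA": {"genome1", "genome2", "genome3"},
    "geneB": {"genome1", "genome2", "genome3"},
    "geneC": {"genome4"},
}

ranked = rank_associations(profiles)
for gene1, gene2, score in ranked:
    print(f"{gene1}\t{gene2}\t{score:.2f}")
```

In this toy input, the pair with identical profiles ranks first, which is the kind of signal that would suggest a shared functional pathway and motivate a hypothesis about an uncharacterized protein.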
Add any additional notes below.
Webpage is located at https://www.ahl27.com/CompGenomicsBioc2022/articles/CompGenomicsBioc2022.html. Sections are broken into individual pages because I thought this was a cleaner flow than one super long Rmd page. Each page highlights the topics that will be covered for that section; full tutorials will be written prior to the June 17 deadline.
2GB of RAM should be sufficient, but this may change when I finish and test the tutorials.
A Setup page is included to future-proof the tutorial so it can be accessed outside of the workshop timeframe; this page will be skipped during the workshop itself.