Compared against metabolic/functional annotation databases (MetaCyc, COG, KEGG, 3 more) using FAST - combined the RPKM values for all enzymes in a pathway using a set of rules (specific to each pathway)
Output was an RPKM value for each enriched pathway in every sample
RPKM - after the contigs were assembled, the reads were aligned back to the contigs - this alignment was used to generate RPKM (essentially a normalized genomic abundance metric - reads per kilobase of transcript per million mapped reads)
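For reference, the RPKM normalization noted above can be written out as a short function. This is a sketch of the standard formula only, not the pipeline's actual code; the function and parameter names are made up for illustration.

```python
def rpkm(reads_mapped: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase per million mapped reads.

    reads_mapped       - reads aligned to one contig/gene (illustrative name)
    length_bp          - length of that contig/gene in base pairs
    total_mapped_reads - all mapped reads in the sample
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length_bp and total_mapped_reads must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) is the same as
    # multiplying the read count by 1e9 before dividing by both.
    return reads_mapped * 1e9 / (length_bp * total_mapped_reads)
```

The two divisions correct for feature length and sequencing depth, which is what makes values comparable across samples.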
Sample name - c=non-viral (includes bacterial and viral), ERR=viral only
Type - single includes only one fraction - multi includes the viral and bacterial fraction data in a single analysis
Date information exists - Simon Rao
Cyanobacteria normally have fast turnover - they slow down and halt photosynthesis in response to viruses (sequestering them and protecting neighbouring cells) - the virus carries genes that are part of the photosystem - these overcome the defence mechanism and promote photosynthesis and cellular division
Pathway tools - KEGG Atlas - have diagrams for metabolism - recommended using these
Envisions this turning into a manuscript - Nature Scientific Data publication
Heatmap with distribution of pathways is a good starting point (something similar to KEGG Atlas would be ideal though)
Want to be able to do things like compare samples in the Indian Ocean to x Ocean
Pathways by location heatmap
Metavirome - attracted to certain pathways - want to visualize the pathways that are affected
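One possible way to build the data behind a pathways-by-location heatmap is to average pathway RPKM per location into a matrix. This is a minimal sketch: the record layout `(location, pathway, rpkm)`, the function name, and the choice to fill missing pathway/location pairs with 0.0 are all assumptions, not decisions made in the meeting.

```python
from collections import defaultdict
from statistics import mean


def pathway_by_location(records):
    """Aggregate (location, pathway, rpkm) records into a heatmap matrix.

    Returns (pathways, locations, matrix), where matrix[i][j] is the mean
    RPKM of pathways[i] across the samples taken at locations[j].
    """
    cells = defaultdict(list)
    for location, pathway, value in records:
        cells[(pathway, location)].append(value)

    pathways = sorted({p for p, _ in cells})
    locations = sorted({loc for _, loc in cells})

    # Assumption: a pathway never observed at a location is shown as 0.0.
    matrix = [
        [mean(cells[(p, loc)]) if (p, loc) in cells else 0.0 for loc in locations]
        for p in pathways
    ]
    return pathways, locations, matrix
```

The returned row and column labels can be passed straight to whatever heatmap plotting function the site ends up using.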
PathoLogic - tool used to predict the metabolic network of an organism from annotated genome files - generates Pathway/Genome Databases (PGDBs) - BioCyc stores the databases generated by SRI
Used to generate organism-specific pathway/genome databases
Curated from experimentally validated results/academic papers
Data points are plotted to the map with latitude and longitude values
Want to be able to query by location, depth, other metadata (temperature, salinity, etc)
Clicking on a sample should pull up data on sample information, pathway information, etc. (Want some figures to make data visual - likely by metabolic category) and link out to MetaCyc information
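The query goal above (filter by location, depth, and other metadata) could be sketched as a generic range filter. The field names used here (`lat`, `lon`, `depth_m`, and so on) are hypothetical, since the notes do not record the real column names, and longitude ranges that wrap across the antimeridian are not handled.

```python
def in_range(value, bounds):
    """True if value lies inside the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


def query_samples(samples, **ranges):
    """Return the samples whose fields all fall inside the given ranges.

    samples - list of dicts, one per sample (hypothetical field names)
    ranges  - field=(low, high) pairs, e.g. lat=(-40, 0), depth_m=(0, 50)

    A sample that lacks a queried field is excluded rather than guessed at.
    """
    return [
        s for s in samples
        if all(field in s and in_range(s[field], bounds)
               for field, bounds in ranges.items())
    ]
```

For example, `query_samples(samples, lat=(-40, 0), depth_m=(0, 50))` would keep only shallow samples in that latitude band, and calling it with no ranges returns every sample.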
Analysis Functionality
Want differential comparison of metabolic pathway activation for samples with given set of characteristics
Want to have a way to filter out pathways that are generally present everywhere
Interactive KEGG Atlas-like Visualization (if time permits)
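A rough sketch of the first two analysis items: drop pathways that are present nearly everywhere, then compare mean pathway RPKM between two sample groups. The prevalence threshold, the pseudocount, and all names are assumptions, and a plain log2 fold change is used here only as a stand-in; it is not a statistical test and not a method agreed on in these notes.

```python
from math import log2
from statistics import mean


def drop_ubiquitous(abundance, max_prevalence=0.9):
    """Remove pathways detected (RPKM > 0) in more than max_prevalence of samples.

    abundance - {pathway: {sample: rpkm}}
    """
    samples = {s for per_sample in abundance.values() for s in per_sample}
    if not samples:
        return {}
    kept = {}
    for pathway, per_sample in abundance.items():
        detected = sum(1 for v in per_sample.values() if v > 0)
        if detected / len(samples) <= max_prevalence:
            kept[pathway] = per_sample
    return kept


def log2_fold_change(abundance, group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean RPKM in group_a vs group_b for each pathway.

    Samples missing from a pathway count as 0.0; the pseudocount avoids
    division by zero. Both groups must be non-empty.
    """
    result = {}
    for pathway, per_sample in abundance.items():
        a = mean(per_sample.get(s, 0.0) for s in group_a)
        b = mean(per_sample.get(s, 0.0) for s in group_b)
        result[pathway] = log2((a + pseudocount) / (b + pseudocount))
    return result
```

The two sample groups would come from whatever characteristics the user selects (for example, depth layer or ocean region).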
Website Progress
Elected to use Shiny to build the website
Laid the groundwork and set up the map using leaflet (package for maps)
Working on the query module - when filtering, zooming, or changing parameters, the query returns the matching subset of data
Began basic formatting of site
Current site functionality: The site has three pages - Home/Welcome, Map, and Analysis - accessible from the sidebar menu. The map appears on the Map page with the sample points plotted. When a given sample is selected, a field appears with some basic sample data. A control panel has been added for the map, but it is not yet filled in or functional.
RPKM: Reads Per Kilobase of transcript per Million mapped reads
SRF: Surface Water Layer
DCM: Deep Chlorophyll Maximum
MES: Mesopelagic
OMZ: Oxygen minimum zone
Friday, October 12
Project Background
Prior Sample/Data Processing
Data Contents
Notes From Steve Hallam Visit
MetaCyc Notes
Project Update
Approximate Task Divvy Up
http://www.gutcyc.org - similar initiative - gut microbiome
Website Goals
Website Progress