graph-genome / component_segmentation

Read in ODGI Bin output and identify co-linear components
Apache License 2.0

Find Annotation and Knowledge Graphs to integrate #41

Open josiahseaman opened 4 years ago

josiahseaman commented 4 years ago

Assignee: Ali Haider Bangash

The first step is to identify what data could be integrated through a knowledge graph and what is available. What did the other Hackathon teams accomplish? What is available? Information goes in this issue. We're looking for information that relates to genetic variants of the virus:

hhaider15 commented 4 years ago

Structural annotations: very well done by the Machine Learning working group. Complete genomes of the strains, labelled with their respective source and its metadata.

hhaider15 commented 4 years ago

Gene annotations: whole genome nucleotide data pulled from RVDB release 14 as labels. Metadata for human & non-human pathogen phenotypes

hhaider15 commented 4 years ago

Structure annotations: Amino acid sequence data for common cold CoV and SARS-CoV-2 for M, E & S proteins with metadata

hhaider15 commented 4 years ago

Genes & structural annotations: Proteomics data & MassIVE/CCMS Maestro+MSstats reanalysis of MSV000085096 / PXD017710 Proteome and Translatome of SARS-CoV-2 infected cells

subwaystation commented 4 years ago

Hi @hhaider15 ! Thanks for all the links. We could work with e.g. .csv or .fasta.

But what we had in mind are SPARQL endpoints, which we could query using SPARQL.

I think a good start would be http://yummydata.org/. And maybe you will find some endpoints which are not listed there ;) Please get back to me if you have more questions.
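To make the idea of querying an endpoint concrete, here is a minimal sketch in Python using only the standard library. It assumes an endpoint that follows the standard SPARQL Protocol (query passed as a `query` parameter, results returned as SPARQL JSON). The endpoint URL and the helper names `build_request`, `parse_bindings` and `run_query` are placeholders made up for illustration, not anything from this project or from a specific endpoint.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "https://example.org/sparql"

# Generic exploratory query: list a few triples to see what the endpoint holds.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query over the network and return the parsed rows."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run_query(ENDPOINT, QUERY)` against a real endpoint URL would return one dict per result row. Some endpoints may need extra parameters or a POST request, so treat this as a starting point.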

subwaystation commented 4 years ago

@josiahseaman and Phylogenetics: As far as I understand from the #public_sequence_resource group, they will also pack the metadata into a SPARQL endpoint. Part of the metadata will be a mandatory field for collection_location. For the list of the required metadata please visit https://github.com/arvados/bh20-seq-resource/blob/master/example/minimal_example.yaml.

innamoratika commented 4 years ago

Ali- Just wanted to introduce myself post-convo with @josiahseaman : I'll be working on the phylo side of things and we should touch base at some point regarding using universal IDs for genomes. We should have enough in the phylo tree that we can track provenance and pass that on to you!

hhaider15 commented 4 years ago

Agreed. Apologies, I was busy earlier. I'll be working on this now.

hhaider15 commented 4 years ago

Good to see you @innamoratika