Open josiahseaman opened 4 years ago
Structural annotations (very well done by the Machine Learning working group): Complete genomes of the strains, labelled with the respective source and its metadata
Structural annotations: Amino acid sequence data for common cold CoV and SARS-CoV-2 for the M, E & S proteins, with metadata
Genes & structural annotations: Proteomics data & MassIVE/CCMS Maestro+MSstats reanalysis of MSV000085096 / PXD017710 Proteome and Translatome of SARS-CoV-2 infected cells
Hi @hhaider15 !
Thanks for all the links. We could work with e.g. .csv or .fasta.
But what we had in mind are SPARQL endpoints, which we could query using SPARQL.
I think a good start would be http://yummydata.org/. And maybe you will find some endpoints which are not listed there ;) Please get back to me if you have more questions.
@josiahseaman and Phylogenetics: As far as I understood from the #public_sequence_resource group, they will also pack the metadata into a SPARQL endpoint. Part of the metadata will be a mandatory field for collection_location. For the list of required metadata, please visit https://github.com/arvados/bh20-seq-resource/blob/master/example/minimal_example.yaml.
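To make the SPARQL endpoint idea above a bit more concrete, here is a minimal sketch of how such an endpoint could be queried over HTTP using only the Python standard library. The endpoint URL and the query are placeholders (assumptions, not a real resource from this thread). The request shape follows the SPARQL 1.1 Protocol (a `query` parameter on a GET request), and the parser assumes the endpoint returns the standard SPARQL JSON results format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint URL, for illustration only; substitute a real one,
# e.g. an endpoint listed on http://yummydata.org/.
ENDPOINT = "https://example.org/sparql"

# Generic placeholder query: fetch any five triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    variables = payload["head"]["vars"]
    return [
        # Unbound variables are simply absent from a binding, so skip them.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in payload["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query to the endpoint and return the result rows."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return rows_from_results(json.load(response))
```

Calling `run_query(ENDPOINT, QUERY)` against a real endpoint would return one dict per result row. Some endpoints only accept POST or return XML by default, so this may need adjusting per endpoint.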
Ali, I just wanted to introduce myself after talking with @josiahseaman: I'll be working on the phylo side of things, and we should touch base at some point about using universal IDs for genomes. We should have enough in the phylo tree that we can track provenance and pass that on to you!
Agreed. Apologies, I was busy earlier. I shall be working on this now.
Good to see you @innamoratika
Assignee: Ali Haider Bangash
The first step is to identify what data could be integrated through a knowledge graph and what is available. What did the other hackathon teams accomplish? What is available? Information goes in this issue. We're looking for information that relates to genetic variants of the virus: