hiscom / hispid

HISPID Terms

DNA sequences #32

Closed nielsklazenga closed 8 years ago

nielsklazenga commented 9 years ago

MEL would like to deliver DNA sequences – or rather links to DNA sequences – to AVH. Our requirements are not high: the Darwin Core associatedSequences is all we need.

nielsklazenga commented 8 years ago

What happened with this?

AaronWilton commented 8 years ago

I think we propose the field, definition, etc. here and get it ratified at the teleconference.

nielsklazenga commented 8 years ago

Okay, first test case for the procedure we set out.

nielsklazenga commented 8 years ago

<rdf:Description rdf:about="http://rs.tdwg.org/dwc/terms/associatedSequences">
  <rdfs:label xml:lang="en">Associated Sequences</rdfs:label>
  <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
  <skos:definition xml:lang="en">A list (concatenated and separated) of identifiers 
    (publication, global unique identifier, URI) of genetic sequence information 
    associated with the Occurrence.</skos:definition>
  <dwcattributes:status>current</dwcattributes:status>
  <rdfs:isDefinedBy rdf:resource="http://rs.tdwg.org/dwc/terms/"/>
  <skos:example>http://www.ncbi.nlm.nih.gov/nuccore/KJ599123 | 
    http://www.ncbi.nlm.nih.gov/nuccore/KJ599234 | 
    http://www.ncbi.nlm.nih.gov/nuccore/KJ599349 | 
    http://www.ncbi.nlm.nih.gov/nuccore/KJ598898 | 
    http://www.ncbi.nlm.nih.gov/nuccore/KJ599012</skos:example>
</rdf:Description>
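For consumers of this field, a minimal sketch of how the concatenated value might be split back into individual identifiers. It assumes the pipe separator used in the `skos:example` above; the function name `split_associated_sequences` is hypothetical and not part of HISPID or Darwin Core.

```python
def split_associated_sequences(value: str, separator: str = "|") -> list[str]:
    """Split a concatenated associatedSequences value into individual identifiers.

    Assumption: items are separated by a pipe, as in the skos:example above.
    """
    # Split on the separator, then strip spaces and line breaks around each item.
    parts = (part.strip() for part in value.split(separator))
    # Drop empty items, e.g. those left by a trailing separator.
    return [part for part in parts if part]


# Shortened copy of the example value from the term definition.
example = """http://www.ncbi.nlm.nih.gov/nuccore/KJ599123 |
    http://www.ncbi.nlm.nih.gov/nuccore/KJ599234 |
    http://www.ncbi.nlm.nih.gov/nuccore/KJ599349"""

for uri in split_associated_sequences(example):
    print(uri)
```

Stripping whitespace per item means the value can be wrapped across lines, as it is in the RDF example, without affecting the result.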

AaronWilton commented 8 years ago

add property:

<dwcattributes:organizedInClass rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence"/>

AaronWilton commented 8 years ago

Teleconference 2016-02-12: Adopted

ainsleyongit commented 8 years ago

My understanding is that the adopted use of associatedSequences already allows for many-to-many relationships between Occurrence and DNA sequence. But as we move away from the relatively simple 'one sequence of one locus from one individual' kind of data, it will become commonplace to generate and publish a lot more molecular data, such as genomes derived from multiple individuals and next-generation sequencing runs comprising multiple organisms. The associated molecular data space might become increasingly messy.

AaronWilton commented 8 years ago

Yes, but I don't see that as a problem. These are often grouped into experiments or other constructs that can also be referenced by a URI. I would see no issue with adding them as long as they refer to this specimen.

Also, I think we are in a much better space with HISPID so it will be easy to add additional properties to it to handle some of the more complex cases as the need arises.

One thing that does concern me a little is whole-genome sequencing of an individual: we have fungal examples where this has (apparently) resulted in thousands of sequences being deposited for a single specimen. For our internal systems we have proposed linking to the project/experiment or equivalent in this context, rather than to all the individual sequences.
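A rough sketch of that idea, not our actual implementation: when a specimen has more sequences than some threshold and a project/experiment URI is known, publish the single grouping link instead. The function names, the `max_links` threshold, and the `example.org` URIs are all assumptions made for illustration.

```python
def links_for_specimen(sequence_uris, project_uri=None, max_links=50):
    """Choose the identifiers to publish in associatedSequences for one specimen.

    Assumptions: max_links is an arbitrary cut-off, and project_uri is a
    project/experiment-level identifier supplied by the caller.
    """
    # Many sequences and a known grouping URI: publish only the grouping link.
    if project_uri and len(sequence_uris) > max_links:
        return [project_uri]
    # Otherwise publish every individual sequence link.
    return list(sequence_uris)


def to_associated_sequences(uris, separator=" | "):
    """Concatenate identifiers into a single associatedSequences value."""
    return separator.join(uris)


# Hypothetical specimen with 1000 deposited sequences.
many = [f"https://example.org/sequence/{i}" for i in range(1000)]
print(to_associated_sequences(links_for_specimen(many, "https://example.org/project/1")))
```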

AaronWilton commented 8 years ago

No additional comments; this is implemented in RDF. Closing issue.