The ontology currently has a significant shortcoming: the most common alleles
(the wild-type alleles, usually denoted by *1 and listed in the first row of
the haplotype definition spreadsheet) have no tagging SNPs, so they are never
inferred from patient data. This needs to be changed.
I suggest doing the following:
1) Generate a list of SNPs tested by 23andMe V2 for each gene. This list can be
generated by running the following SPARQL query over the ontology:
PREFIX cds: <http://www.genomic-cds.org/ont/genomic-cds.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?allele ?polymorphism {
  ?polymorphism rdfs:subClassOf cds:polymorphism .
  ?polymorphism cds:can_be_tested_with cds:23andMe_v2 .
  ?polymorphism cds:relevant_for ?allele .
}
ORDER BY ?allele
2) Modify the haplotype definition spreadsheet. For each wild-type allele (in
the first row), turn all SNPs that are tested by 23andMe V2 into tagging SNPs
by adding " [tag]" to the respective cells.
3) After doing this, run the script for generating the ontology. Check for
inconsistencies (caused by overlapping allele/haplotype definitions introduced
in step 2). If there are any, try to fix them by turning more SNPs in the
wild-type allele into tagging SNPs (*only* modify wild-type allele
definitions; keep all other rows untouched).
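
Step 2 could probably be scripted rather than done by hand. The following is
only a minimal Python sketch, assuming the spreadsheet has been exported so
that each row is a list of cell strings, the header row holds the SNP
identifiers, and the set of tested SNPs uses the same identifiers. The
function name, the column layout and the example identifiers are made up for
illustration; they are not taken from the existing scripts or spreadsheet.

```python
TAG_SUFFIX = " [tag]"  # marker described in step 2


def tag_wild_type_row(header, wild_type_row, tested_snps):
    """Return a copy of the wild-type row with tested SNP cells marked as tagging.

    Assumptions (hypothetical layout): header[i] names the SNP stored in
    wild_type_row[i]; non-SNP columns such as the allele name never appear
    in tested_snps.
    """
    tagged = []
    for snp_id, cell in zip(header, wild_type_row):
        is_tested = snp_id in tested_snps
        has_value = cell.strip() != ""
        already_tagged = cell.endswith(TAG_SUFFIX)
        if is_tested and has_value and not already_tagged:
            tagged.append(cell + TAG_SUFFIX)
        else:
            # Leave untested, empty, or already tagged cells unchanged.
            tagged.append(cell)
    return tagged


# Made-up identifiers and variants, for illustration only.
header = ["allele", "snp_1", "snp_2", "snp_3"]
wild_type = ["*1", "C", "G", "T"]
tested_by_23andme_v2 = {"snp_1", "snp_3"}

print(tag_wild_type_row(header, wild_type, tested_by_23andme_v2))
```

With the made-up row above, the cells for snp_1 and snp_3 get the " [tag]"
suffix and everything else is left alone. The function only ever receives the
wild-type row, which matches the constraint in step 3 that no other rows are
modified.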
After doing this, I expect that far more alleles and CDS rules will match for
each patient.
Original issue reported on code.google.com by matthias...@gmail.com on 11 Aug 2013 at 11:30