Open mbrush opened 7 years ago
Also, consider that most publications referenced by SCVs that are not 'literature only' or 'curation' do not contain evidence; they point instead to things like the methods used or guideline documentation (e.g. SCV000267702). So consider not linking these as supporting references for an evidence line.
I see a lot of ClinVar tickets open; is this one still relevant?
Is it fair to say that the XML causes us a lot of problems?
Note that the MacArthur Lab used to maintain a parser for ClinVar XML:
https://github.com/macarthur-lab/clinvar
They now say it is no longer supported because the ClinVar VCF provides everything.
Would it make sense for us to switch to ingesting the VCF? VCF is a standard format, and there are many other areas where a VCF->kg/rdf mapping would be useful.
It may make sense to think about what our end goal is here and work backwards to the most maintainable strategy.
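To make the VCF option concrete, here is a rough sketch of what a VCF->triples mapping could look like using only the Python standard library. It is not an existing ingest: the `ClinVarVariant:` prefix and the `ex:` predicates are placeholders rather than terms from any real vocabulary, the sample record is made up, and it assumes the ClinVar VCF carries INFO keys such as `CLNSIG`, `CLNREVSTAT` and `CLNDN`.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def vcf_line_to_triples(line):
    """Turn one VCF data line into (subject, predicate, object) tuples.

    Header lines (starting with '#') yield an empty list.
    All predicates are hypothetical placeholders.
    """
    if line.startswith("#"):
        return []
    chrom, pos, vid, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    fields = parse_info(info)
    subject = "ClinVarVariant:" + vid  # placeholder CURIE prefix
    triples = [
        (subject, "ex:position", chrom + ":" + pos),
        (subject, "ex:ref", ref),
        (subject, "ex:alt", alt),
    ]
    # Copy across a few INFO keys if present (assumed ClinVar VCF keys).
    for key in ("CLNSIG", "CLNREVSTAT", "CLNDN"):
        if key in fields:
            triples.append((subject, "ex:" + key, fields[key]))
    return triples


# Made-up record for illustration only; not real ClinVar data.
sample_line = "1\t12345\t99999\tG\tA\t.\t.\tALLELEID=11111;CLNSIG=Pathogenic;CLNDN=not_provided"

for triple in vcf_line_to_triples(sample_line):
    print(triple)
```

A flat mapping like this only covers per-variant fields, so it would be worth checking which evidence and provenance details we would still need from the XML before committing to a switch.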
The first-pass ingest of ClinVar-XML (#276) pulled only minimal evidence and provenance information.
As SEPIO matures, we will soon be at a point where we can revisit this source to pull more of this info. A couple specific things to address:
Note also that I suspect that ClinVar may be changing its data model given recent activity in the ClinGen community - so given the low immediate value of evidence metadata we collect from ClinVar, it may be best to just make the easy fixes to 1 and 2 above, and not spend time parsing out additional metadata as in 3.