The result in your clinvar_alleles.tsv:
clinical_significance="Likely benign;Uncertain significance"
all_submitters="Genetic Services Laboratory, University of Chicago;PreventionGenetics"
That's just one example; there are many, many more.
I can see where this comes from: the regex and the XML structure. In the script, at parse_clinvar_xml.py:104:
current_row['all_submitters'] = ';'.join([
    submitter_node.attrib['submitter'].replace(';', ',')
    for submitter_node in elem.findall('.//ClinVarSubmissionID')
    if submitter_node.attrib is not None and submitter_node.attrib.has_key('submitter')
])
The list of submitters is obtained from a separate node (ClinVarSubmissionID), without any attempt to match each submitter against the nested clinical significance description:
clinical_significance=elem.find('.//ReferenceClinVarAssertion/ClinicalSignificance')
if clinical_significance.find('.//ReviewStatus') is not None:
    current_row['review_status']=clinical_significance.find('.//ReviewStatus').text;
if clinical_significance.find('.//Description') is not None:
    current_row['clinical_significance']=clinical_significance.find('.//Description').text
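To make the mismatch concrete, here is a minimal sketch that runs the same two lookups on a made-up, heavily simplified ClinVarSet fragment. The element names follow the snippets above; the submitters "Lab A" and "Lab B", their values, and the overall layout of `TOY_XML` are invented for illustration and are not taken from the real record.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: one aggregate assertion plus two submissions.
TOY_XML = """
<ClinVarSet>
  <ReferenceClinVarAssertion>
    <ClinicalSignificance>
      <Description>Conflicting interpretations of pathogenicity</Description>
    </ClinicalSignificance>
  </ReferenceClinVarAssertion>
  <ClinVarAssertion>
    <ClinVarSubmissionID submitter="Lab A"/>
    <ClinicalSignificance>
      <Description>Uncertain significance</Description>
    </ClinicalSignificance>
  </ClinVarAssertion>
  <ClinVarAssertion>
    <ClinVarSubmissionID submitter="Lab B"/>
    <ClinicalSignificance>
      <Description>Likely benign</Description>
    </ClinicalSignificance>
  </ClinVarAssertion>
</ClinVarSet>
""".strip()

elem = ET.fromstring(TOY_XML)

# Submitters: every ClinVarSubmissionID in document order.
all_submitters = ';'.join(
    node.attrib['submitter'].replace(';', ',')
    for node in elem.findall('.//ClinVarSubmissionID')
    if 'submitter' in node.attrib
)

# Significance: read only from the aggregate ReferenceClinVarAssertion node.
aggregate = elem.find('.//ReferenceClinVarAssertion/ClinicalSignificance')
clinical_significance = aggregate.find('.//Description').text

print(all_submitters)         # Lab A;Lab B
print(clinical_significance)  # Conflicting interpretations of pathogenicity
```

The per-submission Description values are never read here, so the position of a name in `all_submitters` carries no information about which call that submitter made.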
If I had a solution worked out, I would make a pull request. But it appears too tricky so far.
I created a pull request for this: #28. It does not replace your columns; it simply adds 2 new columns that report the submitter-specific clinical significance and review status.
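For readers who want the general idea, here is a rough sketch of how submitter-specific values could be collected so that their positions line up. This is not the code from #28: the function name `per_submitter_fields`, the `missing` placeholder, the assumption that ClinVarSubmissionID and ClinicalSignificance are direct children of each ClinVarAssertion, and the toy XML are all hypothetical.

```python
import xml.etree.ElementTree as ET


def _text(node, path, missing):
    """Text of the first match for path, or the placeholder if absent or empty."""
    found = node.find(path)
    if found is None or not found.text:
        return missing
    return found.text.strip().replace(';', ',')


def per_submitter_fields(elem, missing='NA'):
    """Return three ';'-joined strings whose i-th entries share one submission."""
    submitters, significances, statuses = [], [], []
    # The tag match is exact, so ReferenceClinVarAssertion is not picked up here.
    for assertion in elem.findall('.//ClinVarAssertion'):
        sub_id = assertion.find('ClinVarSubmissionID')
        name = missing if sub_id is None else sub_id.attrib.get('submitter', missing)
        submitters.append(name.replace(';', ','))
        significances.append(_text(assertion, 'ClinicalSignificance/Description', missing))
        statuses.append(_text(assertion, 'ClinicalSignificance/ReviewStatus', missing))
    return ';'.join(submitters), ';'.join(significances), ';'.join(statuses)


# Hypothetical, simplified input; "Lab A" and "Lab B" are invented.
TOY_XML = """
<ClinVarSet>
  <ClinVarAssertion>
    <ClinVarSubmissionID submitter="Lab A"/>
    <ClinicalSignificance>
      <ReviewStatus>criteria provided, single submitter</ReviewStatus>
      <Description>Uncertain significance</Description>
    </ClinicalSignificance>
  </ClinVarAssertion>
  <ClinVarAssertion>
    <ClinVarSubmissionID submitter="Lab B"/>
    <ClinicalSignificance>
      <Description>Likely benign</Description>
    </ClinicalSignificance>
  </ClinVarAssertion>
</ClinVarSet>
""".strip()

subs, sigs, stats = per_submitter_fields(ET.fromstring(TOY_XML))
print(subs)   # Lab A;Lab B
print(sigs)   # Uncertain significance;Likely benign
print(stats)  # criteria provided, single submitter;NA
```

Writing a placeholder for a missing value, rather than skipping it, keeps the three strings the same length, so the i-th entry of each always refers to the same submission.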
Example: Variant=chr1:976059_C>T ID=RCV000195231
The result in your clinvar_alleles.tsv: clinical_significance="Likely benign;Uncertain significance" all_submitters="Genetic Services Laboratory, University of Chicago;PreventionGenetics"
Going by the order of the two lists (a correspondence that would be useful), "Likely benign" would have been submitted by the University of Chicago lab. But that is not the case: https://www.ncbi.nlm.nih.gov/clinvar/RCV000195231/