Thanks for releasing this great resource. I noticed that the semicolon-separated lists in clinical_significance_ordered and submitters_ordered don't always contain the same number of items:
In [1]: import pandas as pd
   ...: df = pd.read_csv('clinvar_alleles_example_750_rows.single.b37.tsv', sep='\t')
In [2]: df.shape
Out[2]: (749, 39)
In [3]: for col in 'rcv scv clinical_significance_ordered submitters_ordered'.split():
   ...:     df['len_' + col] = df[col].apply(lambda x: len(x.split(';')))
In [4]: diffs = df[df.len_clinical_significance_ordered != df.len_submitters_ordered]
In [5]: diffs.shape
Out[5]: (120, 43)
Ordered clinical significance doesn't seem to match the RCV or SCV lists either. Is this intended?
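In case it helps with reproducing this, here is a minimal sketch of the same check extended to all four columns pairwise. It works on plain dict rows (for example from csv.DictReader) rather than a DataFrame. The helper names item_count and mismatched_pairs are my own, the sample rows are made up rather than taken from the real file, and treating an empty field as zero items is an assumption on my part.

```python
from itertools import combinations

COLUMNS = ["rcv", "scv", "clinical_significance_ordered", "submitters_ordered"]


def item_count(value):
    """Count semicolon-separated items; an empty or missing field counts as 0 (assumption)."""
    if not value:
        return 0
    return len(value.split(";"))


def mismatched_pairs(row, columns=COLUMNS):
    """Return every pair of columns whose item counts differ in this row."""
    counts = {col: item_count(row.get(col)) for col in columns}
    return [(a, b) for a, b in combinations(columns, 2) if counts[a] != counts[b]]


# Made-up rows for illustration only, not real ClinVar records.
rows = [
    {
        "rcv": "RCV1;RCV2",
        "scv": "SCV1;SCV2",
        "clinical_significance_ordered": "Pathogenic;Benign",
        "submitters_ordered": "LabA;LabB",
    },
    {
        "rcv": "RCV3",
        "scv": "SCV3;SCV4",
        "clinical_significance_ordered": "Benign",
        "submitters_ordered": "LabA;LabB",
    },
]

for i, row in enumerate(rows):
    print(i, mismatched_pairs(row))
```

A row where every column has the same number of items yields an empty list, so anything non-empty points at the specific columns that disagree.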
Thanks