Closed by bwalsh 6 years ago
@bwalsh That is my opinion. Happy to have @ahwagner @malachig and others chime in as well.
@jgoecks
Made the change: brca now has 17,584 items (was 5,715). One note: unreviewed items do not have a phenotype.
+++ b/harvester/brca.py
@@ -22,10 +22,10 @@ def harvest(genes=None):
         else:
             page_num = page_num + 1
         for record in payload['data']:
-            if not record['Pathogenicity_expert'] == 'Not Yet Reviewed':
-                gene = record['Gene_Symbol']
-                gene_data = {'gene': gene, 'brca': record}
-                yield gene_data
+            # if not record['Pathogenicity_expert'] == 'Not Yet Reviewed':
+            gene = record['Gene_Symbol']
+            gene_data = {'gene': gene, 'brca': record}
+            yield gene_data
@bwalsh I get slightly different results when I run this:
source:brca: 17,546
source:brca AND exists:association.phenotype.description: 5,791
g2p-test, by contrast, shows a slightly higher overall count for BRCA and a slightly lower count with a phenotype association. Do you think this is just down to when the harvest was run?
Code looks good, just confused about number variation.
I'll check g2p-test tomorrow (there were snafus uploading to it).
Per the group discussion today, we're reversing course on this and should exclude "Not Yet Reviewed" variants.
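For reference, a minimal sketch of what the restored filter could look like. The helper name `iter_reviewed` and the sample records are hypothetical; only the `Pathogenicity_expert` and `Gene_Symbol` keys and the `{'gene': ..., 'brca': ...}` shape come from the diff above, and the real harvester also handles paging, which is left out here.

```python
NOT_YET_REVIEWED = 'Not Yet Reviewed'


def iter_reviewed(records):
    """Yield gene_data dicts, skipping records that lack an expert review.

    Hypothetical helper that mirrors the loop body in harvester/brca.py.
    """
    for record in records:
        if record['Pathogenicity_expert'] == NOT_YET_REVIEWED:
            continue
        yield {'gene': record['Gene_Symbol'], 'brca': record}


# Made-up sample records, for illustration only.
sample = [
    {'Gene_Symbol': 'BRCA1', 'Pathogenicity_expert': 'Pathogenic'},
    {'Gene_Symbol': 'BRCA2', 'Pathogenicity_expert': 'Not Yet Reviewed'},
    {'Gene_Symbol': 'BRCA2', 'Pathogenicity_expert': 'Likely benign'},
]
kept = list(iter_reviewed(sample))
```

Using `!=`/`==` against a named constant instead of `if not ... ==` keeps the intent readable and makes the filter easy to toggle if the group changes its mind again.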
@ahwagner @mayfielg @jgoecks for your review... addressed and deployed at https://g2p-test.ddns.net
The possible values of Pathogenicity_expert:
"Benign / Little Clinical Significance", "Likely benign", "Not Yet Reviewed", "Pathogenic", "Uncertain"
We currently filter out "Not Yet Reviewed".
We discussed removing that filter. Can you confirm?
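To help weigh that decision, here is a rough sketch for tallying how many records fall under each value, so the effect of the filter is visible before changing it. The function name `tally_pathogenicity` and the sample records are hypothetical; it assumes each record is a dict with a `Pathogenicity_expert` key, as in the harvester.

```python
from collections import Counter


def tally_pathogenicity(records):
    """Count records per Pathogenicity_expert value (hypothetical helper)."""
    return Counter(record['Pathogenicity_expert'] for record in records)


# Made-up sample records, for illustration only.
sample = [
    {'Pathogenicity_expert': 'Pathogenic'},
    {'Pathogenicity_expert': 'Not Yet Reviewed'},
    {'Pathogenicity_expert': 'Not Yet Reviewed'},
    {'Pathogenicity_expert': 'Uncertain'},
]
counts = tally_pathogenicity(sample)

# Records the current filter drops versus records it keeps.
unreviewed = counts['Not Yet Reviewed']
reviewed = sum(counts.values()) - unreviewed
```

Run against the real payload, the same tally would show how many items the "Not Yet Reviewed" filter excludes.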