Open javild opened 5 years ago
Download looks okay.
2019-10-08 13:01:21 [main] INFO DownloadCommandExecutor:542 - Downloading gnomAD data...
2019-10-08 13:01:21 [main] DEBUG EtlCommons:98 - Executing command: wget --tries=10 https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_transcript.txt.bgz -O /tmp/homo_sapiens_grch37/gnomad.v2.1.1.lof_metrics.by_transcript.txt.bgz -o /tmp/homo_sapiens_grch37/gnomad.v2.1.1.lof_metrics.by_transcript.txt.bgz.log
2019-10-08 13:01:23 [main] INFO DownloadCommandExecutor:1192 - /tmp/homo_sapiens_grch37/gnomad.v2.1.1.lof_metrics.by_transcript.txt.bgz created OK
Will the build be able to handle the weirdo bgz file extension?
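For reference, a `.bgz` file is block-gzipped (BGZF), which is a series of concatenated gzip members, so a plain gzip reader should cope with it. A minimal sketch, assuming the build reads the file through `java.util.zip.GZIPInputStream` (I have not checked what the build actually uses); `BgzReadSketch` and its methods are hypothetical names, and the two rows in `main` are made-up sample data:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the code base.
class BgzReadSketch {

    // Reads every text line from a gzip or block-gzip stream.
    static List<String> readLines(InputStream compressed) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Compresses a string as one gzip member, imitating a single BGZF block.
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Two members back to back, as in a block-gzipped file (made-up rows).
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        joined.write(gzipMember("gene\ttranscript\toe_lof\n"));
        joined.write(gzipMember("GENE_X\tTX_X\t0.5\n"));
        for (String line : readLines(new ByteArrayInputStream(joined.toByteArray()))) {
            System.out.println(line);
        }
    }
}
```

If that holds, the extension itself should not matter to the reader; only code that picks a decompressor by file suffix (`.gz` vs `.bgz`) would need to know about it.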
https://gnomad.broadinstitute.org/downloads
pLoF Metrics by Transcript TSV
pLoF Metrics by Gene TSV
data looks to be present in gene.json.gz:
**"pvalue":2.4564667E-7},{"geneName":"ENSG00000168032","experimentalFactor":"organism_part","factorValue":"trachea","experimentId":"E-MTAB-25","technologyPlatform":"A-AFFY-33","expression":"UP","pvalue":0.036223516}],"constraints":
[{"source":"gnomAD","method":"pLoF","name":"oe_mis","value":0.98866},
{"source":"gnomAD","method":"pLoF","name":"oe_syn","value":0.88526},
{"source":"gnomAD","method":"pLoF","name":"oe_lof","value":1.1728}]}}**
from mongo:
for the non-canonical transcript
"annotation" : {
"constraints" : [
{
"source" : "gnomAD",
"method" : "pLoF",
"name" : "oe_mis",
"value" : 1.0187
},
{
"source" : "gnomAD",
"method" : "pLoF",
"name" : "oe_syn",
"value" : 1.0252
},
{
"source" : "gnomAD",
"method" : "pLoF",
"name" : "oe_lof",
"value" : 0.73739
}
]
for the gene (and canonical transcript):
"constraints" : [
{
"source" : "gnomAD",
"method" : "pLoF",
"name" : "oe_mis",
"value" : 1.0141
},
{
"source" : "gnomAD",
"method" : "pLoF",
"name" : "oe_syn",
"value" : 1.0299
},
{
"source" : "gnomAD",
"method" : "pLoF",
"name" : "oe_lof",
"value" : 0.78457
}
Which match the numbers in the text file.
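A rough sketch of how the same check could be scripted instead of done by eye: pick the three columns out of a tab-separated row by header name. It assumes the text file has a header line whose column names are `oe_mis`, `oe_syn` and `oe_lof`, and that missing values are written as `NA`; both are assumptions, not verified here. `ConstraintRowSketch` is a hypothetical name, and `GENE_X` / `TX_X` are placeholders (the values are the gene-level ones shown above):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the code base.
class ConstraintRowSketch {

    // Columns of interest; assumed to match the header of the text file.
    static final List<String> WANTED = Arrays.asList("oe_mis", "oe_syn", "oe_lof");

    // Returns column name -> value for the wanted columns, in file column order.
    static Map<String, Double> extract(String headerLine, String dataLine) {
        String[] header = headerLine.split("\t", -1);
        String[] fields = dataLine.split("\t", -1);
        Map<String, Double> values = new LinkedHashMap<>();
        for (int i = 0; i < header.length && i < fields.length; i++) {
            boolean missing = fields[i].isEmpty() || "NA".equals(fields[i]);
            if (WANTED.contains(header[i]) && !missing) {
                values.put(header[i], Double.parseDouble(fields[i]));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String header = "gene\ttranscript\toe_mis\toe_syn\toe_lof";
        String row = "GENE_X\tTX_X\t1.0141\t1.0299\t0.78457";
        System.out.println(extract(header, row));
    }
}
```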
@imedina for the ExAC scores this is what's available:
exac_pLI | exac_obs_lof | exac_exp_lof | exac_oe_lof
I will load exac_oe_lof; do you want exac_pLI too?
And do we want them in the same Constraints array?
I added a unit test that compares the list of Constraints created by gnomAD using JUnit's assertEquals. This works fine if I add Constraint.equals() to biodata. I noticed that we overrode toString but not equals.
Would it be okay to add equals?
Otherwise, I can add a comparator in the unit test that iterates through the lists and compares the fields in each object.
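To make the two options concrete, here is a minimal sketch. `ConstraintSketch` is a hypothetical stand-in for the biodata class; its four fields are inferred from the JSON above and the real class may differ. JUnit's assertEquals on two lists ends up calling `List.equals`, which compares elements with their own `equals`, so without an override two distinct objects holding the same values compare as different. The `sameFields` helper shows the test-only alternative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the biodata Constraint class.
final class ConstraintSketch {
    private final String source;
    private final String method;
    private final String name;
    private final double value;

    ConstraintSketch(String source, String method, String name, double value) {
        this.source = source;
        this.method = method;
        this.name = name;
        this.value = value;
    }

    String getSource() { return source; }
    String getMethod() { return method; }
    String getName() { return name; }
    double getValue() { return value; }

    // Option 1: value-based equality, so List.equals (and assertEquals) works.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ConstraintSketch)) {
            return false;
        }
        ConstraintSketch other = (ConstraintSketch) o;
        return Double.compare(value, other.value) == 0
                && Objects.equals(source, other.source)
                && Objects.equals(method, other.method)
                && Objects.equals(name, other.name);
    }

    // equals and hashCode must be overridden together.
    @Override
    public int hashCode() {
        return Objects.hash(source, method, name, value);
    }

    // Option 2: field-by-field list comparison that does not rely on equals().
    static boolean sameFields(List<ConstraintSketch> a, List<ConstraintSketch> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            ConstraintSketch x = a.get(i);
            ConstraintSketch y = b.get(i);
            if (!Objects.equals(x.getSource(), y.getSource())
                    || !Objects.equals(x.getMethod(), y.getMethod())
                    || !Objects.equals(x.getName(), y.getName())
                    || Double.compare(x.getValue(), y.getValue()) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<ConstraintSketch> expected = Arrays.asList(
                new ConstraintSketch("gnomAD", "pLoF", "oe_lof", 0.78457));
        List<ConstraintSketch> actual = Arrays.asList(
                new ConstraintSketch("gnomAD", "pLoF", "oe_lof", 0.78457));
        System.out.println(expected.equals(actual));
        System.out.println(sameFields(expected, actual));
    }
}
```

Option 1 keeps the test to a single assertEquals but changes the behaviour of the class everywhere it is used in sets or as map keys; option 2 leaves biodata untouched at the cost of extra test code.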
Download no longer works:
2019-10-28 14:32:03 [main] WARN DownloadCommandExecutor:1013 - https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint//tmp/homo_sapiens_grch37/gene/gnomad.v2.1.1.lof_metrics.by_transcript.txt.bgz cannot be downloaded
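Judging only from the URL in that warning, it looks like the full local output path (`/tmp/homo_sapiens_grch37/gene/...`) was appended to the remote directory instead of just the file name. I have not checked the code at DownloadCommandExecutor:1013, so this is a guess. A sketch of the intended composition, with `DownloadUrlSketch` and `remoteUrl` as hypothetical names:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the code base.
class DownloadUrlSketch {

    // Builds the remote URL from the remote directory and only the file name
    // of the local target, never its full local path.
    static String remoteUrl(String baseUrl, Path localFile) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + localFile.getFileName().toString();
    }

    public static void main(String[] args) {
        Path local = Paths.get(
                "/tmp/homo_sapiens_grch37/gene/gnomad.v2.1.1.lof_metrics.by_transcript.txt.bgz");
        System.out.println(remoteUrl(
                "https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/", local));
    }
}
```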
ExAC pLI scores have been replaced by pLoF (oe) scores in gnomAD. You can find more info at https://macarthurlab.org/2018/10/17/gnomad-v2-1/
> With gnomAD, we have shifted from using the probability of being loss-of-function intolerant (pLI) score developed with ExAC and now recommend using the observed / expected (oe) score. ... The change from pLI to oe was motivated mainly by its easier interpretation and its continuity across the spectrum of selection.
Tasks: