Closed ireneisdoomed closed 5 months ago
I am not able to reproduce the issue today, so it must have been an error of mine.
The data has been regenerated and the locus for the second association is no longer empty, and the find_overlaps behaviour is correct.
I've looked at the reported pair and the overlapping variant is common in both loci, and statistics are appropriately assigned.
Also checked another pair, 5429145510817404460/5163827416381512276:
- 5429145510817404460 with 36 variants in the locus
- 5163827416381512276 with 34 variants in the locus

The overlaps for these contain 36 variants: 34 of them are common to both loci, so they have statistics on both sides, and 2 of them are present in only one.
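The counts above can be reproduced with a minimal plain-Python sketch. This is an illustration only, not the StudyLocus.find_overlaps implementation: the `overlap_rows` helper, the `Locus` alias and the synthetic variant IDs are all hypothetical. It assumes that a pair is reported only when at least one variant is shared, and that the reported rows are the union of both loci, with a missing statistic on the side where the variant is absent.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: variant ID -> posterior probability.
Locus = Dict[str, float]
Row = Tuple[str, Optional[float], Optional[float]]


def overlap_rows(left: Locus, right: Locus) -> List[Row]:
    """Return one row per variant found in either locus.

    A pair with no shared variant yields no rows at all (assumed behaviour).
    """
    if not left.keys() & right.keys():
        return []
    all_variants = sorted(left.keys() | right.keys())
    # dict.get returns None when the variant is absent from that locus.
    return [(v, left.get(v), right.get(v)) for v in all_variants]


# Synthetic loci mirroring the sizes above: 36 and 34 variants, 34 shared.
left = {f"var_{i}": 1 / 36 for i in range(36)}
right = {f"var_{i}": 1 / 34 for i in range(34)}

rows = overlap_rows(left, right)
both_sides = [r for r in rows if r[1] is not None and r[2] is not None]
one_side = [r for r in rows if (r[1] is None) != (r[2] is None)]

print(len(rows), len(both_sides), len(one_side))  # 36 34 2
```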
I've added a semantic test to StudyLocus.find_overlaps to make sure we identify changes in the logic from now on (https://github.com/opentargets/genetics_etl_python/pull/407).
Describe the bug

There are two significant issues in the dataset related to the handling of overlapping associations and statistical probability calculations. Specifically, the dataset is incorrectly identifying overlapping associations where no common variant exists in the locus, and it is erroneously assigning a right_posteriorProbability value of 1.0 to a variant not present in the locus of rightStudyLocusId.

Observed behaviour

I have looked at the credible set dataset and focused on these 2 associations:
In [35]: subset_cs.df.filter(f.col("studyLocusId") == -3309244191177514893).select("locus").show(truncate=False)
+-----+
|locus|
+-----+
+-----+
In [37]: subset_cs.find_overlaps(studies).df.filter(f.col("tagVariantId") == "1_156645322_A_G").show(truncate=False)
24/01/11 12:00:01 WARN CacheManager: Asked to cache already cached data.
24/01/11 12:00:01 WARN CacheManager: Asked to cache already cached data.
+------------------+--------------------+----------+---------------+----------------------------------------------------------+
|leftStudyLocusId  |rightStudyLocusId   |chromosome|tagVariantId   |statistics                                                |
+------------------+--------------------+----------+---------------+----------------------------------------------------------+
|512473136292631516|-3309244191177514893|1         |1_156645322_A_G|{null, 1.0, null, null, null, null, 1.0, null, null, null}|
+------------------+--------------------+----------+---------------+----------------------------------------------------------+
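
The inconsistency in this row can be expressed as a simple check: a side of an overlap row should carry a posterior probability only if the tag variant is present in that side's locus. The following plain-Python sketch is an assumption-laden illustration, not code from the repository. `OverlapRow`, `inconsistent_rows` and the field names `left_pp` / `right_pp` are hypothetical, the two 1.0 values in the statistics struct are assumed to be the left and right posterior probabilities, and the left locus is assumed to contain the tag variant.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Set


class OverlapRow(NamedTuple):
    """Hypothetical, simplified view of one overlap row."""
    left_id: int
    right_id: int
    tag_variant_id: str
    left_pp: Optional[float]   # assumed: left posterior probability
    right_pp: Optional[float]  # assumed: right posterior probability


def inconsistent_rows(
    rows: Iterable[OverlapRow], loci: Dict[int, Set[str]]
) -> List[OverlapRow]:
    """Return rows that carry a statistic for a variant absent from that locus."""
    bad = []
    for row in rows:
        left_variants = loci.get(row.left_id, set())
        right_variants = loci.get(row.right_id, set())
        left_wrong = row.left_pp is not None and row.tag_variant_id not in left_variants
        right_wrong = row.right_pp is not None and row.tag_variant_id not in right_variants
        if left_wrong or right_wrong:
            bad.append(row)
    return bad


# Loci as reported: the right study locus is empty.
# Assumption: the left locus contains the tag variant.
loci = {
    512473136292631516: {"1_156645322_A_G"},
    -3309244191177514893: set(),
}
reported = [
    OverlapRow(512473136292631516, -3309244191177514893, "1_156645322_A_G", 1.0, 1.0)
]

flagged = inconsistent_rows(reported, loci)
print(len(flagged))  # 1: the right side has a statistic but an empty locus
```

A check of this shape flags the reported row because the right-hand statistic has no matching variant in the right-hand locus; a row with a null right-hand statistic would pass.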