Closed ValWood closed 7 months ago
What priority is this issue?
Not super high because I think everything was fixed manually. But always good to do the QC items. Probably medium, but slot it in somewhere if it's quick
I've had a look at penetrance values that are currently used so I know what to check for.
We have:
NUMBER
NUMBER-NUMBER
>NUMBER
<NUMBER
~NUMBER
~NUMBER-NUMBER
Where NUMBER is mostly an integer, but there are a few odd ones like 48.98 and <98.2
I think 48.98 and <98.2 are valid, since penetrance is a percentage (although these are slightly weird, and we should probably round them!).
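For reference, the numeric forms listed above could be checked with something like the sketch below. This is not the check that went into the loader, just an illustration: the names `NUMBER`, `PENETRANCE_RE` and `is_valid_penetrance` are made up here, it only covers the numeric forms (not qualitative values such as "high"), and it does not check that the number falls between 0 and 100.

```python
import re

# Hypothetical pattern: NUMBER may be an integer or a decimal such as 48.98.
NUMBER = r"\d+(?:\.\d+)?"

# Accepts: NUMBER, NUMBER-NUMBER, >NUMBER, <NUMBER, ~NUMBER, ~NUMBER-NUMBER
PENETRANCE_RE = re.compile(rf"(?:[<>]{NUMBER}|~?{NUMBER}(?:-{NUMBER})?)")


def is_valid_penetrance(value: str) -> bool:
    """Return True if value matches one of the numeric forms listed above."""
    return PENETRANCE_RE.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for candidate in ["20", "10-20", ">90", "<98.2", "~5", "~5-10", "high,20"]:
        print(candidate, is_valid_penetrance(candidate))
```

Only an ASCII hyphen is accepted as the range separator, so a range typed with a non-ASCII dash would be rejected rather than silently passed through.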
I've implemented that check now. I'll make sure it's OK in the next nightly load.
> although these are slightly weird, we should probably round them
There are only 25 penetrance values with 2 decimal places. If you think they are worth fixing I can sort out PMID:25210736 which is a PHAF file. The other 4 values are from Canto.
pmid | penetrance |
---|---|
PMID:11780129 | 0.61 |
PMID:25210736 | 64.00 |
PMID:25210736 | 28.11 |
PMID:25210736 | 61.19 |
PMID:25210736 | 39.57 |
PMID:25210736 | 48.98 |
PMID:25210736 | 31.30 |
PMID:25210736 | 63.69 |
PMID:25210736 | 36.98 |
PMID:25210736 | 38.94 |
PMID:25210736 | 52.76 |
PMID:25210736 | 48.05 |
PMID:25210736 | 48.03 |
PMID:25210736 | 45.64 |
PMID:25210736 | 44.38 |
PMID:25210736 | 54.82 |
PMID:25210736 | 39.86 |
PMID:25210736 | 21.09 |
PMID:25210736 | 38.76 |
PMID:25210736 | 56.49 |
PMID:25210736 | 43.65 |
PMID:25210736 | 56.29 |
PMID:25993311 | 99.02 |
PMID:35658118 | 9.32 |
PMID:9398669 | 99.56 |
> I can sort out PMID:25210736 which is a PHAF file.
I got that wrong. There is a PHAF file for PMID:25210736 but that's not where those penetrance values are recorded. They are double mutants in this session: https://curation.pombase.org/pombe/curs/08c96f6f44e500f7/ro
I'm happy to fix that session if you like.
I think it makes sense to fix that session.
I've rounded the values in the PMID:25210736 session to 1 decimal place. I was tempted to round them to 0 decimal places since they are quite approximate values. Let me know if I should.
Yes I think so, the decimal places are a bit crazy here.
OK, I've rounded them to 0 decimal places.
Hi @ValWood
Could you have a look at these?
pmid | value | session |
---|---|---|
PMID:11780129 | 0.61 | https://curation.pombase.org/pombe/curs/8ddb4e6192c755eb |
PMID:25993311 | 99.02 | https://curation.pombase.org/pombe/curs/536dc2e074eee139 |
PMID:35658118 | 9.32 | https://curation.pombase.org/pombe/curs/e1f4d0eca71f1467 |
PMID:9398669 | 99.56 | https://curation.pombase.org/pombe/curs/9ef4b740616f8870 |
For this one (has_penetrance 0.61%) I will make it 0.6, but we need the decimal place for some very low-incidence chromosome segregation phenotypes (in WT, errors are even closer to zero).
OK, I rounded to one decimal place. There were more in the sessions. Most I rounded to no decimal places.
I only kept a decimal place for those which were between 0-1 % and 99-100 %
> I only kept a decimal place for those which were between 0-1 % and 99-100 %
Thanks. That makes sense.
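In case it is useful later, the rounding rule described above can be written down as a small sketch. The function name `round_penetrance` is hypothetical, it only handles plain numbers (no `<`, `>`, `~` or ranges), and it assumes the 0-1 % and 99-100 % windows exclude their end points, which the thread does not spell out.

```python
def round_penetrance(value: float) -> str:
    """Round a penetrance percentage following the rule in this thread.

    Assumption: one decimal place is kept only for values strictly between
    0 and 1 or strictly between 99 and 100; everything else is rounded to a
    whole number.
    """
    if 0 < value < 1 or 99 < value < 100:
        return f"{value:.1f}"
    return f"{value:.0f}"


if __name__ == "__main__":
    for v in [0.61, 9.32, 48.98, 99.02, 99.56]:
        print(v, "->", round_penetrance(v))
```

Note that Python's format rounding is round-half-to-even on the stored binary value, so exact .5 cases may not round the way a curator would by hand.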
I think this is done now.
Errors found in a PHAF file from https://github.com/monarch-initiative/monarch-app/issues/647
We should check that there is only a single penetrance value per annotation, and that there are no non-ASCII characters.
high,20 (fixed to 20 (%))
7580 (fixed; I used a non-ASCII dash, which got stripped. We will add a check for that)
medium,high (fixed to high)
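A rough sketch of the two checks proposed above, to make the idea concrete. The function name `penetrance_problems` is made up for illustration, and it assumes that multiple values show up comma-separated in a single field, as in the "high,20" and "medium,high" cases. It would need to run on the raw file contents before any step that strips non-ASCII characters.

```python
def penetrance_problems(value: str) -> list[str]:
    """Return a list of problems found in one raw penetrance field."""
    problems = []
    # A non-ASCII dash (for example an en dash in a range) fails this test.
    if not value.isascii():
        problems.append("non-ASCII character")
    # Assumption: more than one value appears as a comma-separated list.
    if "," in value:
        problems.append("more than one value")
    return problems


if __name__ == "__main__":
    for raw in ["high,20", "medium,high", "75\u201380", "high", "20"]:
        print(repr(raw), penetrance_problems(raw))
```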