pirl-unc / mhcgnomes

Parsing MHC nomenclature in the wild
Apache License 2.0

Caching bug #18

Open ghost opened 1 year ago

ghost commented 1 year ago

Hi, thanks for sharing this repo.

I have encountered a bug where I get different results depending on the order in which I parse two names. Specifically, I would like to parse:

DPB10401 -> HLA-DPB1*04:01
DPB110401 -> HLA-DPB1*104:01

but whichever name I parse second is always parsed incorrectly.

Here is the first ordering:

>>> import mhcgnomes
>>> mhcgnomes.parse("DPB10401").to_string()  # Correct
'HLA-DPB1*04:01'

>>> mhcgnomes.parse("DPB110401").to_string()  # Incorrect
'HLA-DPB1*11:04:01'

Now I restart the Python interpreter (this is important, since it clears the cache) and run the second ordering:

>>> import mhcgnomes
>>> mhcgnomes.parse("DPB110401").to_string()  # Correct
'HLA-DPB1*104:01'

>>> mhcgnomes.parse("DPB10401").to_string()  # Incorrect
'HLA-DPB1*104:01'

I am using mhcgnomes 1.8.4 and Python 3.9.16.

Any help would be appreciated!

fkgruber commented 6 months ago

This is still happening in 1.8.6 with Python 3.10.14.

fkgruber commented 6 months ago

Also in 1.9.

theomeb commented 1 month ago

It can be avoided by using the uncached Parser, e.g. mhcgnomes.Parser().parse("DPB10401").to_string(), but it is indeed quite a nasty bug 😅
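The reproduction in the report depends on restarting the interpreter between orderings. Below is a minimal sketch that automates that step by running each ordering in a fresh Python process via the standard library's subprocess module. The helpers results_by_order and order_dependent, and the SCRIPT template, are hypothetical and not part of mhcgnomes. The default setup and expr assume mhcgnomes is installed in the same interpreter and use mhcgnomes.parse(...).to_string() exactly as in the report. Passing a different expr, for example one built on the uncached mhcgnomes.Parser() from the workaround, lets you compare the two code paths.

```python
import json
import subprocess
import sys
from itertools import permutations

# Script run in each fresh interpreter: optional setup, then parse the names
# in the given order and print the results as a JSON list.
SCRIPT = """
import json, sys
{setup}
names = json.loads(sys.argv[1])
print(json.dumps([{expr} for name in names]))
"""


def results_by_order(names, setup="import mhcgnomes",
                     expr="mhcgnomes.parse(name).to_string()"):
    """Parse `names` in every possible order, starting a new Python process
    for each order so no in-memory cache survives between orders (the same
    effect as restarting the interpreter by hand).

    Returns {name: set of results seen across all orders}.
    """
    seen = {name: set() for name in names}
    code = SCRIPT.format(setup=setup, expr=expr)
    for order in permutations(names):
        # sys.argv[1] in the child process is the JSON-encoded order.
        proc = subprocess.run(
            [sys.executable, "-c", code, json.dumps(list(order))],
            capture_output=True, text=True, check=True,
        )
        for name, result in zip(order, json.loads(proc.stdout)):
            seen[name].add(result)
    return seen


def order_dependent(names, **kwargs):
    """Return only the names whose result changed with the parsing order."""
    return {name: results
            for name, results in results_by_order(names, **kwargs).items()
            if len(results) > 1}
```

Going by the outputs in the report, order_dependent(["DPB10401", "DPB110401"]) would be expected to flag both names with the default (cached) mhcgnomes.parse. Calling it again with expr="mhcgnomes.Parser().parse(name).to_string()" checks whether the uncached Parser gives order-independent results.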