phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

Update fixup_gene_name() to select the lowest expected allele, rather than just *01, when no allele is provided. #65

Closed (bbimber closed 6 months ago)

bbimber commented 6 months ago

@sschattgen: this is a suggestion related to our email in December. It would probably help with Rhesus macaques, where some segments have no 01 allele but do have an 02 allele.
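A minimal sketch of the idea, not the actual conga code: `pick_lowest_allele` and the `known_alleles` list are hypothetical names used only for illustration, and the sketch assumes allele numbers are zero-padded to two digits (as in `*01`, `*02`, `*10`).

```python
def pick_lowest_allele(gene, known_alleles):
    """Return the lowest-numbered allele of `gene` found in `known_alleles`.

    Hypothetical helper for illustration only; it is not conga's
    fixup_gene_name(). Assumes zero-padded two-digit allele numbers.
    """
    # If the caller already supplied an allele, leave the name untouched.
    if '*' in gene:
        return gene
    # Collect every known allele of this gene and take the first in sorted order.
    candidates = sorted(a for a in known_alleles if a.startswith(gene + '*'))
    if not candidates:
        return None  # nothing known for this gene
    return candidates[0]


# Made-up allele list in which TRBV19 has no *01 allele.
known = ['TRBV19*03', 'TRBV19*02', 'TRBV20*01']
print(pick_lowest_allele('TRBV19', known))  # TRBV19*02
```

With a hard-coded `*01` fallback, the `TRBV19` lookup above would produce a name that is not in the list; choosing the lowest available allele avoids that.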

phbradley commented 6 months ago

Hi there-- thanks for this new code! Quick questions

bbimber commented 6 months ago

@phbradley: thanks for the reply. To those questions:

1) You probably have a point on natsort. Python isn't a language I know that well, and I mistakenly thought Python could detect imports. I added natsort to setup.py.

2) Yes, I bet in virtually all cases sorted() would work instead of natsorted(). I chose natsorted() because I thought that with sorted(), if a given gene had a 10 allele, it would be chosen in preference to an 02 allele. Granted, this is unlikely, but it seemed better than putting a known issue into the code.

phbradley commented 6 months ago

OK, thanks for the reply. My preference would be to use the built-in sorted() function, to reduce dependencies and prevent breakage with existing Python environments. It seems to give a reasonable result in this case:

In [9]: sorted(['TRBV19*01','TRBV19*02','TRBV19*10'])
Out[9]: ['TRBV19*01', 'TRBV19*02', 'TRBV19*10']
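To illustrate why the plain string sort works here, a small sketch: the names are zero-padded, so lexical order and numeric order agree. The unpadded names and the `allele_number` helper below are made up for contrast and are not part of conga.

```python
# Zero-padded allele numbers: lexical order matches numeric order.
padded = ['TRBV19*10', 'TRBV19*02', 'TRBV19*01']
sorted_padded = sorted(padded)
print(sorted_padded)  # ['TRBV19*01', 'TRBV19*02', 'TRBV19*10']

# Hypothetical unpadded names: plain string sorting puts *10 before *2.
unpadded = ['TRBV19*10', 'TRBV19*2', 'TRBV19*1']
sorted_unpadded = sorted(unpadded)
print(sorted_unpadded)  # ['TRBV19*1', 'TRBV19*10', 'TRBV19*2']


def allele_number(name):
    """Return the integer after the '*' (hypothetical helper)."""
    return int(name.split('*')[1])


# A key function restores numeric order without any extra dependency.
sorted_numeric = sorted(unpadded, key=allele_number)
print(sorted_numeric)  # ['TRBV19*1', 'TRBV19*2', 'TRBV19*10']
```

So sorted() is only a problem if allele numbers are not padded to the same width, and even then a key function would cover it.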

Does that sound alright?

bbimber commented 6 months ago

Interesting, I did not expect Python to behave that way. I agree sorted() makes sense over natsorted(). I just made those changes.

phbradley commented 6 months ago

Great, thank you!