I've noticed TCRs with empty chain values slipping through when running the benchmark. For example, when I debug
python run.py -m hamming -s True -p PAIRED
I get multiple entries with a missing j.alpha value:
(Pdb) sum(pd.isna(data2['j.alpha']))
35
These entries could be removed in preprocess(). This might not be a problem for the methods currently included, but it will become one as soon as people plug in methods that do use gene usage.
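A minimal sketch of what such a filter could look like, assuming the data is a pandas DataFrame as in the Pdb session above. The function name drop_incomplete_tcrs and every column name other than j.alpha are hypothetical; I have not checked how preprocess() is structured, so this only illustrates the idea.

```python
import pandas as pd

# Hypothetical column list: only 'j.alpha' is confirmed by the report above,
# the other names are assumptions about the benchmark's schema.
REQUIRED_COLUMNS = ["v.alpha", "j.alpha", "v.beta", "j.beta"]


def drop_incomplete_tcrs(data, required=REQUIRED_COLUMNS):
    """Return a copy of `data` without rows that lack any required chain value."""
    # Only check columns that actually exist, so the sketch does not fail
    # if the assumed names differ from the real ones.
    present = [col for col in required if col in data.columns]
    if not present:
        return data.copy()
    subset = data[present]
    # Treat both NaN/None and empty strings as missing.
    missing = subset.isna() | subset.eq("")
    return data.loc[~missing.any(axis=1)].reset_index(drop=True)
```

Calling something like this at the end of preprocess() would make the check sum(pd.isna(data2['j.alpha'])) return 0, at the cost of dropping the affected entries from the benchmark.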
Thanks for publishing this much needed benchmark!