hudsondan / tcr-scapes

MIT License

Empty genes #10

Closed andreas-wilm closed 1 year ago

andreas-wilm commented 1 year ago

Thanks for publishing this much needed benchmark!

I've noticed TCRs with empty chain values slipping through when running the benchmark. For example, when I debug python run.py -m hamming -s True -p PAIRED, I get multiple entries with missing j.alpha values:

(Pdb) sum(pd.isna(data2['j.alpha']))
35

These entries could be removed in preprocess(). This might not be a problem for the programs currently in the benchmark, but it will be as soon as people plug in methods that rely on gene usage.
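
For illustration, here is a minimal sketch of the kind of filter I have in mind. I have not checked how preprocess() is actually structured, so drop_missing_genes is a hypothetical helper, and every column name other than j.alpha is an assumption about the data layout:

```python
import pandas as pd

# Assumed gene columns; only 'j.alpha' is confirmed by the pdb output above.
GENE_COLUMNS = ["v.alpha", "j.alpha", "v.beta", "j.beta"]


def drop_missing_genes(data, gene_columns=GENE_COLUMNS):
    """Return a copy of `data` without rows that have NaN in any gene column.

    Hypothetical helper, not part of the repository. Columns that are not
    present in `data` are ignored.
    """
    present = [col for col in gene_columns if col in data.columns]
    if not present:
        # Nothing to check against, so return the data unchanged.
        return data.copy()
    return data.dropna(subset=present).reset_index(drop=True)


# Toy data: the second row has no J gene for the alpha chain.
example = pd.DataFrame(
    {
        "cdr3.alpha": ["CAVSD", "CAASG", "CALTG"],
        "j.alpha": ["TRAJ12", None, "TRAJ33"],
    }
)
filtered = drop_missing_genes(example)
```

After filtering, sum(pd.isna(filtered['j.alpha'])) is 0 for the toy data. Skipping columns that are absent keeps the helper usable for single-chain runs, although whether that is the right behaviour for the benchmark is a separate question.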

andreas-wilm commented 1 year ago

My bad. I realized right after posting that I need not only -p paired but also --cs both. With both set, the problem disappears.