kmayerb / tcrdist3

flexible CDR based distance metrics
MIT License

Poor efficiency on large dataset #89

Open pdevashishmda opened 1 year ago

pdevashishmda commented 1 year ago

I'm trying to run the sparse implementation as described here: https://tcrdist3.readthedocs.io/en/latest/sparsity.html?highlight=sparse

My dataset is very large (more than 200,000 rows), and whether I run the sparse implementation in a Jupyter Notebook or in Docker, the job does not finish. There are no error messages; the analysis just never reaches completion. Do you have any suggestions for improving the efficiency of TCRdist on large datasets? Or could there be some other problem?
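
For a sense of scale, here is a rough back-of-the-envelope estimate of what an all-against-all distance matrix would need at this size. This is only a sketch: the 2 bytes per stored distance and the chunk of 100 rows are assumptions made for illustration, not values confirmed from tcrdist3 internals, and the helper names `dense_matrix_gib` and `chunk_gib` are hypothetical.

```python
GIB = 2 ** 30  # bytes per gibibyte


def dense_matrix_gib(n_rows: int, bytes_per_entry: int = 2) -> float:
    """Memory for a full n_rows x n_rows matrix, in GiB.

    bytes_per_entry=2 is an assumed storage width (e.g. a 16-bit integer).
    """
    return n_rows * n_rows * bytes_per_entry / GIB


def chunk_gib(chunk_rows: int, n_cols: int, bytes_per_entry: int = 2) -> float:
    """Memory for one dense block of chunk_rows x n_cols, in GiB."""
    return chunk_rows * n_cols * bytes_per_entry / GIB


if __name__ == "__main__":
    n = 200_000
    print(f"full dense matrix: {dense_matrix_gib(n):.1f} GiB")
    print(f"one 100-row chunk: {chunk_gib(100, n):.3f} GiB")
```

Under these assumptions the full dense matrix comes to roughly 75 GiB, while a single 100-row block is well under 0.1 GiB. So memory alone should not stop a chunked run; the total number of pairwise comparisons (about 4 × 10^10) is the same either way, which may be why the run appears to hang rather than fail.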