ncborcherding / scRepertoire

A toolkit for single-cell immune profiling
https://www.borch.dev/uploads/screpertoire/
MIT License

Slow running combineBCR() #104

Closed eudoraleer closed 3 years ago

eudoraleer commented 3 years ago

Hi,

Running combineBCR() is very slow for me compared to combineTCR(), which runs much faster with the same number of samples and a similar data size. What could be the reason?

Thank you.

ncborcherding commented 3 years ago

Hey Lu P, this is really due to somatic hypermutation and clonal calling:

Unlike combineTCR(), combineBCR() produces a CTstrict column that is an index of the nucleotide sequence and the corresponding V gene. This index automatically calculates the Levenshtein distance between sequences of the same length and assigns the same ID to sequences with a normalized Levenshtein distance <= 0.15, for sequences with < 15 nucleotides difference in length.

So it takes some additional time to run the edit distances.

Thanks, Nick
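
To illustrate why the edit-distance step costs extra time, here is a minimal Python sketch of the kind of clustering described above. It is not scRepertoire's R implementation. The function names, the toy sequences, the choice to normalize by the longer sequence, and the single-linkage grouping within a V gene are all assumptions made for illustration; only the 0.15 threshold and the 15-nucleotide length cutoff come from the reply above.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance, O(len(a) * len(b)) per pair."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    # Assumption: normalize by the longer sequence; the package may differ.
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def cluster_clones(records, threshold=0.15, max_len_diff=15):
    """records: list of (v_gene, nucleotide_sequence) tuples.

    Returns one cluster label per record. Hypothetical helper, not a
    scRepertoire function.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumption: only sequences sharing a V gene are compared.
    by_gene = defaultdict(list)
    for idx, (v_gene, _) in enumerate(records):
        by_gene[v_gene].append(idx)

    for indices in by_gene.values():
        # Every pair within a V gene is compared: quadratic in group size.
        for i, j in combinations(indices, 2):
            seq_i, seq_j = records[i][1], records[j][1]
            if abs(len(seq_i) - len(seq_j)) >= max_len_diff:
                continue
            if normalized_distance(seq_i, seq_j) <= threshold:
                parent[find(i)] = find(j)

    return [find(i) for i in range(len(records))]


# Toy data, invented for this example.
records = [
    ("IGHV1-2", "TGTGCGAGAGATCTGGGGTACTTTGACTACTGG"),
    ("IGHV1-2", "TGTGCGAGAGATCTGGGCTACTTTGACTACTGG"),   # one substitution
    ("IGHV1-2", "TGTGC" + "A" * 25 + "TGG"),            # deliberately dissimilar
    ("IGHV3-23", "TGTGCGAGAGATCTGGGGTACTTTGACTACTGG"),  # same sequence, other V gene
]
labels = cluster_clones(records)
print(labels)
```

In this sketch the first two records share a label, while the dissimilar sequence and the record with a different V gene each get their own. The cost grows with the number of sequence pairs per V gene, and each pair costs time proportional to the product of the two sequence lengths, which is work combineTCR() does not need to do.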

eudoraleer commented 3 years ago

Dear Nick,

Thank you for your quick reply! Looking forward to faster processing in this function in future versions :) Also, the current version is a great improvement compared to the version from early this year ;)

Best, Lu

ncborcherding commented 3 years ago

Hey Lu,

The newest dev version of scRepertoire, v1.3.4, should have a substantially faster combineBCR() (and combineTCR(filterMulti = TRUE)). Both the filtering and clustering internal functions have been modified for efficiency. If you get a chance, let me know what you think.

Nick

eudoraleer commented 3 years ago

Dear Nick,

Thank you for the fast update! I will test it on my data next week and let you know how it goes :)

Best Regards, Lu