veltenlab / CloneTracer

This repository contains scripts to identify healthy and malignant cells from scRNAseq with CloneTracer and process data from Optimized 10x libraries
MIT License

runtime #3

Closed: rtyags closed this issue 1 year ago

rtyags commented 1 year ago

Hi, I am trying to find out how long I should expect CloneTracer to run on my sample. Currently, even a toy sample takes very long to run, and I have no way to estimate how long it is reasonable to wait before I stop it and look for a potential mistake.

sergibeneyto commented 1 year ago

Hi,

Thank you for your comment. The runtime depends mostly on the number of mutations you selected in the input. Unfortunately, CloneTracer does not yet scale well to many mutated sites, because the number of possible clonal hierarchies increases exponentially with the number of mutated sites.
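To give a feeling for how fast the search space grows, here is a rough sketch. It assumes each clonal hierarchy is a rooted tree with one node per mutated site plus a healthy root, so the count follows Cayley's formula. This is only an illustration: `rooted_tree_count` is a hypothetical helper, and the set of hierarchies CloneTracer actually considers may be counted differently.

```python
def rooted_tree_count(n_mutations: int) -> int:
    """Count labeled trees on n_mutations + 1 nodes with a fixed root.

    Assumption (not taken from CloneTracer): one node per mutated site
    plus one healthy root node. Cayley's formula gives n**(n - 2) labeled
    trees on n nodes, and fixing the root does not change that count.
    """
    if n_mutations < 1:
        raise ValueError("need at least one mutation")
    n_nodes = n_mutations + 1
    return n_nodes ** (n_nodes - 2)


if __name__ == "__main__":
    for n in (3, 5, 10, 12):
        print(f"{n:>2} mutations -> {rooted_tree_count(n):,} candidate trees")
```

Under this assumption, going from 5 to 10 mutations raises the number of candidate trees from about a thousand to over two billion, which matches the jump in runtime described in this thread.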

The largest number of mutations I have tried is 10-12, which took approximately 1 day to run. Typically, 3-5 mutations should run within an hour. These runtimes were measured on a GPU. The total time also varies with the uncertainty of the clonal hierarchy inference: samples with well-covered mutations run faster than samples with many poorly covered mutations.

If you have many mutations, you could discard mutated sites that are covered in only a small fraction of cells (for example, fewer than 10-15% of cells), as these sites do not provide much information for the clonal hierarchy inference and add a lot of uncertainty. However, in our experience, poorly covered SNVs were often known driver mutations, so it was necessary to keep them to help interpret the identities of the clones.
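The filtering described above could be sketched roughly as follows. This is not part of CloneTracer: the function names, the `keep_always` parameter, and the input layout (a dict mapping each site to its per-cell read counts) are assumptions made for illustration, and you would need to adapt it to however your coverage data is stored.

```python
def covered_fraction(coverage_by_site):
    """Return, per site, the fraction of cells with at least one read."""
    fractions = {}
    for site, counts in coverage_by_site.items():
        n_cells = len(counts)
        n_covered = sum(1 for reads in counts if reads > 0)
        # A site with no cells at all is treated as uncovered.
        fractions[site] = n_covered / n_cells if n_cells else 0.0
    return fractions


def select_sites(coverage_by_site, min_fraction=0.10, keep_always=()):
    """Keep sites covered in at least min_fraction of cells.

    Sites listed in keep_always (e.g. known driver mutations) are kept
    regardless of their coverage.
    """
    forced = set(keep_always)
    fractions = covered_fraction(coverage_by_site)
    return [
        site
        for site, fraction in fractions.items()
        if fraction >= min_fraction or site in forced
    ]


if __name__ == "__main__":
    # Made-up read counts for 10 cells at three hypothetical sites.
    coverage = {
        "site_A": [4, 2, 0, 1, 3, 5, 0, 2, 1, 6],
        "site_B": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        "driver_C": [0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
    }
    print(select_sites(coverage, min_fraction=0.15, keep_always=["driver_C"]))
```

Here `site_B` would be dropped because it is covered in only 1 of 10 cells, while `driver_C` has the same coverage but is kept because it is listed explicitly.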

If you want to check whether the input file is correct, you could run the initial steps of the model interactively (e.g. in a Jupyter notebook). There are two example notebooks in our repository.

We are currently thinking about ways to make CloneTracer's heuristic tree search more efficient, but we have not implemented them yet.