nimwegenLab / cellstates

Finding gene expression states in scRNA-seq data
MIT License

Runtime expectations #14

Open nick-res opened 6 months ago

nick-res commented 6 months ago

The README acknowledges that the script can take a long time to run. Does the estimated runtime reported by the script take multithreading into account? For example, I get a conservative estimated runtime of 226 days on a dataframe with 27,000 cells and 60,000 genes when running the script with 46 threads.

2024-03-06 18:17:36,511 - INFO:predicted runtime (conservative estimate): 226 days, 1 hours, 37 minutes

ismara-unibas commented 5 months ago

I checked this with a randomly generated dataset of the same size and got 720 days in single-threaded mode and 30 days with 46 threads. So cellstates does take the number of threads into account. Make sure you use the -t option to specify the number of threads when you run the script.
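
As a rough check on the two estimates quoted above, the implied speedup and parallel efficiency can be worked out directly. This is only back-of-the-envelope arithmetic on the numbers in this comment; the variable names are illustrative and nothing here comes from the cellstates code itself.

```python
# Predicted runtimes quoted above for the randomly generated dataset.
single_thread_days = 720
multi_thread_days = 30
threads = 46

# Speedup: how many times faster the multithreaded estimate is.
speedup = single_thread_days / multi_thread_days

# Parallel efficiency: fraction of the ideal (linear) speedup achieved.
efficiency = speedup / threads

print(f"speedup: {speedup:.1f}x")
print(f"parallel efficiency: {efficiency:.0%}")
```

A 24x speedup on 46 threads is roughly half of ideal linear scaling, so adding threads helps the estimate but with diminishing returns.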

nick-res commented 5 months ago

Thanks for checking. I ended up running a dataset of ~14,000 cells with 46 threads, and the predicted runtime is 51 days with the -t 46 flag applied:

python ./scripts/run_cellstates.py input-data/sample1-qc.csv --save-intermediates --outdir ./sfm-13-1-results -t 46
2024-03-06 22:54:07,771 - INFO:loading input-data/sample1-qc.csv

/home/ec2-user/environment/nimwegen-cell-states/./scripts/run_cellstates.py:73: FutureWarning: The 'delim_whitespace' keyword in pd.read_csv is deprecated and will be removed in a future version. Use ``sep='\s+'`` instead
  df = pd.read_csv(datafile, delim_whitespace=True, header=0, index_col=0)

2024-03-06 23:02:36,046 - INFO:predicted runtime (conservative estimate): 51 days, 16 hours, 12 minutes

Is downsampling recommended to obtain shorter runtimes, or are there other parameters I should look to change?

Biomiha commented 6 hours ago

As an FYI, I've found that above a certain number of threads (much lower than the number of threads available), the predicted runtime starts to increase again. Have you tried fewer threads?
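
One way to compare thread counts is to start the script with different -t values, note the predicted runtime it logs, and stop it. The sketch below parses that log line into a `timedelta` so the estimates can be compared. It assumes the log format matches the lines pasted earlier in this thread ("N days, N hours, N minutes"); the format may differ for shorter runs. The helper names `parse_predicted_runtime` and `best_thread_count` are hypothetical and not part of cellstates.

```python
import re
from datetime import timedelta

# Pattern for the log line format shown in this thread, e.g.
# "... INFO:predicted runtime (conservative estimate): 226 days, 1 hours, 37 minutes"
# Assumption: the estimate is always printed as days, hours, minutes.
RUNTIME_RE = re.compile(
    r"predicted runtime \(conservative estimate\): "
    r"(\d+) days, (\d+) hours, (\d+) minutes"
)


def parse_predicted_runtime(line):
    """Return the predicted runtime as a timedelta, or None if the line has no estimate."""
    match = RUNTIME_RE.search(line)
    if match is None:
        return None
    days, hours, minutes = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)


def best_thread_count(estimates):
    """Given {thread_count: timedelta}, return the thread count with the shortest estimate."""
    return min(estimates, key=estimates.get)


# Log line copied from the run with -t 46 earlier in this thread.
line = (
    "2024-03-06 23:02:36,046 - INFO:predicted runtime "
    "(conservative estimate): 51 days, 16 hours, 12 minutes"
)
print(parse_predicted_runtime(line))
```

The dictionary passed to `best_thread_count` would be filled in from your own runs, one entry per -t value tried.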