pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Run Time #206

Closed BenjaminDEMAILLE closed 1 year ago

BenjaminDEMAILLE commented 1 year ago

Hi! How long should I expect one sample to take? I'm running on a MacBook Pro M1 Max, with 20k cells, using the lamanno workflow.

Last login: Wed Jul  5 18:56:19 on ttys000
benjamin@macbook-pro-de-benjamin ~ % python3 /Users/benjamin/Dropbox\ \(U932\)/Salmon\ team/NGS/Lung/KPEpAndStr/scripts/Velo_KPEAndStr.py
/Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/fastqs/KP_ES_W2_S2/
[2023-07-05 20:36:36,154]   DEBUG [main] Printing verbose output
[2023-07-05 20:36:38,380]   DEBUG [main] kallisto binary located at /opt/homebrew/lib/python3.11/site-packages/kb_python/bins/darwin/kallisto/kallisto
[2023-07-05 20:36:38,381]   DEBUG [main] bustools binary located at /opt/homebrew/lib/python3.11/site-packages/kb_python/bins/darwin/bustools/bustools
[2023-07-05 20:36:38,382]   DEBUG [main] Creating `/Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/Velocity_data/KP_ES_W2_S2/tmp` directory
[2023-07-05 20:36:38,383]   DEBUG [main] Namespace(list=False, command='count', tmp=None, keep_tmp=False, verbose=True, i='/Users/benjamin/index/transcriptome.idx', g='/Users/benjamin/index/t2g.txt', x='10xv3', o='/Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/Velocity_data/KP_ES_W2_S2', w=None, t=10, m='32G', strand=None, workflow='lamanno', em=False, umi_gene=False, mm=False, tcc=False, filter='bustools', filter_threshold=None, c1='/Users/benjamin/index/spliced_t2c.txt', c2='/Users/benjamin/index/unspliced_t2c.txt', overwrite=True, dry_run=False, loom=False, h5ad=False, cellranger=False, gene_names=False, report=False, no_inspect=False, kallisto='/opt/homebrew/lib/python3.11/site-packages/kb_python/bins/darwin/kallisto/kallisto', bustools='/opt/homebrew/lib/python3.11/site-packages/kb_python/bins/darwin/bustools/bustools', no_validate=False, parity=None, fragment_l=None, fragment_s=None, fastqs=['/Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/fastqs/KP_ES_W2_S2/KP_ES_W2_S2_S2_L001_R1_001.fastq.gz', '/Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/fastqs/KP_ES_W2_S2/KP_ES_W2_S2_S2_L001_R2_001.fastq.gz'])
[2023-07-05 20:36:40,918]    INFO [count_lamanno] Using index /Users/benjamin/index/transcriptome.idx to generate BUS file to /Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/Velocity_data/KP_ES_W2_S2 from
[2023-07-05 20:36:40,918]    INFO [count_lamanno]         /Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/fastqs/KP_ES_W2_S2/KP_ES_W2_S2_S2_L001_R1_001.fastq.gz
[2023-07-05 20:36:40,918]    INFO [count_lamanno]         /Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/fastqs/KP_ES_W2_S2/KP_ES_W2_S2_S2_L001_R2_001.fastq.gz
[2023-07-05 20:36:40,918]   DEBUG [count_lamanno] kallisto bus -i /Users/benjamin/index/transcriptome.idx -o /Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/Velocity_data/KP_ES_W2_S2 -x 10xv3 -t 10 /Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/fastqs/KP_ES_W2_S2/KP_ES_W2_S2_S2_L001_R1_001.fastq.gz /Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/fastqs/KP_ES_W2_S2/KP_ES_W2_S2_S2_L001_R2_001.fastq.gz
[2023-07-05 20:36:41,029]   DEBUG [count_lamanno] 
[2023-07-05 20:36:41,029]   DEBUG [count_lamanno] [bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[2023-07-05 20:36:41,029]   DEBUG [count_lamanno] [index] k-mer length: 31
[2023-07-05 20:36:41,029]   DEBUG [count_lamanno] [index] number of targets: 790,418
[2023-07-05 20:36:41,029]   DEBUG [count_lamanno] [index] number of k-mers: 1,112,133,117
[2023-07-05 20:38:46,305]   DEBUG [count_lamanno] [index] number of equivalence classes: 5,486,678
[2023-07-05 20:40:17,929]   DEBUG [count_lamanno] [quant] will process sample 1: /Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/fastqs/KP_ES_W2_S2/KP_ES_W2_S2_S2_L001_R1_001.fastq.gz
[2023-07-05 20:40:17,929]   DEBUG [count_lamanno] /Users/benjamin/Dropbox (U932)/Salmon team/Data_storage/Seq_data/scRNASeq/KPEpAndStr/fastqs/KP_ES_W2_S2/KP_ES_W2_S2_S2_L001_R2_001.fastq.gz

That seems quite long, doesn't it?
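
To put numbers on the log above, the gaps between consecutive timestamps can be computed directly. The following is a minimal sketch, assuming the `YYYY-MM-DD HH:MM:SS,mmm` timestamp format seen in the log; the helper names `parse_timestamp` and `stage_gaps` are hypothetical, and the log messages are shortened to their prefixes.

```python
from datetime import datetime

# Three lines copied from the log above (messages shortened to their prefixes).
LOG_LINES = [
    "[2023-07-05 20:36:41,029]   DEBUG [count_lamanno] [index] number of k-mers",
    "[2023-07-05 20:38:46,305]   DEBUG [count_lamanno] [index] number of equivalence classes",
    "[2023-07-05 20:40:17,929]   DEBUG [count_lamanno] [quant] will process sample 1",
]


def parse_timestamp(line: str) -> datetime:
    """Extract the bracketed timestamp at the start of a log line."""
    stamp = line[1:line.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def stage_gaps(lines):
    """Return the seconds elapsed between each pair of consecutive lines."""
    times = [parse_timestamp(line) for line in lines]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


gaps = stage_gaps(LOG_LINES)
for line, gap in zip(LOG_LINES[1:], gaps):
    message = line.split("]", 1)[1].strip()
    print(f"{gap:8.1f} s before: {message}")
```

On these three lines the gaps come to roughly 125 s and 92 s, so about three and a half minutes pass on the index before the first reads are processed.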

Yenaled commented 1 year ago

It will take much less time with the newest version of the kallisto binary (v0.50.0).
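
One way to check which kallisto binary is in use is to ask it for its version and compare against 0.50.0. This is a rough sketch, not part of kb_python: the function names are hypothetical, and it assumes that `kallisto version` prints a line containing a dotted version number, which the regular expression picks out without relying on the exact wording.

```python
import re
import subprocess


def parse_version(text: str) -> tuple:
    """Pull the first dotted version number (e.g. 0.50.0) out of a string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def kallisto_version(binary: str = "kallisto") -> tuple:
    """Run `<binary> version` and parse whatever it prints."""
    result = subprocess.run(
        [binary, "version"], capture_output=True, text=True, check=True
    )
    # Search both streams, since the output stream is an assumption here.
    return parse_version(result.stdout + result.stderr)


def is_rewritten_engine(version: tuple) -> bool:
    """True for 0.50.0 or later, the release named in this thread."""
    return version >= (0, 50, 0)


# Example (needs a kallisto binary; the path comes from your own debug log):
#   print(is_rewritten_engine(kallisto_version("/path/to/kallisto")))
```

Passing the binary path that kb_python reports in its debug output ("kallisto binary located at ...") checks the copy that kb actually runs, rather than whichever `kallisto` happens to be on the `PATH`.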

BenjaminDEMAILLE commented 1 year ago

I ran 'kb compile all' and it worked!

Just out of curiosity: what changed between 0.48.0 and 0.50.0? My index went from 30 GB to 6 GB with the same files and command.

Yenaled commented 12 months ago

The entire engine was rewritten in 0.50.0, with a large number of changes under the hood, so you'll likely see major differences in performance with identical (or near-identical) output.