pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Aligning Chromium GEM-X Single Cell 5' v3 Gene Expression with Feature Barcoding #274

Closed franziskadenk closed 2 weeks ago

franziskadenk commented 2 weeks ago

Dear kb_python team,

As always, I'm loving your aligner! However, I'm trying to run something slightly more unusual this time and am having trouble getting the parameters right. I'm hoping you can help.

I'm trying to align GEM-X Single Cell 5' v3 Gene Expression data (which are also hashed, but that's for another day...). When I used just the usual 10xv3 flag (-x 10xv3), I got very low alignment rates. I then read on your forum that this may be due to reads being lost in the reverse read (https://github.com/pachterlab/kallisto/issues/287), so I tried the custom -x string in the script below.

With that I get much better alignment rates (35% vs 19%, which is now comparable to Cell Ranger at 28%). Just to note: it's a human sample of not the best quality, so I don't expect the world.

The issue is that the run gets stuck at the count matrix (bustools count) stage, and I get kicked out of my instance after 24 h. What am I doing wrong?


import os
import subprocess

# Define the paths and parameters
index_path = '/scratch/prj/wcard_advseqanlys/index.idx'
t2g_path = '/scratch/prj/wcard_advseqanlys/t2g.txt'
output_dir = '/scratch/prj/wcard_advseqanlys/aligned_CTL/'
fastq_path_r1 = '/scratch/prj/wcard_advseqanlys/X204SC24096176-Z01-F001/01.RawData/CTL_GEX/CTL_GEX-SCI7T048-SCI5T048_22HGM7LT4_S2_L004_R1_001.fastq.gz'
fastq_path_r2 = '/scratch/prj/wcard_advseqanlys/X204SC24096176-Z01-F001/01.RawData/CTL_GEX/CTL_GEX-SCI7T048-SCI5T048_22HGM7LT4_S2_L004_R2_001.fastq.gz'

# Use the SLURM allocation if set, otherwise 16 threads (kept as a string for subprocess)
cpus_per_task = os.getenv("SLURM_CPUS_PER_TASK", "16")

command = [
    'kb', 'count', '--verbose', '--h5ad',
    '-i', index_path,
    '-g', t2g_path,
    '-x', '0,0,16:0,16,26:0,26,0,1,0,0',
    '-o', output_dir,
    '-t', cpus_per_task,
    '--filter', 'bustools',
    fastq_path_r1,
    fastq_path_r2
]

try:
    subprocess.run(command, check=True)
    print("Command executed successfully.")
except subprocess.CalledProcessError as e:
    print(f"An error occurred: {e}")
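
For readers unfamiliar with the custom -x string in the script above: as far as I understand kallisto's format, it has three colon-separated sections (barcode, UMI, cDNA), each made of file,start,stop triplets, where file is the 0-based position of the FASTQ on the command line and a stop of 0 means "to the end of the read". The sketch below only spells that layout out; parse_technology and Segment are hypothetical helper names, not part of kb-python or kallisto.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    file_index: int  # 0 = first FASTQ on the command line, 1 = second
    start: int       # 0-based start position within the read
    stop: int        # end position; 0 is read as "to the end of the read"


def parse_section(section):
    """Split one section (e.g. '0,26,0,1,0,0') into file,start,stop triplets."""
    values = [int(value) for value in section.split(",")]
    if len(values) % 3 != 0:
        raise ValueError(f"expected groups of three numbers, got {section!r}")
    return [Segment(*values[i:i + 3]) for i in range(0, len(values), 3)]


def parse_technology(spec):
    """Split a custom technology string into barcode, UMI and cDNA segments."""
    sections = spec.split(":")
    if len(sections) != 3:
        raise ValueError("expected three colon-separated sections")
    barcode, umi, cdna = (parse_section(section) for section in sections)
    return {"barcode": barcode, "umi": umi, "cdna": cdna}


layout = parse_technology("0,0,16:0,16,26:0,26,0,1,0,0")
for name, segments in layout.items():
    for seg in segments:
        end = "end of read" if seg.stop == 0 else seg.stop
        print(f"{name}: file {seg.file_index}, bases {seg.start} to {end}")
```

Read this way, the string takes the barcode from bases 0-16 and the UMI from bases 16-26 of the first file, and treats both the rest of the first file and all of the second file as cDNA.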

Yenaled commented 2 weeks ago

Hi! Thanks for your question!

It’s very hard to diagnose unless you give us the verbose output. Also, what version of kb-python are you using? Finally, did you generate the index and t2g properly?

franziskadenk commented 2 weeks ago

Sorry - my bad! The index and t2g files are fine, I believe, since reads did align with the usual 10xv3 flag, and in any case I used your pre-made ones, as in: kb ref -d human -i index.idx -g t2g.txt -f1 transcriptome.fasta

This is the output when I run it with the modified -x flag. It is somewhat unhelpful, since the job is simply cancelled when it hits the 24h time limit on our instance:

[2024-10-27 14:16:48,191] DEBUG [main] Printing verbose output
[2024-10-27 14:16:50,420] DEBUG [main] kallisto binary located at /users/k0927902/.local/lib/python3.10/site-packages/kb_python/bins/linux/kallisto/kallisto
[2024-10-27 14:16:50,420] DEBUG [main] bustools binary located at /users/k0927902/.local/lib/python3.10/site-packages/kb_python/bins/linux/bustools/bustools
[2024-10-27 14:16:50,420] DEBUG [main] Creating /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp directory
[2024-10-27 14:16:50,423] DEBUG [main] Namespace(list=False, command='count', tmp=None, keep_tmp=False, verbose=True, i='/scratch/prj/wcard_advseqanlys/FDenk/PaulC/index.idx', g='/scratch/prj/wcard_advseqanlys/FDenk/PaulC/t2g.txt', x='0,0,16:0,16,26:0,26,0,1,0,0', o='/scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/', num=False, w=None, r=None, t=16, m='2G', strand=None, inleaved=False, genomebam=False, aa=False, gtf=None, chromosomes=None, workflow='standard', em=False, mm=False, tcc=False, filter='bustools', filter_threshold=None, c1=None, c2=None, overwrite=False, dry_run=False, batch_barcodes=False, loom=False, h5ad=True, loom_names='barcode,target_name', sum='none', cellranger=False, gene_names=False, N=None, report=False, no_inspect=False, kallisto='/users/k0927902/.local/lib/python3.10/site-packages/kb_python/bins/linux/kallisto/kallisto', bustools='/users/k0927902/.local/lib/python3.10/site-packages/kb_python/bins/linux/bustools/bustools', no_validate=False, no_fragment=False, parity=None, fragment_l=None, fragment_s=None, bootstraps=None, matrix_to_files=False, matrix_to_directories=False, fastqs=['/scratch/prj/wcard_advseqanlys/FDenk/PaulC/X204SC24096176-Z01-F001/01.RawData/CTL_GEX/CTL_GEX-SCI7T048-SCI5T048_22HGM7LT4_S2_L004_R1_001.fastq.gz', '/scratch/prj/wcard_advseqanlys/FDenk/PaulC/X204SC24096176-Z01-F001/01.RawData/CTL_GEX/CTL_GEX-SCI7T048-SCI5T048_22HGM7LT4_S2_L004_R2_001.fastq.gz'])
[2024-10-27 14:16:53,790] INFO [count] Using index /scratch/prj/wcard_advseqanlys/FDenk/PaulC/index.idx to generate BUS file to /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/ from
[2024-10-27 14:16:53,790] INFO [count] /scratch/prj/wcard_advseqanlys/FDenk/PaulC/X204SC24096176-Z01-F001/01.RawData/CTL_GEX/CTL_GEX-SCI7T048-SCI5T048_22HGM7LT4_S2_L004_R1_001.fastq.gz
[2024-10-27 14:16:53,791] INFO [count] /scratch/prj/wcard_advseqanlys/FDenk/PaulC/X204SC24096176-Z01-F001/01.RawData/CTL_GEX/CTL_GEX-SCI7T048-SCI5T048_22HGM7LT4_S2_L004_R2_001.fastq.gz
[2024-10-27 14:16:53,791] DEBUG [count] kallisto bus -i /scratch/prj/wcard_advseqanlys/FDenk/PaulC/index.idx -o /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/ -x 0,0,16:0,16,26:0,26,0,1,0,0 -t 16 /scratch/prj/wcard_advseqanlys/FDenk/PaulC/X204SC24096176-Z01-F001/01.RawData/CTL_GEX/CTL_GEX-SCI7T048-SCI5T048_22HGM7LT4_S2_L004_R1_001.fastq.gz /scratch/prj/wcard_advseqanlys/FDenk/PaulC/X204SC24096176-Z01-F001/01.RawData/CTL_GEX/CTL_GEX-SCI7T048-SCI5T048_22HGM7LT4_S2_L004_R2_001.fastq.gz
[2024-10-27 14:16:53,892] DEBUG [count]
[2024-10-27 14:16:53,892] DEBUG [count] [bus] Note: Strand option was not specified; setting it to --unstranded for specified technology
[2024-10-27 14:16:58,306] DEBUG [count] [index] k-mer length: 31
[2024-10-27 14:17:02,926] DEBUG [count] [index] number of targets: 227,665
[2024-10-27 14:17:02,926] DEBUG [count] [index] number of k-mers: 139,900,295
[2024-10-27 14:17:02,926] DEBUG [count] [index] number of D-list k-mers: 5,477,475
[2024-10-27 14:17:02,926] DEBUG [count] [quant] will process sample 1: /scratch/prj/wcard_advseqanlys/FDenk/PaulC/X204SC24096176-Z01-F001/01.RawData/CTL_GEX/CTL_GEX-SCI7T048-SCI5T048_22HGM7LT4_S2_L004_R1_001.fastq.gz
[2024-10-27 14:17:02,926] DEBUG [count] /scratch/prj/wcard_advseqanlys/FDenk/PaulC/X204SC24096176-Z01-F001/01.RawData/CTL_GEX/CTL_GEX-SCI7T048-SCI5T048_22HGM7LT4_S2_L004_R2_001.fastq.gz
[2024-10-27 14:17:10,446] DEBUG [count] [quant] finding pseudoalignments for the reads ...
[2024-10-27 14:17:15,294] DEBUG [count] [progress] 1M reads processed (17.9% mapped) [2024-10-27 14:17:20,909] DEBUG [count] [progress] 2M reads processed (17.9% mapped) [2024-10-27 14:17:26,019] DEBUG [count] [progress] 3M reads processed (17.9% mapped) [2024-10-27 14:17:30,932] DEBUG [count] [progress] 4M reads processed (18.0% mapped) [2024-10-27 14:17:35,946] DEBUG [count] [progress] 5M reads processed (18.0% mapped) [2024-10-27 14:17:41,861] DEBUG [count] [progress] 6M reads processed (18.0% mapped) [2024-10-27 14:17:46,672] DEBUG [count] [progress] 7M reads processed (18.0% mapped) [2024-10-27 14:17:51,786] DEBUG [count] [progress] 8M reads processed (18.1% mapped) [2024-10-27 14:17:57,397] DEBUG [count] [progress] 9M reads processed (18.1% mapped) [2024-10-27 14:18:02,208] DEBUG [count] [progress] 10M reads processed (18.1% mapped) [2024-10-27 14:18:07,321] DEBUG [count] [progress] 11M reads processed (18.2% mapped) [2024-10-27 14:18:13,034] DEBUG [count] [progress] 12M reads processed (18.2% mapped) [2024-10-27 14:18:17,846] DEBUG [count] [progress] 13M reads processed (18.2% mapped) [2024-10-27 14:18:22,758] DEBUG [count] [progress] 14M reads processed (18.2% mapped) [2024-10-27 14:18:28,472] DEBUG [count] [progress] 15M reads processed (18.2% mapped) [2024-10-27 14:18:33,385] DEBUG [count] [progress] 16M reads processed (18.2% mapped) [2024-10-27 14:18:38,297] DEBUG [count] [progress] 17M reads processed (18.3% mapped) [2024-10-27 14:18:43,809] DEBUG [count] [progress] 18M reads processed (18.3% mapped) [2024-10-27 14:18:48,719] DEBUG [count] [progress] 19M reads processed (18.3% mapped) [2024-10-27 14:18:53,631] DEBUG [count] [progress] 20M reads processed (18.3% mapped) [2024-10-27 14:18:59,344] DEBUG [count] [progress] 21M reads processed (18.3% mapped) [2024-10-27 14:19:04,359] DEBUG [count] [progress] 22M reads processed (18.3% mapped) [2024-10-27 14:19:09,070] DEBUG [count] [progress] 23M reads processed (18.4% mapped) [2024-10-27 14:19:14,686] 
DEBUG [count] [progress] 24M reads processed (18.4% mapped) [2024-10-27 14:19:19,696] DEBUG [count] [progress] 25M reads processed (18.4% mapped) [2024-10-27 14:19:24,608] DEBUG [count] [progress] 26M reads processed (18.4% mapped) [2024-10-27 14:19:30,229] DEBUG [count] [progress] 27M reads processed (18.4% mapped) [2024-10-27 14:19:35,157] DEBUG [count] [progress] 28M reads processed (18.4% mapped) [2024-10-27 14:19:40,281] DEBUG [count] [progress] 29M reads processed (18.4% mapped) [2024-10-27 14:19:46,007] DEBUG [count] [progress] 30M reads processed (18.4% mapped) [2024-10-27 14:19:51,030] DEBUG [count] [progress] 31M reads processed (18.4% mapped) [2024-10-27 14:19:56,145] DEBUG [count] [progress] 32M reads processed (18.4% mapped) [2024-10-27 14:20:01,557] DEBUG [count] [progress] 33M reads processed (18.4% mapped) [2024-10-27 14:20:06,569] DEBUG [count] [progress] 34M reads processed (18.4% mapped) [2024-10-27 14:20:11,483] DEBUG [count] [progress] 35M reads processed (18.4% mapped) [2024-10-27 14:20:17,295] DEBUG [count] [progress] 36M reads processed (18.3% mapped) [2024-10-27 14:20:22,309] DEBUG [count] [progress] 38M reads processed (18.3% mapped) [2024-10-27 14:20:27,119] DEBUG [count] [progress] 39M reads processed (18.3% mapped) [2024-10-27 14:20:32,931] DEBUG [count] [progress] 40M reads processed (18.3% mapped) [2024-10-27 14:20:37,745] DEBUG [count] [progress] 41M reads processed (18.3% mapped) [2024-10-27 14:20:42,657] DEBUG [count] [progress] 42M reads processed (18.3% mapped) [2024-10-27 14:20:48,370] DEBUG [count] [progress] 43M reads processed (18.3% mapped) [2024-10-27 14:20:53,181] DEBUG [count] [progress] 44M reads processed (18.4% mapped) [2024-10-27 14:20:58,094] DEBUG [count] [progress] 45M reads processed (18.4% mapped) [2024-10-27 14:21:03,906] DEBUG [count] [progress] 46M reads processed (18.4% mapped) [2024-10-27 14:21:08,618] DEBUG [count] [progress] 47M reads processed (18.4% mapped) [2024-10-27 14:21:13,632] DEBUG [count] 
[progress] 48M reads processed (18.4% mapped) [2024-10-27 14:21:19,244] DEBUG [count] [progress] 49M reads processed (18.4% mapped) [2024-10-27 14:21:23,955] DEBUG [count] [progress] 50M reads processed (18.4% mapped) [2024-10-27 14:21:28,968] DEBUG [count] [progress] 51M reads processed (18.4% mapped) [2024-10-27 14:21:34,781] DEBUG [count] [progress] 52M reads processed (18.4% mapped) [2024-10-27 14:21:39,592] DEBUG [count] [progress] 53M reads processed (18.4% mapped) [2024-10-27 14:21:44,404] DEBUG [count] [progress] 54M reads processed (18.4% mapped) [2024-10-27 14:21:50,219] DEBUG [count] [progress] 55M reads processed (18.4% mapped) [2024-10-27 14:21:55,231] DEBUG [count] [progress] 56M reads processed (18.4% mapped) [2024-10-27 14:22:00,242] DEBUG [count] [progress] 57M reads processed (18.4% mapped) [2024-10-27 14:22:06,055] DEBUG [count] [progress] 58M reads processed (18.4% mapped) [2024-10-27 14:22:10,967] DEBUG [count] [progress] 59M reads processed (18.4% mapped) [2024-10-27 14:22:15,783] DEBUG [count] [progress] 60M reads processed (18.4% mapped) [2024-10-27 14:22:21,697] DEBUG [count] [progress] 61M reads processed (18.4% mapped) [2024-10-27 14:22:26,710] DEBUG [count] [progress] 62M reads processed (18.4% mapped) [2024-10-27 14:22:31,521] DEBUG [count] [progress] 63M reads processed (18.4% mapped) [2024-10-27 14:22:36,932] DEBUG [count] [progress] 64M reads processed (18.4% mapped) [2024-10-27 14:22:42,145] DEBUG [count] [progress] 65M reads processed (18.4% mapped) [2024-10-27 14:22:46,955] DEBUG [count] [progress] 66M reads processed (18.4% mapped) [2024-10-27 14:22:52,772] DEBUG [count] [progress] 67M reads processed (18.4% mapped) [2024-10-27 14:22:57,782] DEBUG [count] [progress] 68M reads processed (18.4% mapped) [2024-10-27 14:23:02,494] DEBUG [count] [progress] 69M reads processed (18.4% mapped) [2024-10-27 14:23:07,506] DEBUG [count] [progress] 70M reads processed (18.4% mapped) [2024-10-27 14:23:13,118] DEBUG [count] [progress] 71M reads 
processed (18.4% mapped) [2024-10-27 14:23:17,930] DEBUG [count] [progress] 72M reads processed (18.4% mapped) [2024-10-27 14:23:23,546] DEBUG [count] [progress] 73M reads processed (18.4% mapped) [2024-10-27 14:23:28,458] DEBUG [count] [progress] 75M reads processed (18.4% mapped) [2024-10-27 14:23:33,266] DEBUG [count] [progress] 76M reads processed (18.4% mapped) [2024-10-27 14:23:38,380] DEBUG [count] [progress] 77M reads processed (18.4% mapped) [2024-10-27 14:23:44,093] DEBUG [count] [progress] 78M reads processed (18.4% mapped) [2024-10-27 14:23:48,907] DEBUG [count] [progress] 79M reads processed (18.4% mapped) [2024-10-27 14:23:54,117] DEBUG [count] [progress] 80M reads processed (18.4% mapped) [2024-10-27 14:23:59,432] DEBUG [count] [progress] 81M reads processed (18.4% mapped) [2024-10-27 14:24:04,442] DEBUG [count] [progress] 82M reads processed (18.4% mapped) [2024-10-27 14:24:09,558] DEBUG [count] [progress] 83M reads processed (18.4% mapped) [2024-10-27 14:24:15,371] DEBUG [count] [progress] 84M reads processed (18.4% mapped) [2024-10-27 14:24:20,484] DEBUG [count] [progress] 85M reads processed (18.4% mapped) [2024-10-27 14:24:25,296] DEBUG [count] [progress] 86M reads processed (18.4% mapped) [2024-10-27 14:24:31,109] DEBUG [count] [progress] 87M reads processed (18.4% mapped) [2024-10-27 14:24:36,123] DEBUG [count] [progress] 88M reads processed (18.4% mapped) [2024-10-27 14:24:40,935] DEBUG [count] [progress] 89M reads processed (18.4% mapped) [2024-10-27 14:24:46,446] DEBUG [count] [progress] 90M reads processed (18.4% mapped) [2024-10-27 14:24:51,559] DEBUG [count] [progress] 91M reads processed (18.4% mapped) [2024-10-27 14:24:56,472] DEBUG [count] [progress] 92M reads processed (18.4% mapped) [2024-10-27 14:25:02,185] DEBUG [count] [progress] 93M reads processed (18.4% mapped) [2024-10-27 14:25:07,196] DEBUG [count] [progress] 94M reads processed (18.4% mapped) [2024-10-27 14:25:12,008] DEBUG [count] [progress] 95M reads processed (18.4% 
mapped) [2024-10-27 14:25:17,319] DEBUG [count] [progress] 96M reads processed (18.4% mapped) [2024-10-27 14:25:22,632] DEBUG [count] [progress] 97M reads processed (18.4% mapped) [2024-10-27 14:25:27,443] DEBUG [count] [progress] 98M reads processed (18.4% mapped) [2024-10-27 14:25:32,959] DEBUG [count] [progress] 99M reads processed (18.4% mapped) [2024-10-27 14:25:37,971] DEBUG [count] [progress] 100M reads processed (18.4% mapped) [2024-10-27 14:25:42,780] DEBUG [count] [progress] 101M reads processed (18.4% mapped) [2024-10-27 14:25:48,313] DEBUG [count] [progress] 102M reads processed (18.4% mapped) [2024-10-27 14:25:53,527] DEBUG [count] [progress] 103M reads processed (18.4% mapped) [2024-10-27 14:25:58,438] DEBUG [count] [progress] 104M reads processed (18.4% mapped) [2024-10-27 14:26:03,650] DEBUG [count] [progress] 105M reads processed (18.4% mapped) [2024-10-27 14:26:08,964] DEBUG [count] [progress] 106M reads processed (18.4% mapped) [2024-10-27 14:26:13,675] DEBUG [count] [progress] 107M reads processed (18.4% mapped) [2024-10-27 14:26:19,088] DEBUG [count] [progress] 108M reads processed (18.4% mapped) [2024-10-27 14:26:24,500] DEBUG [count] [progress] 109M reads processed (18.4% mapped) [2024-10-27 14:26:29,311] DEBUG [count] [progress] 110M reads processed (18.5% mapped) [2024-10-27 14:26:34,624] DEBUG [count] [progress] 112M reads processed (18.5% mapped) [2024-10-27 14:26:39,135] DEBUG [count] [progress] 113M reads processed (18.5% mapped) [2024-10-27 14:26:42,744] DEBUG [count] [progress] 114M reads processed (18.5% mapped) [2024-10-27 14:26:46,652] DEBUG [count] [progress] 115M reads processed (18.5% mapped) [2024-10-27 14:26:50,462] DEBUG [count] [progress] 116M reads processed (18.5% mapped) [2024-10-27 14:26:54,170] DEBUG [count] [progress] 117M reads processed (18.5% mapped) [2024-10-27 14:26:58,180] DEBUG [count] [progress] 118M reads processed (18.5% mapped) [2024-10-27 14:27:01,789] DEBUG [count] [progress] 119M reads processed (18.6% 
mapped) [2024-10-27 14:27:05,801] DEBUG [count] [progress] 120M reads processed (18.6% mapped) [2024-10-27 14:27:09,410] DEBUG [count] [progress] 121M reads processed (18.6% mapped) [2024-10-27 14:27:13,318] DEBUG [count] [progress] 122M reads processed (18.6% mapped) [2024-10-27 14:27:17,128] DEBUG [count] [progress] 123M reads processed (18.6% mapped) [2024-10-27 14:27:20,737] DEBUG [count] [progress] 124M reads processed (18.7% mapped) [2024-10-27 14:27:24,646] DEBUG [count] [progress] 125M reads processed (18.7% mapped) [2024-10-27 14:27:28,254] DEBUG [count] [progress] 126M reads processed (18.7% mapped) [2024-10-27 14:27:32,164] DEBUG [count] [progress] 127M reads processed (18.7% mapped) [2024-10-27 14:27:35,673] DEBUG [count] [progress] 128M reads processed (18.7% mapped) [2024-10-27 14:27:39,683] DEBUG [count] [progress] 129M reads processed (18.8% mapped) [2024-10-27 14:27:43,390] DEBUG [count] [progress] 130M reads processed (18.8% mapped) [2024-10-27 14:27:47,107] DEBUG [count] [progress] 131M reads processed (18.8% mapped) [2024-10-27 14:27:51,116] DEBUG [count] [progress] 132M reads processed (18.8% mapped) [2024-10-27 14:27:54,523] DEBUG [count] [progress] 133M reads processed (18.8% mapped) [2024-10-27 14:27:58,533] DEBUG [count] [progress] 134M reads processed (18.9% mapped) [2024-10-27 14:28:02,044] DEBUG [count] [progress] 135M reads processed (18.9% mapped) [2024-10-27 14:28:05,954] DEBUG [count] [progress] 136M reads processed (18.9% mapped) [2024-10-27 14:28:09,762] DEBUG [count] [progress] 137M reads processed (18.9% mapped) [2024-10-27 14:28:13,471] DEBUG [count] [progress] 138M reads processed (19.0% mapped) [2024-10-27 14:28:17,581] DEBUG [count] [progress] 139M reads processed (19.0% mapped) [2024-10-27 14:28:21,493] DEBUG [count] [progress] 140M reads processed (19.0% mapped) [2024-10-27 14:28:25,700] DEBUG [count] [progress] 141M reads processed (19.0% mapped) [2024-10-27 14:28:29,911] DEBUG [count] [progress] 142M reads processed 
(19.0% mapped) [2024-10-27 14:28:33,921] DEBUG [count] [progress] 143M reads processed (19.0% mapped) [2024-10-27 14:28:38,032] DEBUG [count] [progress] 144M reads processed (19.1% mapped) [2024-10-27 14:28:42,141] DEBUG [count] [progress] 145M reads processed (19.1% mapped) [2024-10-27 14:28:46,253] DEBUG [count] [progress] 146M reads processed (19.1% mapped) [2024-10-27 14:28:50,260] DEBUG [count] [progress] 147M reads processed (19.1% mapped) [2024-10-27 14:28:54,473] DEBUG [count] [progress] 148M reads processed (19.1% mapped) [2024-10-27 14:28:58,680] DEBUG [count] [progress] 149M reads processed (19.1% mapped) [2024-10-27 14:29:02,492] DEBUG [count] [progress] 150M reads processed (19.1% mapped) [2024-10-27 14:29:06,701] DEBUG [count] [progress] 151M reads processed (19.1% mapped) [2024-10-27 14:29:10,810] DEBUG [count] [progress] 152M reads processed (19.2% mapped) [2024-10-27 14:29:14,819] DEBUG [count] [progress] 153M reads processed (19.2% mapped) [2024-10-27 14:29:18,929] DEBUG [count] [progress] 154M reads processed (19.2% mapped) [2024-10-27 14:29:22,940] DEBUG [count] [progress] 155M reads processed (19.2% mapped) [2024-10-27 14:29:27,050] DEBUG [count] [progress] 156M reads processed (19.2% mapped) [2024-10-27 14:29:30,958] DEBUG [count] [progress] 157M reads processed (19.2% mapped) [2024-10-27 14:29:35,069] DEBUG [count] [progress] 158M reads processed (19.2% mapped) [2024-10-27 14:29:39,078] DEBUG [count] [progress] 159M reads processed (19.3% mapped) [2024-10-27 14:29:43,090] DEBUG [count] [progress] 160M reads processed (19.3% mapped) [2024-10-27 14:29:47,298] DEBUG [count] [progress] 161M reads processed (19.3% mapped) [2024-10-27 14:29:51,309] DEBUG [count] [progress] 162M reads processed (19.3% mapped) [2024-10-27 14:29:55,618] DEBUG [count] [progress] 163M reads processed (19.3% mapped) [2024-10-27 14:29:59,828] DEBUG [count] [progress] 164M reads processed (19.3% mapped) [2024-10-27 14:30:04,037] DEBUG [count] [progress] 165M reads 
processed (19.3% mapped) [2024-10-27 14:30:08,047] DEBUG [count] [progress] 166M reads processed (19.3% mapped) [2024-10-27 14:30:12,159] DEBUG [count] [progress] 167M reads processed (19.3% mapped) [2024-10-27 14:30:16,270] DEBUG [count] [progress] 169M reads processed (19.3% mapped) [2024-10-27 14:30:20,381] DEBUG [count] [progress] 170M reads processed (19.3% mapped) [2024-10-27 14:30:24,590] DEBUG [count] [progress] 171M reads processed (19.3% mapped) [2024-10-27 14:30:28,500] DEBUG [count] [progress] 172M reads processed (19.3% mapped) [2024-10-27 14:30:32,711] DEBUG [count] [progress] 173M reads processed (19.4% mapped) [2024-10-27 14:30:36,620] DEBUG [count] [progress] 174M reads processed (19.4% mapped) [2024-10-27 14:30:40,729] DEBUG [count] [progress] 175M reads processed (19.4% mapped) [2024-10-27 14:30:44,839] DEBUG [count] [progress] 176M reads processed (19.4% mapped) [2024-10-27 14:30:48,850] DEBUG [count] [progress] 177M reads processed (19.4% mapped) [2024-10-27 14:30:52,957] DEBUG [count] [progress] 178M reads processed (19.4% mapped) [2024-10-27 14:30:56,869] DEBUG [count] [progress] 179M reads processed (19.4% mapped) [2024-10-27 14:31:01,080] DEBUG [count] [progress] 180M reads processed (19.4% mapped) [2024-10-27 14:31:05,087] DEBUG [count] [progress] 181M reads processed (19.4% mapped) [2024-10-27 14:31:09,197] DEBUG [count] [progress] 182M reads processed (19.4% mapped) [2024-10-27 14:31:13,106] DEBUG [count] [progress] 183M reads processed (19.5% mapped) [2024-10-27 14:31:17,116] DEBUG [count] [progress] 184M reads processed (19.5% mapped) [2024-10-27 14:31:21,224] DEBUG [count] [progress] 185M reads processed (19.5% mapped) [2024-10-27 14:31:25,134] DEBUG [count] [progress] 186M reads processed (19.5% mapped) [2024-10-27 14:31:29,345] DEBUG [count] [progress] 187M reads processed (19.5% mapped) [2024-10-27 14:31:33,255] DEBUG [count] [progress] 188M reads processed (19.5% mapped) [2024-10-27 14:31:37,867] DEBUG [count] [progress] 189M 
reads processed (19.5% mapped)
[2024-10-27 14:31:41,977] DEBUG [count] [progress] 190M reads processed (19.5% mapped)
[2024-10-27 14:31:46,385] DEBUG [count] [progress] 191M reads processed (19.5% mapped)
[2024-10-27 14:31:50,297] DEBUG [count] [progress] 192M reads processed (19.5% mapped)
[2024-10-27 14:31:54,606] DEBUG [count] [progress] 193M reads processed (19.5% mapped)
[2024-10-27 14:31:58,817] DEBUG [count] [progress] 194M reads processed (19.5% mapped)
[2024-10-27 14:32:02,123] DEBUG [count] [progress] 195M reads processed (19.5% mapped) done
[2024-10-27 14:32:02,123] DEBUG [count] [quant] processed 196,154,452 reads, 38,312,594 reads pseudoaligned
[2024-10-27 14:32:02,725] DEBUG [count]
[2024-10-27 14:32:04,528] INFO [count] Sorting BUS file /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/output.bus to /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp/output.s.bus
[2024-10-27 14:32:04,528] DEBUG [count] bustools sort -o /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp/output.s.bus -T /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp -t 16 -m 2G /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/output.bus
[2024-10-27 14:32:06,635] DEBUG [count] partition time: 1.03233s
[2024-10-27 14:32:07,236] DEBUG [count] all fits in buffer
[2024-10-27 14:32:08,939] DEBUG [count] Read in 38312594 BUS records
[2024-10-27 14:32:08,939] DEBUG [count] reading time 0.191277s
[2024-10-27 14:32:08,939] DEBUG [count] sorting time 6.76739s
[2024-10-27 14:32:08,939] DEBUG [count] writing time 0.590374s
[2024-10-27 14:32:08,939] INFO [count] On-list not provided
[2024-10-27 14:32:08,939] INFO [count] Generating on-list /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/whitelist.txt from BUS file /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp/output.s.bus
[2024-10-27 14:32:08,940] DEBUG [count] bustools allowlist -o /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/whitelist.txt /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp/output.s.bus
[2024-10-27 14:32:10,142] DEBUG [count] Read in 20865828 BUS records, wrote 2300 barcodes to on-list with threshold 1958
[2024-10-27 14:32:10,142] INFO [count] Inspecting BUS file /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp/output.s.bus
[2024-10-27 14:32:10,142] DEBUG [count] bustools inspect -o /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/inspect.json -w /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/whitelist.txt /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp/output.s.bus
[2024-10-27 14:32:12,848] INFO [count] Correcting BUS records in /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp/output.s.bus to /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp/output.s.c.bus with on-list /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/whitelist.txt
[2024-10-27 14:32:12,848] DEBUG [count] bustools correct -o /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp/output.s.c.bus -w /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/whitelist.txt /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp/output.s.bus
[2024-10-27 14:32:12,949] DEBUG [count] Found 2300 barcodes in the on-list
[2024-10-27 14:32:15,353] DEBUG [count] Processed 20865828 BUS records
[2024-10-27 14:32:15,353] DEBUG [count] In on-list = 13170766
[2024-10-27 14:32:15,353] DEBUG [count] Corrected = 781891
[2024-10-27 14:32:15,353] DEBUG [count] Uncorrected = 6913171
[2024-10-27 14:32:15,353] INFO [count] Sorting BUS file /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp/output.s.c.bus to /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/output.unfiltered.bus
[2024-10-27 14:32:15,353] DEBUG [count] bustools sort -o /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/output.unfiltered.bus -T /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp -t 16 -m 2G /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/tmp/output.s.c.bus
[2024-10-27 14:32:16,458] DEBUG [count] partition time: 0.157071s
[2024-10-27 14:32:16,659] DEBUG [count] all fits in buffer
[2024-10-27 14:32:18,161] DEBUG [count] Read in 13952657 BUS records
[2024-10-27 14:32:18,161] DEBUG [count] reading time 0.078844s
[2024-10-27 14:32:18,161] DEBUG [count] sorting time 1.78791s
[2024-10-27 14:32:18,161] DEBUG [count] writing time 0.312446s
[2024-10-27 14:32:18,162] INFO [count] Generating count matrix /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/counts_unfiltered/cells_x_genes from BUS file /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/output.unfiltered.bus
[2024-10-27 14:32:18,163] DEBUG [count] bustools count -o /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/counts_unfiltered/cells_x_genes -g /scratch/prj/wcard_advseqanlys/FDenk/PaulC/t2g.txt -e /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/matrix.ec -t /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/transcripts.txt --genecounts --umi-gene /scratch/prj/wcard_advseqanlys/FDenk/PaulC/aligned_CTL/output.unfiltered.bus
slurmstepd: error: JOB 21810141 ON erc-hpc-comp008 CANCELLED AT 2024-10-28T14:16:51 DUE TO TIME LIMIT
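
The figures reported in this log can be cross-checked with a few lines of arithmetic. The sketch below copies five lines from the log into a string and pulls the counts out with regular expressions; grab is a hypothetical helper name, and the patterns are only assumed to match this particular log format.

```python
import re

# Lines copied from the verbose output above (timestamps and prefixes removed).
log_excerpt = """
[quant] processed 196,154,452 reads, 38,312,594 reads pseudoaligned
Processed 20865828 BUS records
In on-list = 13170766
Corrected = 781891
Uncorrected = 6913171
"""


def grab(pattern, text):
    """Return the first captured number as an int, ignoring thousands separators."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1).replace(",", ""))


total_reads = grab(r"processed ([\d,]+) reads,", log_excerpt)
pseudoaligned = grab(r"([\d,]+) reads pseudoaligned", log_excerpt)
bus_records = grab(r"Processed (\d+) BUS records", log_excerpt)
in_onlist = grab(r"In on-list = (\d+)", log_excerpt)
corrected = grab(r"Corrected = (\d+)", log_excerpt)
uncorrected = grab(r"Uncorrected = (\d+)", log_excerpt)

# Fraction of reads that pseudoaligned in this run.
mapping_rate = pseudoaligned / total_reads
# Records kept after barcode correction: exact matches plus corrected barcodes.
kept_records = in_onlist + corrected

print(f"pseudoalignment rate: {mapping_rate:.1%}")
print(f"records kept after correction: {kept_records} of {bus_records}")
print(f"records dropped (uncorrected): {uncorrected}")
```

The kept total (in on-list plus corrected) is the number of records that the second bustools sort reports reading, so the steps before bustools count are internally consistent.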

Yenaled commented 2 weeks ago

Hmm, honestly, everything seems correct so I'm not sure what's going on...

Maybe try upgrading to kb-python 0.29.1 and rerunning? I added some extra checks in that version to make sure all files are valid.
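
A quick way to see which version is installed before deciding whether to upgrade is to ask Python's package metadata. This is only a sketch: it assumes the package was installed under the distribution name kb-python, and parse_version and installed_kb_version are hypothetical helper names that handle plain dotted versions only.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '0.29.1' into (0, 29, 1); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_kb_version(dist_name="kb-python"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


MINIMUM = "0.29.1"  # the version suggested in this thread
found = installed_kb_version()
if found is None:
    print("kb-python does not appear to be installed in this environment")
elif parse_version(found) < parse_version(MINIMUM):
    print(f"kb-python {found} is older than {MINIMUM}; consider upgrading")
else:
    print(f"kb-python {found} is at least {MINIMUM}")
```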

Otherwise, if you email me the output.unfiltered.bus, transcripts.txt, and matrix.ec files in your output directory, I can try to figure out what's wrong.

franziskadenk commented 2 weeks ago

Thanks so much! I've just sent you the files. I really appreciate you taking another look.

Yenaled commented 2 weeks ago

Issue has been resolved :) The t2g.txt file had accidentally been overwritten, so it needed to be re-downloaded.
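
For anyone who runs into a similar hang, a sanity check of the t2g file before launching a long job may help. The sketch below assumes the usual layout (tab-separated, transcript ID in the first column, gene ID in the second) and compares it with the transcripts.txt that kallisto writes to the output directory. check_t2g is a hypothetical helper, and the thread does not say exactly how the overwritten file was malformed, so this is not necessarily the check that would have caught it.

```python
import tempfile
from pathlib import Path


def check_t2g(t2g_path, transcripts_path, max_reports=5):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    mapped = set()
    with open(t2g_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2 or not fields[0] or not fields[1]:
                if len(problems) < max_reports:
                    problems.append(f"t2g line {line_number}: expected transcript<TAB>gene")
                continue
            mapped.add(fields[0])
    if not mapped:
        problems.append("t2g contains no usable transcript-to-gene pairs")

    # Every transcript in the index should have a gene assigned in the t2g file.
    with open(transcripts_path) as handle:
        transcripts = [line.strip() for line in handle if line.strip()]
    missing = [name for name in transcripts if name not in mapped]
    if missing:
        problems.append(
            f"{len(missing)} of {len(transcripts)} transcripts lack a gene, e.g. {missing[0]}"
        )
    return problems


# Tiny made-up files, only to show the check in action.
with tempfile.TemporaryDirectory() as tmp:
    transcripts_file = Path(tmp, "transcripts.txt")
    good_t2g = Path(tmp, "good_t2g.txt")
    bad_t2g = Path(tmp, "bad_t2g.txt")
    transcripts_file.write_text("tx1\ntx2\n")
    good_t2g.write_text("tx1\tgeneA\tA\ntx2\tgeneB\tB\n")
    bad_t2g.write_text("tx1\tgeneA\tA\n")  # tx2 is missing
    good_report = check_t2g(good_t2g, transcripts_file)
    bad_report = check_t2g(bad_t2g, transcripts_file)

print(good_report)
print(bad_report)
```

In real use, the two paths would be the t2g.txt passed to -g and the transcripts.txt in the kb count output directory.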