pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Why are UMI counts in adata.h5ad and cells_x_genes.total.mtx different? #265

Closed NaotoKubota closed 2 months ago

NaotoKubota commented 2 months ago

Describe the issue

Thank you for creating such a wonderful tool! I would like to ask about the difference between adata.h5ad and cells_x_genes.total.mtx. I ran kb count and used the output file adata.h5ad for downstream processing with scanpy, but the UMI counts stored in the h5ad file seem to be much smaller than those I got from another tool, STARsolo. I then tried cells_x_genes.total.mtx, which gives UMI counts very similar to STARsolo's. It seems to make sense to use cells_x_genes.total.mtx instead of adata.h5ad, but I still don't understand what causes the difference between these files. Is this a bug?

What is the exact command that was run?

I ran kb in a Snakemake workflow:

rule kb_count:
    wildcard_constraints:
        sample = "|".join([re.escape(str(x)) for x in samples])
    container:
        "docker://quay.io/biocontainers/kb-python:0.28.2--pyhdfd78af_2"
    input:
        kb_index = "kb_index/index.idx",
        kb_t2g = "kb_index/t2g.txt",
        kb_cdna_fa = "kb_index/cdna.fa",
        kb_intron_fa = "kb_index/intron.fa",
        kb_cdna_t2c = "kb_index/cdna_t2c.txt",
        kb_intron_t2c = "kb_index/intron_t2c.txt"
    output:
        adata_unfiltered_h5ad = "kb/{sample}/counts_unfiltered/adata.h5ad",
        adata_filtered_h5ad = "kb/{sample}/counts_filtered/adata.h5ad",
        inspect = "kb/{sample}/inspect.json"
    params:
        R1_R2 = lambda wildcards: samples_dict[wildcards.sample],
        kb_output = "kb/{sample}"
    threads:
        workflow.cores / 4
    benchmark:
        "benchmark/kb_count_{sample}.txt"
    log:
        "log/kb_count_{sample}.log"
    shell:
        """
        kb count \
        -i {input.kb_index} \
        -g {input.kb_t2g} \
        -c1 {input.kb_cdna_t2c} \
        -c2 {input.kb_intron_t2c} \
        -x {config[technology]} \
        -o {params.kb_output} \
        -t {threads} \
        --workflow nac \
        --h5ad \
        --gene-names \
        --sum total \
        --filter bustools \
        --overwrite \
        --verbose \
        {params.R1_R2} >& {log}
        """

Command output (with --verbose flag)

[2024-08-09 19:37:43,962]   DEBUG [main] Printing verbose output
[2024-08-09 19:37:46,179]   DEBUG [main] kallisto binary located at /usr/local/lib/python3.8/site-packages/kb_python/bins/linux/kallisto/kallisto
[2024-08-09 19:37:46,179]   DEBUG [main] bustools binary located at /usr/local/lib/python3.8/site-packages/kb_python/bins/linux/bustools/bustools
[2024-08-09 19:37:46,179]   DEBUG [main] Creating `kb/1122/tmp` directory
[2024-08-09 19:37:46,182]   DEBUG [main] Namespace(N=None, aa=False, batch_barcodes=False, bootstraps=None, bustools='/usr/local/lib/python3.8/site-packages/kb_python/bins/linux/bustools/bustools', c1='kb_index/cdna_t2c.txt', c2='kb_index/intron_t2c.txt', cellranger=False, chromosomes=None, command='count', dry_run=False, em=False, fastqs=['/rhome/naotok/bigdata/ADKnowledgePortal/MC_snRNA/fastq/1122_S1_R1_001.fastq.gz', '/rhome/naotok/bigdata/ADKnowledgePortal/MC_snRNA/fastq/1122_S1_R2_001.fastq.gz'], filter='bustools', filter_threshold=None, fragment_l=None, fragment_s=None, g='kb_index/t2g.txt', gene_names=True, genomebam=False, gtf=None, h5ad=True, i='kb_index/index.idx', inleaved=False, kallisto='/usr/local/lib/python3.8/site-packages/kb_python/bins/linux/kallisto/kallisto', keep_tmp=False, list=False, loom=False, loom_names='barcode,target_name', m='4G', matrix_to_directories=False, matrix_to_files=False, mm=False, no_fragment=False, no_inspect=False, no_validate=False, num=False, o='kb/1122', overwrite=True, parity=None, r=None, report=False, strand=None, sum='total', t=8, tcc=False, tmp=None, verbose=True, w=None, workflow='nac', x='10xv3')
[2024-08-09 19:37:48,862]    INFO [count_nac] Using index kb_index/index.idx to generate BUS file to kb/1122 from
[2024-08-09 19:37:48,862]    INFO [count_nac]         /rhome/naotok/bigdata/ADKnowledgePortal/MC_snRNA/fastq/1122_S1_R1_001.fastq.gz
[2024-08-09 19:37:48,862]    INFO [count_nac]         /rhome/naotok/bigdata/ADKnowledgePortal/MC_snRNA/fastq/1122_S1_R2_001.fastq.gz
[2024-08-09 19:37:48,862]   DEBUG [count_nac] kallisto bus -i kb_index/index.idx -o kb/1122 -x 10xv3 -t 8 /rhome/naotok/bigdata/ADKnowledgePortal/MC_snRNA/fastq/1122_S1_R1_001.fastq.gz /rhome/naotok/bigdata/ADKnowledgePortal/MC_snRNA/fastq/1122_S1_R2_001.fastq.gz
[2024-08-09 19:37:48,967]   DEBUG [count_nac]
[2024-08-09 19:37:48,967]   DEBUG [count_nac] [bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[2024-08-09 19:38:41,789]   DEBUG [count_nac] [index] k-mer length: 31
[2024-08-09 19:39:20,366]   DEBUG [count_nac] [index] number of targets: 267,211
[2024-08-09 19:39:20,466]   DEBUG [count_nac] [index] number of k-mers: 1,563,276,810
[2024-08-09 19:39:20,466]   DEBUG [count_nac] [index] number of D-list k-mers: 10,106,781
[2024-08-09 19:39:20,466]   DEBUG [count_nac] [quant] will process sample 1: /rhome/naotok/bigdata/ADKnowledgePortal/MC_snRNA/fastq/1122_S1_R1_001.fastq.gz
[2024-08-09 19:39:20,466]   DEBUG [count_nac] /rhome/naotok/bigdata/ADKnowledgePortal/MC_snRNA/fastq/1122_S1_R2_001.fastq.gz
[2024-08-09 19:39:28,081]   DEBUG [count_nac] [quant] finding pseudoalignments for the reads ...
[2024-08-09 19:39:33,894]   DEBUG [count_nac] [progress] 1M reads processed (76.1% mapped)
[2024-08-09 19:39:39,907]   DEBUG [count_nac] [progress] 2M reads processed (76.1% mapped)
[2024-08-09 19:39:46,519]   DEBUG [count_nac] [progress] 3M reads processed (76.1% mapped)
[2024-08-09 19:39:52,230]   DEBUG [count_nac] [progress] 4M reads processed (76.1% mapped)
[2024-08-09 19:39:58,342]   DEBUG [count_nac] [progress] 5M reads processed (76.1% mapped)
[2024-08-09 19:40:04,153]   DEBUG [count_nac] [progress] 6M reads processed (76.0% mapped)
[2024-08-09 19:40:09,864]   DEBUG [count_nac] [progress] 7M reads processed (76.0% mapped)
[2024-08-09 19:40:16,677]   DEBUG [count_nac] [progress] 8M reads processed (75.9% mapped)
[2024-08-09 19:40:22,288]   DEBUG [count_nac] [progress] 9M reads processed (75.9% mapped)
[2024-08-09 19:40:28,502]   DEBUG [count_nac] [progress] 10M reads processed (75.9% mapped)
[2024-08-09 19:40:34,015]   DEBUG [count_nac] [progress] 11M reads processed (75.9% mapped)
[2024-08-09 19:40:40,026]   DEBUG [count_nac] [progress] 12M reads processed (75.9% mapped)
[2024-08-09 19:40:46,338]   DEBUG [count_nac] [progress] 13M reads processed (75.9% mapped)
[2024-08-09 19:40:51,849]   DEBUG [count_nac] [progress] 14M reads processed (75.9% mapped)
[2024-08-09 19:40:57,860]   DEBUG [count_nac] [progress] 15M reads processed (75.9% mapped)
[2024-08-09 19:41:04,273]   DEBUG [count_nac] [progress] 16M reads processed (75.9% mapped)
[2024-08-09 19:41:09,784]   DEBUG [count_nac] [progress] 17M reads processed (75.9% mapped)
[2024-08-09 19:41:16,297]   DEBUG [count_nac] [progress] 18M reads processed (75.9% mapped)
[2024-08-09 19:41:21,909]   DEBUG [count_nac] [progress] 19M reads processed (75.9% mapped)
[2024-08-09 19:41:28,223]   DEBUG [count_nac] [progress] 20M reads processed (75.9% mapped)
[2024-08-09 19:41:33,934]   DEBUG [count_nac] [progress] 21M reads processed (75.8% mapped)
[2024-08-09 19:41:39,444]   DEBUG [count_nac] [progress] 22M reads processed (75.6% mapped)
[2024-08-09 19:41:45,054]   DEBUG [count_nac] [progress] 23M reads processed (75.5% mapped)
[2024-08-09 19:41:50,464]   DEBUG [count_nac] [progress] 24M reads processed (75.3% mapped)
[2024-08-09 19:41:56,679]   DEBUG [count_nac] [progress] 25M reads processed (75.2% mapped)
[2024-08-09 19:42:02,894]   DEBUG [count_nac] [progress] 26M reads processed (75.1% mapped)
[2024-08-09 19:42:08,105]   DEBUG [count_nac] [progress] 27M reads processed (75.0% mapped)
[2024-08-09 19:42:14,420]   DEBUG [count_nac] [progress] 28M reads processed (75.0% mapped)
[2024-08-09 19:42:20,230]   DEBUG [count_nac] [progress] 29M reads processed (74.9% mapped)
[2024-08-09 19:42:25,540]   DEBUG [count_nac] [progress] 30M reads processed (74.8% mapped)
[2024-08-09 19:42:31,354]   DEBUG [count_nac] [progress] 31M reads processed (74.7% mapped)
[2024-08-09 19:42:37,065]   DEBUG [count_nac] [progress] 32M reads processed (74.7% mapped)
[2024-08-09 19:42:42,876]   DEBUG [count_nac] [progress] 33M reads processed (74.7% mapped)
[2024-08-09 19:42:48,687]   DEBUG [count_nac] [progress] 34M reads processed (74.6% mapped)
[2024-08-09 19:42:54,899]   DEBUG [count_nac] [progress] 35M reads processed (74.6% mapped)
[2024-08-09 19:43:01,013]   DEBUG [count_nac] [progress] 37M reads processed (74.5% mapped)
[2024-08-09 19:43:06,323]   DEBUG [count_nac] [progress] 38M reads processed (74.5% mapped)
[2024-08-09 19:43:12,734]   DEBUG [count_nac] [progress] 39M reads processed (74.5% mapped)
[2024-08-09 19:43:18,445]   DEBUG [count_nac] [progress] 40M reads processed (74.5% mapped)
[2024-08-09 19:43:24,156]   DEBUG [count_nac] [progress] 41M reads processed (74.5% mapped)
[2024-08-09 19:43:30,970]   DEBUG [count_nac] [progress] 42M reads processed (74.6% mapped)
[2024-08-09 19:43:36,581]   DEBUG [count_nac] [progress] 43M reads processed (74.6% mapped)
[2024-08-09 19:43:42,394]   DEBUG [count_nac] [progress] 44M reads processed (74.7% mapped)
[2024-08-09 19:43:48,104]   DEBUG [count_nac] [progress] 45M reads processed (74.7% mapped)
[2024-08-09 19:43:54,318]   DEBUG [count_nac] [progress] 46M reads processed (74.8% mapped)
[2024-08-09 19:44:00,231]   DEBUG [count_nac] [progress] 47M reads processed (74.8% mapped)
[2024-08-09 19:44:06,342]   DEBUG [count_nac] [progress] 48M reads processed (74.9% mapped)
[2024-08-09 19:44:12,555]   DEBUG [count_nac] [progress] 49M reads processed (74.9% mapped)
[2024-08-09 19:44:18,167]   DEBUG [count_nac] [progress] 50M reads processed (75.0% mapped)
[2024-08-09 19:44:23,877]   DEBUG [count_nac] [progress] 51M reads processed (75.0% mapped)
[2024-08-09 19:44:30,490]   DEBUG [count_nac] [progress] 52M reads processed (75.0% mapped)
[2024-08-09 19:44:36,202]   DEBUG [count_nac] [progress] 53M reads processed (75.0% mapped)
[2024-08-09 19:44:41,717]   DEBUG [count_nac] [progress] 54M reads processed (75.1% mapped)
[2024-08-09 19:44:47,454]   DEBUG [count_nac] [progress] 55M reads processed (75.1% mapped)
[2024-08-09 19:44:53,189]   DEBUG [count_nac] [progress] 56M reads processed (75.2% mapped)
[2024-08-09 19:44:59,222]   DEBUG [count_nac] [progress] 57M reads processed (75.2% mapped)
[2024-08-09 19:45:04,754]   DEBUG [count_nac] [progress] 58M reads processed (75.2% mapped)
[2024-08-09 19:45:10,669]   DEBUG [count_nac] [progress] 59M reads processed (75.3% mapped)
[2024-08-09 19:45:16,389]   DEBUG [count_nac] [progress] 60M reads processed (75.3% mapped)
[2024-08-09 19:45:22,422]   DEBUG [count_nac] [progress] 61M reads processed (75.3% mapped)
[2024-08-09 19:45:28,344]   DEBUG [count_nac] [progress] 62M reads processed (75.3% mapped)
[2024-08-09 19:45:33,875]   DEBUG [count_nac] [progress] 63M reads processed (75.4% mapped)
[2024-08-09 19:45:39,509]   DEBUG [count_nac] [progress] 64M reads processed (75.4% mapped)
[2024-08-09 19:45:45,547]   DEBUG [count_nac] [progress] 65M reads processed (75.4% mapped)
[2024-08-09 19:45:51,182]   DEBUG [count_nac] [progress] 66M reads processed (75.4% mapped)
[2024-08-09 19:45:56,610]   DEBUG [count_nac] [progress] 67M reads processed (75.4% mapped)
[2024-08-09 19:46:02,341]   DEBUG [count_nac] [progress] 68M reads processed (75.4% mapped)
[2024-08-09 19:46:07,756]   DEBUG [count_nac] [progress] 69M reads processed (75.4% mapped)
[2024-08-09 19:46:13,168]   DEBUG [count_nac] [progress] 70M reads processed (75.3% mapped)
[2024-08-09 19:46:18,478]   DEBUG [count_nac] [progress] 71M reads processed (75.3% mapped)
[2024-08-09 19:46:24,005]   DEBUG [count_nac] [progress] 72M reads processed (75.3% mapped)
[2024-08-09 19:46:29,437]   DEBUG [count_nac] [progress] 74M reads processed (75.2% mapped)
[2024-08-09 19:46:35,876]   DEBUG [count_nac] [progress] 75M reads processed (75.2% mapped)
[2024-08-09 19:46:41,110]   DEBUG [count_nac] [progress] 76M reads processed (75.2% mapped)
[2024-08-09 19:46:46,634]   DEBUG [count_nac] [progress] 77M reads processed (75.2% mapped)
[2024-08-09 19:46:52,247]   DEBUG [count_nac] [progress] 78M reads processed (75.2% mapped)
[2024-08-09 19:46:57,675]   DEBUG [count_nac] [progress] 79M reads processed (75.1% mapped)
[2024-08-09 19:47:03,102]   DEBUG [count_nac] [progress] 80M reads processed (75.1% mapped)
[2024-08-09 19:47:09,031]   DEBUG [count_nac] [progress] 81M reads processed (75.1% mapped)
[2024-08-09 19:47:14,954]   DEBUG [count_nac] [progress] 82M reads processed (75.1% mapped)
[2024-08-09 19:47:20,568]   DEBUG [count_nac] [progress] 83M reads processed (75.1% mapped)
[2024-08-09 19:47:26,488]   DEBUG [count_nac] [progress] 84M reads processed (75.1% mapped)
[2024-08-09 19:47:32,300]   DEBUG [count_nac] [progress] 85M reads processed (75.1% mapped)
[2024-08-09 19:47:38,016]   DEBUG [count_nac] [progress] 86M reads processed (75.1% mapped)
[2024-08-09 19:47:43,436]   DEBUG [count_nac] [progress] 87M reads processed (75.1% mapped)
[2024-08-09 19:47:48,971]   DEBUG [count_nac] [progress] 88M reads processed (75.1% mapped)
[2024-08-09 19:47:54,402]   DEBUG [count_nac] [progress] 89M reads processed (75.1% mapped)
[2024-08-09 19:47:59,733]   DEBUG [count_nac] [progress] 90M reads processed (75.1% mapped)
[2024-08-09 19:48:05,468]   DEBUG [count_nac] [progress] 91M reads processed (75.1% mapped)
[2024-08-09 19:48:11,102]   DEBUG [count_nac] [progress] 92M reads processed (75.1% mapped)
[2024-08-09 19:48:17,120]   DEBUG [count_nac] [progress] 93M reads processed (75.1% mapped)
[2024-08-09 19:48:23,439]   DEBUG [count_nac] [progress] 94M reads processed (75.1% mapped)
[2024-08-09 19:48:29,555]   DEBUG [count_nac] [progress] 95M reads processed (75.1% mapped)
[2024-08-09 19:48:35,078]   DEBUG [count_nac] [progress] 96M reads processed (75.1% mapped)
[2024-08-09 19:48:40,603]   DEBUG [count_nac] [progress] 97M reads processed (75.2% mapped)
[2024-08-09 19:48:46,125]   DEBUG [count_nac] [progress] 98M reads processed (75.2% mapped)
[2024-08-09 19:48:51,753]   DEBUG [count_nac] [progress] 99M reads processed (75.2% mapped)
[2024-08-09 19:48:57,377]   DEBUG [count_nac] [progress] 100M reads processed (75.2% mapped)
[2024-08-09 19:49:02,801]   DEBUG [count_nac] [progress] 101M reads processed (75.2% mapped)
[2024-08-09 19:49:08,225]   DEBUG [count_nac] [progress] 102M reads processed (75.2% mapped)
[2024-08-09 19:49:14,853]   DEBUG [count_nac] [progress] 103M reads processed (75.2% mapped)
[2024-08-09 19:49:20,279]   DEBUG [count_nac] [progress] 104M reads processed (75.3% mapped)
[2024-08-09 19:49:25,804]   DEBUG [count_nac] [progress] 105M reads processed (75.3% mapped)
[2024-08-09 19:49:31,419]   DEBUG [count_nac] [progress] 106M reads processed (75.3% mapped)
[2024-08-09 19:49:36,829]   DEBUG [count_nac] [progress] 107M reads processed (75.3% mapped)
[2024-08-09 19:49:43,641]   DEBUG [count_nac] [progress] 108M reads processed (75.3% mapped)
[2024-08-09 19:49:49,252]   DEBUG [count_nac] [progress] 109M reads processed (75.3% mapped)
[2024-08-09 19:49:54,762]   DEBUG [count_nac] [progress] 111M reads processed (75.4% mapped)
[2024-08-09 19:50:01,375]   DEBUG [count_nac] [progress] 112M reads processed (75.4% mapped)
[2024-08-09 19:50:06,585]   DEBUG [count_nac] [progress] 113M reads processed (75.4% mapped)
[2024-08-09 19:50:11,995]   DEBUG [count_nac] [progress] 114M reads processed (75.4% mapped)
[2024-08-09 19:50:17,204]   DEBUG [count_nac] [progress] 115M reads processed (75.3% mapped)
[2024-08-09 19:50:23,517]   DEBUG [count_nac] [progress] 116M reads processed (75.3% mapped)
[2024-08-09 19:50:29,229]   DEBUG [count_nac] [progress] 117M reads processed (75.3% mapped)
[2024-08-09 19:50:34,639]   DEBUG [count_nac] [progress] 118M reads processed (75.3% mapped)
[2024-08-09 19:50:41,051]   DEBUG [count_nac] [progress] 119M reads processed (75.3% mapped)
[2024-08-09 19:50:46,160]   DEBUG [count_nac] [progress] 120M reads processed (75.2% mapped)
[2024-08-09 19:50:51,771]   DEBUG [count_nac] [progress] 121M reads processed (75.2% mapped)
[2024-08-09 19:50:57,982]   DEBUG [count_nac] [progress] 122M reads processed (75.2% mapped)
[2024-08-09 19:51:03,293]   DEBUG [count_nac] [progress] 123M reads processed (75.2% mapped)
[2024-08-09 19:51:08,904]   DEBUG [count_nac] [progress] 124M reads processed (75.2% mapped)
[2024-08-09 19:51:14,515]   DEBUG [count_nac] [progress] 125M reads processed (75.2% mapped)
[2024-08-09 19:51:20,626]   DEBUG [count_nac] [progress] 126M reads processed (75.2% mapped)
[2024-08-09 19:51:26,037]   DEBUG [count_nac] [progress] 127M reads processed (75.2% mapped)
[2024-08-09 19:51:32,148]   DEBUG [count_nac] [progress] 128M reads processed (75.2% mapped)
[2024-08-09 19:51:37,959]   DEBUG [count_nac] [progress] 129M reads processed (75.2% mapped)
[2024-08-09 19:51:43,369]   DEBUG [count_nac] [progress] 130M reads processed (75.2% mapped)
[2024-08-09 19:51:49,380]   DEBUG [count_nac] [progress] 131M reads processed (75.2% mapped)
[2024-08-09 19:51:55,291]   DEBUG [count_nac] [progress] 132M reads processed (75.2% mapped)
[2024-08-09 19:52:01,104]   DEBUG [count_nac] [progress] 133M reads processed (75.2% mapped)
[2024-08-09 19:52:06,617]   DEBUG [count_nac] [progress] 134M reads processed (75.2% mapped)
[2024-08-09 19:52:13,029]   DEBUG [count_nac] [progress] 135M reads processed (75.2% mapped)
[2024-08-09 19:52:18,940]   DEBUG [count_nac] [progress] 136M reads processed (75.2% mapped)
[2024-08-09 19:52:24,551]   DEBUG [count_nac] [progress] 137M reads processed (75.2% mapped)
[2024-08-09 19:52:30,764]   DEBUG [count_nac] [progress] 138M reads processed (75.2% mapped)
[2024-08-09 19:52:36,374]   DEBUG [count_nac] [progress] 139M reads processed (75.2% mapped)
[2024-08-09 19:52:42,285]   DEBUG [count_nac] [progress] 140M reads processed (75.3% mapped)
[2024-08-09 19:52:48,497]   DEBUG [count_nac] [progress] 141M reads processed (75.3% mapped)
[2024-08-09 19:52:54,308]   DEBUG [count_nac] [progress] 142M reads processed (75.3% mapped)
[2024-08-09 19:53:00,221]   DEBUG [count_nac] [progress] 143M reads processed (75.3% mapped)
[2024-08-09 19:53:05,937]   DEBUG [count_nac] [progress] 144M reads processed (75.3% mapped)
[2024-08-09 19:53:12,148]   DEBUG [count_nac] [progress] 145M reads processed (75.3% mapped)
[2024-08-09 19:53:18,159]   DEBUG [count_nac] [progress] 147M reads processed (75.3% mapped)
[2024-08-09 19:53:24,170]   DEBUG [count_nac] [progress] 148M reads processed (75.3% mapped)
[2024-08-09 19:53:29,883]   DEBUG [count_nac] [progress] 149M reads processed (75.3% mapped)
[2024-08-09 19:53:35,795]   DEBUG [count_nac] [progress] 150M reads processed (75.3% mapped)
[2024-08-09 19:53:41,106]   DEBUG [count_nac] [progress] 151M reads processed (75.3% mapped)
[2024-08-09 19:53:46,618]   DEBUG [count_nac] [progress] 152M reads processed (75.3% mapped)
[2024-08-09 19:53:52,531]   DEBUG [count_nac] [progress] 153M reads processed (75.3% mapped)
[2024-08-09 19:53:58,243]   DEBUG [count_nac] [progress] 154M reads processed (75.2% mapped)
[2024-08-09 19:54:03,754]   DEBUG [count_nac] [progress] 155M reads processed (75.2% mapped)
[2024-08-09 19:54:09,364]   DEBUG [count_nac] [progress] 156M reads processed (75.2% mapped)
[2024-08-09 19:54:15,290]   DEBUG [count_nac] [progress] 157M reads processed (75.2% mapped)
[2024-08-09 19:54:21,012]   DEBUG [count_nac] [progress] 158M reads processed (75.2% mapped)
[2024-08-09 19:54:26,623]   DEBUG [count_nac] [progress] 159M reads processed (75.2% mapped)
[2024-08-09 19:54:32,534]   DEBUG [count_nac] [progress] 160M reads processed (75.1% mapped)
[2024-08-09 19:54:37,947]   DEBUG [count_nac] [progress] 161M reads processed (75.1% mapped)
[2024-08-09 19:54:43,958]   DEBUG [count_nac] [progress] 162M reads processed (75.1% mapped)
[2024-08-09 19:54:49,869]   DEBUG [count_nac] [progress] 163M reads processed (75.1% mapped)
[2024-08-09 19:54:55,279]   DEBUG [count_nac] [progress] 164M reads processed (75.1% mapped)
[2024-08-09 19:55:01,695]   DEBUG [count_nac] [progress] 165M reads processed (75.1% mapped)
[2024-08-09 19:55:06,002]   DEBUG [count_nac] [progress] 166M reads processed (75.1% mapped)              done
[2024-08-09 19:55:06,002]   DEBUG [count_nac] [quant] processed 166,938,066 reads, 125,327,006 reads pseudoaligned
[2024-08-09 19:55:07,805]   DEBUG [count_nac]
[2024-08-09 19:55:17,522]    INFO [count_nac] Sorting BUS file kb/1122/output.bus to kb/1122/tmp/output.s.bus
[2024-08-09 19:55:17,522]   DEBUG [count_nac] bustools sort -o kb/1122/tmp/output.s.bus -T kb/1122/tmp -t 8 -m 4G kb/1122/output.bus
[2024-08-09 19:55:20,532]   DEBUG [count_nac] partition time: 1.79815s
[2024-08-09 19:55:23,458]   DEBUG [count_nac] all fits in buffer
[2024-08-09 19:55:25,360]   DEBUG [count_nac] Read in 125327006 BUS records
[2024-08-09 19:55:25,360]   DEBUG [count_nac] reading time 0.366752s
[2024-08-09 19:55:25,361]   DEBUG [count_nac] sorting time 19.3618s
[2024-08-09 19:55:25,361]   DEBUG [count_nac] writing time 0.902203s
[2024-08-09 19:55:25,361]    INFO [count_nac] On-list not provided
[2024-08-09 19:55:25,361]    INFO [count_nac] Copying pre-packaged 10XV3 on-list to kb/1122
[2024-08-09 19:55:25,971]    INFO [count_nac] Inspecting BUS file kb/1122/tmp/output.s.bus
[2024-08-09 19:55:25,971]   DEBUG [count_nac] bustools inspect -o kb/1122/inspect.json -w kb/1122/10x_version3_whitelist.txt kb/1122/tmp/output.s.bus
[2024-08-09 19:55:38,995]    INFO [count_nac] Correcting BUS records in kb/1122/tmp/output.s.bus to kb/1122/tmp/output.s.c.bus with on-list kb/1122/10x_version3_whitelist.txt
[2024-08-09 19:55:38,996]   DEBUG [count_nac] bustools correct -o kb/1122/tmp/output.s.c.bus -w kb/1122/10x_version3_whitelist.txt kb/1122/tmp/output.s.bus
[2024-08-09 19:55:43,609]   DEBUG [count_nac] Found 6794880 barcodes in the on-list
[2024-08-09 19:55:49,917]   DEBUG [count_nac] Processed 73699215 BUS records
[2024-08-09 19:55:49,918]   DEBUG [count_nac] In on-list = 71283991
[2024-08-09 19:55:49,918]   DEBUG [count_nac] Corrected    = 320120
[2024-08-09 19:55:49,918]   DEBUG [count_nac] Uncorrected  = 2095104
[2024-08-09 19:55:51,320]    INFO [count_nac] Sorting BUS file kb/1122/tmp/output.s.c.bus to kb/1122/output.unfiltered.bus
[2024-08-09 19:55:51,320]   DEBUG [count_nac] bustools sort -o kb/1122/output.unfiltered.bus -T kb/1122/tmp -t 8 -m 4G kb/1122/tmp/output.s.c.bus
[2024-08-09 19:55:54,032]   DEBUG [count_nac] partition time: 0.307215s
[2024-08-09 19:55:55,647]   DEBUG [count_nac] all fits in buffer
[2024-08-09 19:55:57,349]   DEBUG [count_nac] Read in 71604111 BUS records
[2024-08-09 19:55:57,350]   DEBUG [count_nac] reading time 0.222271s
[2024-08-09 19:55:57,350]   DEBUG [count_nac] sorting time 9.37299s
[2024-08-09 19:55:57,350]   DEBUG [count_nac] writing time 0.68096s
[2024-08-09 19:55:57,350]    INFO [count_nac] Generating count matrix kb/1122/counts_unfiltered/cells_x_genes from BUS file kb/1122/output.unfiltered.bus
[2024-08-09 19:55:57,350]   DEBUG [count_nac] bustools count -o kb/1122/counts_unfiltered/cells_x_genes -g kb_index/t2g.txt -e kb/1122/matrix.ec -t kb/1122/transcripts.txt -s kb_index/intron_t2c.txt --genecounts --umi-gene kb/1122/output.unfiltered.bus
[2024-08-09 19:57:29,105]   DEBUG [count_nac] kb/1122/counts_unfiltered/cells_x_genes.mature.mtx passed validation
[2024-08-09 19:57:43,330]   DEBUG [count_nac] kb/1122/counts_unfiltered/cells_x_genes.nascent.mtx passed validation
[2024-08-09 19:57:52,270]   DEBUG [count_nac] kb/1122/counts_unfiltered/cells_x_genes.ambiguous.mtx passed validation
[2024-08-09 19:57:52,270]    INFO [count_nac] Writing gene names to file kb/1122/counts_unfiltered/cells_x_genes.genes.names.txt
[2024-08-09 19:57:52,514] WARNING [count_nac] 13914 gene IDs do not have corresponding valid gene names. These genes will use their gene IDs instead.
[2024-08-09 19:57:52,567]    INFO [count_nac] Summing matrices into kb/1122/counts_unfiltered/cells_x_genes.cell.mtx
[2024-08-09 19:58:02,525]    INFO [count_nac] Summing matrices into kb/1122/counts_unfiltered/cells_x_genes.cell.mtx
[2024-08-09 19:58:16,038]    INFO [count_nac] Summing matrices into kb/1122/counts_unfiltered/cells_x_genes.nucleus.mtx
[2024-08-09 19:58:40,891]    INFO [count_nac] Summing matrices into kb/1122/counts_unfiltered/cells_x_genes.nucleus.mtx
[2024-08-09 19:59:13,397]    INFO [count_nac] Summing matrices into kb/1122/counts_unfiltered/cells_x_genes.total.mtx
[2024-08-09 19:59:34,874]    INFO [count_nac] Summing matrices into kb/1122/counts_unfiltered/cells_x_genes.total.mtx
[2024-08-09 20:00:03,929]    INFO [count_nac] Reading matrix kb/1122/counts_unfiltered/cells_x_genes.mature.mtx
[2024-08-09 20:00:05,506]    INFO [count_nac] Reading matrix kb/1122/counts_unfiltered/cells_x_genes.nascent.mtx
[2024-08-09 20:00:20,763]    INFO [count_nac] Reading matrix kb/1122/counts_unfiltered/cells_x_genes.ambiguous.mtx
[2024-08-09 20:00:30,804]    INFO [count_nac] Combining matrices
[2024-08-09 20:00:31,330]    INFO [count_nac] Writing matrices to h5ad kb/1122/counts_unfiltered/adata.h5ad
[2024-08-09 20:00:31,692]    INFO [count_nac] Filtering with bustools
[2024-08-09 20:00:31,692]    INFO [count_nac] Generating on-list kb/1122/filter_barcodes.txt from BUS file kb/1122/output.unfiltered.bus
[2024-08-09 20:00:31,692]   DEBUG [count_nac] bustools allowlist -o kb/1122/filter_barcodes.txt kb/1122/output.unfiltered.bus
[2024-08-09 20:00:33,409]   DEBUG [count_nac] Read in 71540002 BUS records, wrote 5138 barcodes to on-list with threshold 2108
[2024-08-09 20:00:33,410]    INFO [count_nac] Correcting BUS records in kb/1122/output.unfiltered.bus to kb/1122/tmp/output.unfiltered.c.bus with on-list kb/1122/filter_barcodes.txt
[2024-08-09 20:00:33,410]   DEBUG [count_nac] bustools correct -o kb/1122/tmp/output.unfiltered.c.bus -w kb/1122/filter_barcodes.txt kb/1122/output.unfiltered.bus
[2024-08-09 20:00:33,521]   DEBUG [count_nac] Found 5137 barcodes in the on-list
[2024-08-09 20:00:38,327]   DEBUG [count_nac] Processed 71540002 BUS records
[2024-08-09 20:00:38,327]   DEBUG [count_nac] In on-list = 64193659
[2024-08-09 20:00:38,328]   DEBUG [count_nac] Corrected    = 20751
[2024-08-09 20:00:38,328]   DEBUG [count_nac] Uncorrected  = 7325592
[2024-08-09 20:00:38,328]    INFO [count_nac] Sorting BUS file kb/1122/tmp/output.unfiltered.c.bus to kb/1122/output.filtered.bus
[2024-08-09 20:00:38,328]   DEBUG [count_nac] bustools sort -o kb/1122/output.filtered.bus -T kb/1122/tmp -t 8 -m 4G kb/1122/tmp/output.unfiltered.c.bus
[2024-08-09 20:00:42,549]   DEBUG [count_nac] partition time: 0.281753s
[2024-08-09 20:00:43,890]   DEBUG [count_nac] all fits in buffer
[2024-08-09 20:00:45,593]   DEBUG [count_nac] Read in 64214410 BUS records
[2024-08-09 20:00:45,593]   DEBUG [count_nac] reading time 0.195412s
[2024-08-09 20:00:45,593]   DEBUG [count_nac] sorting time 6.38335s
[2024-08-09 20:00:45,593]   DEBUG [count_nac] writing time 0.607916s
[2024-08-09 20:00:45,593]    INFO [count_nac] Generating count matrix kb/1122/counts_filtered/cells_x_genes from BUS file kb/1122/output.filtered.bus
[2024-08-09 20:00:45,593]   DEBUG [count_nac] bustools count -o kb/1122/counts_filtered/cells_x_genes -g kb_index/t2g.txt -e kb/1122/matrix.ec -t kb/1122/transcripts.txt -s kb_index/intron_t2c.txt --genecounts --umi-gene kb/1122/output.filtered.bus
[2024-08-09 20:02:13,708]   DEBUG [count_nac] kb/1122/counts_filtered/cells_x_genes.mature.mtx passed validation
[2024-08-09 20:02:25,628]   DEBUG [count_nac] kb/1122/counts_filtered/cells_x_genes.nascent.mtx passed validation
[2024-08-09 20:02:32,554]   DEBUG [count_nac] kb/1122/counts_filtered/cells_x_genes.ambiguous.mtx passed validation
[2024-08-09 20:02:32,554]    INFO [count_nac] Summing matrices into kb/1122/counts_filtered/cells_x_genes.cell.mtx
[2024-08-09 20:02:39,915]    INFO [count_nac] Summing matrices into kb/1122/counts_filtered/cells_x_genes.cell.mtx
[2024-08-09 20:02:50,039]    INFO [count_nac] Summing matrices into kb/1122/counts_filtered/cells_x_genes.nucleus.mtx
[2024-08-09 20:03:08,810]    INFO [count_nac] Summing matrices into kb/1122/counts_filtered/cells_x_genes.nucleus.mtx
[2024-08-09 20:03:33,332]    INFO [count_nac] Summing matrices into kb/1122/counts_filtered/cells_x_genes.total.mtx
[2024-08-09 20:03:49,526]    INFO [count_nac] Summing matrices into kb/1122/counts_filtered/cells_x_genes.total.mtx
[2024-08-09 20:04:11,521]    INFO [count_nac] Reading matrix kb/1122/counts_filtered/cells_x_genes.mature.mtx
[2024-08-09 20:04:12,716]    INFO [count_nac] Reading matrix kb/1122/counts_filtered/cells_x_genes.nascent.mtx
[2024-08-09 20:04:24,785]    INFO [count_nac] Reading matrix kb/1122/counts_filtered/cells_x_genes.ambiguous.mtx
[2024-08-09 20:04:32,155]    INFO [count_nac] Combining matrices
[2024-08-09 20:04:32,322]    INFO [count_nac] Writing matrices to h5ad kb/1122/counts_filtered/adata.h5ad
[2024-08-09 20:04:32,458]   DEBUG [main] Removing `kb/1122/tmp` directory

Yenaled commented 2 months ago

You are using the nac workflow, in which the UMIs are split across different layers of the AnnData object (mature, nascent, ambiguous). Summing across the AnnData layers will give you the total mtx file.
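For readers who want to try this, here is a minimal sketch of the layer summation. It assumes the three layers are named "mature", "nascent", and "ambiguous" as described above; the helper `total_counts` and the toy matrices are hypothetical and only illustrate the idea. The function accepts any mapping of layer name to matrix, so it should also work on `adata.layers` of an AnnData object.

```python
import numpy as np

# Layer names as described in this thread for the nac workflow (assumed).
NAC_LAYERS = ("mature", "nascent", "ambiguous")


def total_counts(layers, names=NAC_LAYERS):
    """Sum the named count layers element-wise (hypothetical helper).

    `layers` is any mapping from layer name to a cells x genes matrix,
    for example a plain dict or `adata.layers`.
    """
    missing = [name for name in names if name not in layers]
    if missing:
        raise KeyError(f"missing layers: {missing}")
    total = layers[names[0]].copy()
    for name in names[1:]:
        total = total + layers[name]
    return total


# Toy data: 2 cells x 2 genes, standing in for real count layers.
toy_layers = {
    "mature": np.array([[1, 0], [2, 3]]),
    "nascent": np.array([[4, 1], [0, 0]]),
    "ambiguous": np.array([[0, 2], [1, 1]]),
}
toy_total = total_counts(toy_layers)
print(toy_total.sum(axis=1))  # per-cell UMI totals across all three layers
```

With a real file, something like `adata = anndata.read_h5ad("kb/<sample>/counts_filtered/adata.h5ad")` followed by `adata.X = total_counts(adata.layers)` would put the summed matrix into `adata.X` before handing the object to scanpy; the path here is only a placeholder.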

NaotoKubota commented 2 months ago

Thank you for your quick response, that makes sense. In this case, what is stored in adata.X?

Yenaled commented 2 months ago

I think it only contains the “mature” layer, but I’ll have to double check. In any case, it’ll probably be good for us to change it to the “total” matrix in future releases.

NaotoKubota commented 2 months ago

I checked my data, and adata.X appears to be identical to the "mature" layer; please double-check on your end. It would be more intuitive to have the "total" matrix stored in adata.X by default. Thank you very much, @Yenaled !
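A check along these lines can be sketched as follows. This is an illustration, not the code used in the thread: `same_matrix` and `matching_layers` are hypothetical helpers, and the toy arrays stand in for `adata.X` and `adata.layers`. The comparison is written to handle both dense NumPy arrays and SciPy sparse matrices, on the assumption that both operands are of the same kind.

```python
import numpy as np


def same_matrix(a, b):
    """Return True if two count matrices have the same shape and entries."""
    if a.shape != b.shape:
        return False
    diff = a != b
    # SciPy sparse comparisons return a sparse matrix (which has .nnz);
    # NumPy comparisons return a dense boolean array.
    n_diff = diff.nnz if hasattr(diff, "nnz") else np.count_nonzero(diff)
    return n_diff == 0


def matching_layers(x, layers):
    """List the names of all layers whose matrix is identical to x."""
    return [name for name in layers if same_matrix(x, layers[name])]


# Toy data: X is a copy of the "mature" layer, mimicking the observation above.
toy_layers = {
    "mature": np.array([[1, 0], [2, 3]]),
    "nascent": np.array([[4, 1], [0, 0]]),
    "ambiguous": np.array([[0, 2], [1, 1]]),
}
toy_x = toy_layers["mature"].copy()
print(matching_layers(toy_x, toy_layers))
```

On a real object the equivalent call would be `matching_layers(adata.X, adata.layers)`.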