kharchenkolab / numbat

Haplotype-aware CNV analysis from single-cell RNA-seq
https://kharchenkolab.github.io/numbat/
Other
164 stars 23 forks source link

`Error in filter()` when calling `run_numbat` #51

Closed aseyedia closed 2 months ago

aseyedia commented 1 year ago

Hello,

I am having an issue with the samples that were able to successfully work and produce allele counts with pileup_and_phase.R.

I call the container using singularity like so:

singularity run \
   --bind /mnt/isilon/ \
   --cleanenv -H /mnt/isilon/cscb/xxx/code/04b_NumBat \
   /mnt/isilon/cscb/xxx/code/04b_NumBat/numbat-rbase_latest.sif

and I enter an interactive R session. Following the Getting Started guide, I try to supply run_numbat with the gene x cell counts matrix and the allele counts, but regardless of the sample I use, I keep getting the same error:

counts <- Matrix::readMM("/mnt/isilon/cscb/xxx/code/02-counts/MCD1-redo_wIntrons-force/outs/filtered_feature_bc_matrix/matrix.mtx.gz")

alleles <- read.table("/mnt/isilon/cscb/Projects/10x/xxx/04b-NumBat/MCD1/MCD1_allele_counts.tsv.gz", sep="\t")

out = run_numbat(
    as.matrix(counts), # gene x cell integer UMI count matrix
    ref_hca, # reference expression profile, a gene x cell type normalized expression level matrix
    alleles, # allele dataframe generated by pileup_and_phase script
    genome = "hg38",
    t = 1e-5,
    ncores = 1,
    plot = TRUE,
    out_dir = './test'
)

> Error in `filter()`:
> ! Problem while computing `..1 = GT != ""`.
> Caused by error in `mask$eval_all_filter()`:
> ! object 'GT' not found
> Run `rlang::last_error()` to see where the error occurred.

I wish I had more information to give you but the only other thing I can say is that some of these samples were unable to work with pileup_and_phase.R and produce the allele counts gz.

Here are the bam and barcode files I used for each sample:

/mnt/isilon/cscb/Projects/10x/xxx/MCD1-redo_wIntrons-force/outs/possorted_genome_bam.bam

/mnt/isilon/cscb/xxx/MCD1-redo_wIntrons-force/outs/filtered_feature_bc_matrix/barcodes.tsv.gz

Do you have any advice on how to address this problem? The samples I'm asking about in this issue specifically are scRNA-seq samples processed by 10x CellRanger.

teng-gao commented 1 year ago

Hi @aseyedia ,

Thanks for the issue - could you head your allele dataframe to see what it looks like?

aseyedia commented 1 year ago

Sure.

(base) [seyediana@reslnvvhpc041 04b-NumBat]$ zcat MCD1/MCD1_allele_counts.tsv.gz | head -n 50
cell    snp_id  CHROM   POS     cM      REF     ALT     AD      DP      GT      gene
TACCGAATCCGCTTAC-1      1_821000_T_A    1       821000  0.48995214890403        T       A       0       1       0|1
TCTTCCTGTTACCCAA-1      1_821000_T_A    1       821000  0.48995214890403        T       A       1       1       0|1
AGACACTCAGCCGTCA-1      1_823246_C_T    1       823246  0.48995214890403        C       T       0       1       0|1
GAACACTTCAGACCGC-1      1_823246_C_T    1       823246  0.48995214890403        C       T       1       1       0|1
GGATCTACACTTGAAC-1      1_823246_C_T    1       823246  0.48995214890403        C       T       1       1       0|1
TACCGAATCCGCTTAC-1      1_823246_C_T    1       823246  0.48995214890403        C       T       0       1       0|1
TATTGCTTCGAAATCC-1      1_823246_C_T    1       823246  0.48995214890403        C       T       1       1       0|1
TTGACCCTCTACACTT-1      1_823246_C_T    1       823246  0.48995214890403        C       T       1       1       0|1
AGGGCCTCAACAGTGG-1      1_946247_G_A    1       946247  0.620826529104498       G       A       0       1       0|1     NOC2L
GAGTTGTCAACTGGTT-1      1_946247_G_A    1       946247  0.620826529104498       G       A       1       1       0|1     NOC2L
GGTGAAGCAAACACCT-1      1_946247_G_A    1       946247  0.620826529104498       G       A       0       1       0|1     NOC2L
GTGTGGCAGTACTGTC-1      1_946247_G_A    1       946247  0.620826529104498       G       A       2       2       0|1     NOC2L
GCTACAATCCACCTCA-1      1_954724_G_A    1       954724  0.620826529104498       G       A       1       1       1|0     NOC2L
GTCTGTCGTAGAGCTG-1      1_954724_G_A    1       954724  0.620826529104498       G       A       0       1       1|0     NOC2L
CTCATTAGTCCCACGA-1      1_955679_C_T    1       955679  0.620826529104498       C       T       0       1       0|1     NOC2L
GGAGGATCAAGGCCTC-1      1_955679_C_T    1       955679  0.620826529104498       C       T       1       1       0|1     NOC2L
CACTGTCAGTGGCCTC-1      1_958339_G_A    1       958339  0.620826529104498       G       A       1       1       0|1     NOC2L
CATACCCCAAGTATCC-1      1_958339_G_A    1       958339  0.620826529104498       G       A       1       1       0|1     NOC2L
CTAACCCAGCTCGCAC-1      1_958339_G_A    1       958339  0.620826529104498       G       A       1       1       0|1     NOC2L
CTGGACGTCATGGCCG-1      1_958339_G_A    1       958339  0.620826529104498       G       A       1       1       0|1     NOC2L
GTTTACTCATTGAAGA-1      1_958339_G_A    1       958339  0.620826529104498       G       A       1       1       0|1     NOC2L
TAATTCCGTGCTTCAA-1      1_958339_G_A    1       958339  0.620826529104498       G       A       1       1       0|1     NOC2L
TCAGTCCTCAGTGGGA-1      1_958339_G_A    1       958339  0.620826529104498       G       A       1       1       0|1     NOC2L
TTCATGTAGTTGTAAG-1      1_958339_G_A    1       958339  0.620826529104498       G       A       1       1       0|1     NOC2L
TTCGGTCGTACGACAG-1      1_958339_G_A    1       958339  0.620826529104498       G       A       0       1       0|1     NOC2L
AAACGCTTCTTCCCAG-1      1_1000731_C_T   1       1000731 0.620826529104498       C       T       0       1       1|0
AGGACTTAGTCACTCA-1      1_1000731_C_T   1       1000731 0.620826529104498       C       T       0       1       1|0
CATCCGTCAGCAGATG-1      1_1000731_C_T   1       1000731 0.620826529104498       C       T       0       1       1|0
CATTGCCAGTCTGGTT-1      1_1000731_C_T   1       1000731 0.620826529104498       C       T       0       1       1|0
CTCCCAAGTTAATCGC-1      1_1000731_C_T   1       1000731 0.620826529104498       C       T       1       1       1|0
CTGCGAGAGCACCGAA-1      1_1000731_C_T   1       1000731 0.620826529104498       C       T       1       1       1|0
CTTCGGTAGTCACAGG-1      1_1000731_C_T   1       1000731 0.620826529104498       C       T       0       1       1|0
GAGTTACGTGTCCATA-1      1_1000731_C_T   1       1000731 0.620826529104498       C       T       1       1       1|0
GCCTGTTCATGGAGAC-1      1_1000731_C_T   1       1000731 0.620826529104498       C       T       0       1       1|0
GTACAACCACGGTGAA-1      1_1000731_C_T   1       1000731 0.620826529104498       C       T       0       1       1|0
GTAGGAGGTCGGTGTC-1      1_1000731_C_T   1       1000731 0.620826529104498       C       T       0       1       1|0
TCACGGGTCCTATTGT-1      1_1000731_C_T   1       1000731 0.620826529104498       C       T       0       1       1|0
GCCCAGAAGTCATAGA-1      1_1009184_T_C   1       1009184 0.620826529104498       T       C       0       1       0|1
TGCTTGCTCGACCTAA-1      1_1009184_T_C   1       1009184 0.620826529104498       T       C       1       1       0|1
AAAGGGCTCAGCCTCT-1      1_1014228_G_A   1       1014228 0.759933699602133       G       A       1       1       0|1     ISG15
ACCATTTCAAAGCACG-1      1_1014228_G_A   1       1014228 0.759933699602133       G       A       0       1       0|1     ISG15
AGAACAAAGCGTTCAT-1      1_1014228_G_A   1       1014228 0.759933699602133       G       A       0       1       0|1     ISG15
AGAGCAGGTGATTGGG-1      1_1014228_G_A   1       1014228 0.759933699602133       G       A       1       1       0|1     ISG15
AGGAGGTCAACGGTAG-1      1_1014228_G_A   1       1014228 0.759933699602133       G       A       1       1       0|1     ISG15
AGGGTTTAGGTGCAGT-1      1_1014228_G_A   1       1014228 0.759933699602133       G       A       1       1       0|1     ISG15
ATCACAGCATGAAGCG-1      1_1014228_G_A   1       1014228 0.759933699602133       G       A       0       1       0|1     ISG15
ATCCTATGTGCCGAAA-1      1_1014228_G_A   1       1014228 0.759933699602133       G       A       1       1       0|1     ISG15
ATCGTGAGTGCCTTCT-1      1_1014228_G_A   1       1014228 0.759933699602133       G       A       1       1       0|1     ISG15
ATGGGAGAGGTCATAA-1      1_1014228_G_A   1       1014228 0.759933699602133       G       A       1       1       0|1     ISG15
teng-gao commented 1 year ago

Seems right to me .. have you checked if you can run the ATC2 example in the interactive session & see what's different?

MilanParikh commented 1 year ago

I ran into the same issue, if anyone else does it's an easy solve. Just add header=TRUE when you're reading in the alleles file, and you may need to un gzip the file also.

This should work:

alleles <- read.table(gzfile("/mnt/isilon/cscb/Projects/10x/xxx/04b-NumBat/MCD1/MCD1_allele_counts.tsv.gz"), sep="\t", header=TRUE)