Rajesh-HKU closed this issue 6 months ago.
Hi Rajesh, thank you for reaching out.
Could you please move this issue to the kallisto repository and tell us a bit more about what you did (provide the exact commands that were run and the kallisto version)?
Best,
Laura
Dear Laura,
Thanks for your response. The details are as follows:
Software versions: kb_python 0.28.2, kallisto 0.48
Count matrix commands: for each specimen, the code takes its pair of FASTQ files, creates either a gene count matrix or a TCC (transcript compatibility count) matrix, and saves the output files with the specimen ID as a prefix.
Gene count matrix:

```r
# run kb count on the R1/R2 FASTQ pair of specimen i
system(paste("kb count -i index.idx -g t2g.txt -x 10xv2 -t 2 --overwrite",
             paste(fastq_fns[(2*i-1):(2*i)], collapse = " "), sep = " "), intern = TRUE)
# copy the unfiltered gene count files, prefixing them with the specimen ID
my_files <- list.files(path = "./counts_unfiltered", pattern = "^cells_x_genes")
file.copy(from = paste0("./counts_unfiltered/", my_files),
          to = paste0("./cell_out/unfcounts/", specimens[2*i], "_", my_files))
```
TCC count matrix:

```r
# same run as above, but with --tcc so kb writes cells_x_tcc.* files
system(paste("kb count -i index.idx -g t2g.txt -x 10xv2 -t2 --tcc --overwrite",
             paste(fastq_fns[(2*i-1):(2*i)], collapse = " "), sep = " "), intern = TRUE)
# copy the unfiltered TCC files, prefixing them with the specimen ID
my_files <- list.files(path = "./counts_unfiltered", pattern = "^cells_x_tcc")
file.copy(from = paste0("./counts_unfiltered/", my_files),
          to = paste0("./cell_out/unfcounts_tcc/", specimendata$specimenID[i], "_", my_files))
```
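The two commands above differ only in the `--tcc` flag. As a minimal sketch, assuming `fastq_fns` stores each specimen's R1 and R2 files next to each other (so specimen `i` uses elements `2*i - 1` and `2*i`), both command strings can come from one helper. `build_kb_cmd` is a hypothetical name, not part of kb_python, and the sketch only builds the strings; it does not call kb.

```r
# Hypothetical helper (not part of kb_python): builds the kb count command for
# specimen i, assuming fastq_fns holds each specimen's R1 and R2 files
# consecutively, so specimen i uses elements 2*i - 1 and 2*i.
build_kb_cmd <- function(fastq_fns, i, tcc = FALSE) {
  pair <- fastq_fns[(2 * i - 1):(2 * i)]
  opts <- "kb count -i index.idx -g t2g.txt -x 10xv2 -t 2 --overwrite"
  if (tcc) {
    # same flags as the gene run, plus --tcc for a TCC matrix
    opts <- paste(opts, "--tcc")
  }
  paste(opts, paste(pair, collapse = " "), sep = " ")
}

# made-up file names for illustration
fastq_fns <- c("s1_R1.fastq.gz", "s1_R2.fastq.gz",
               "s2_R1.fastq.gz", "s2_R2.fastq.gz")
cat(build_kb_cmd(fastq_fns, 2), sep = "\n")
cat(build_kb_cmd(fastq_fns, 2, tcc = TRUE), sep = "\n")
# each string could then be passed to system(cmd, intern = TRUE)
```

Keeping the shared flags in one place makes it harder for the gene and TCC runs to drift apart (for example in the technology string or thread count).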
To compare the gene count and the TCC count for a particular gene (here BIN1, ENSG00000136717), we only included equivalence classes that map to that one gene.
```r
library(BUSpaRse)  # provides read_count_output()

# gene count for BIN1 (ENSG00000136717) in every cell of specimen i
lib <- specimendata$specimenID[i]
count_file <- paste0(lib, "_cells_x_genes")
res_mat <- read_count_output(dir = "./cell_out/unfcounts/", name = count_file, tcc = FALSE)
rowbin1 <- which(grepl(pattern = "^ENSG00000136717", rownames(res_mat)))  # row 7729
totalgene <- res_mat[rowbin1, ]
```
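```r
# TCC counts for the 14 BIN1 transcripts in every cell of specimen i
lib <- specimendata$specimenID[i]
count_file <- paste0(lib, "_cells_x_tcc")
# t2g row numbers of the BIN1 transcripts, used below as TCC row indices
rowbin1_tcc <- which(grepl(pattern = "\tBIN1$", t2g$V1))  # rows 28581 to 28594
res_mat <- read_count_output(dir = "./cell_out/unfcounts_tcc/", name = count_file, tcc = TRUE)
res_matx_tcc <- res_mat[rowbin1_tcc, ]
totalbin1 <- colSums(as.matrix(res_matx_tcc))  # sum over the 14 transcripts

# cross-tabulate gene count (rows) against summed TCC count (columns)
table(totalgene, totalbin1)
```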
```
         totalbin1 (sum of counts over the 14 BIN1 transcripts)
totalgene    0    1    2    3    4    5    6
        0 6877  933  151   29   15    1    0
        1 1164  174   30    6    1    0    0
        2  182   32    3    0    0    0    1
        3   26    3    0    1    0    0    0
        4    3    0    0    0    0    0    0
```
Please note in the table above that when the summed transcript count (totalbin1) is ≥ 1, the gene count (totalgene) is still predominantly zero.
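One way to select the equivalence classes that map to a single gene, as described above, is to read each EC's transcript list instead of relying on t2g row numbers. The sketch below assumes each EC is given as a string of 0-based, comma-separated transcript indices (the layout assumed here for the second column of the `.ec.txt` file written next to the TCC matrix) and a vector giving the gene of each transcript in index order. The toy data and the helper `ecs_for_gene` are hypothetical.

```r
# Toy inputs (hypothetical, for illustration only):
# ec_map:  one string per equivalence class with its 0-based transcript indices
# tx_gene: gene of each transcript, in index order (assumed)
ec_map  <- c("0", "1", "2", "0,1", "1,2", "0,2")
tx_gene <- c("BIN1", "BIN1", "GAPDH")

# Return the 1-based row positions of the ECs whose transcripts all belong
# to `gene`.
ecs_for_gene <- function(ec_map, tx_gene, gene) {
  tx_idx <- lapply(strsplit(ec_map, ",", fixed = TRUE),
                   function(x) as.integer(x) + 1L)  # 0-based -> 1-based
  only_gene <- vapply(tx_idx, function(t) all(tx_gene[t] == gene), logical(1))
  which(only_gene)
}

# Toy TCC matrix: rows = equivalence classes, columns = cell barcodes.
tcc <- matrix(c(1, 0, 2,
                0, 1, 0,
                3, 0, 0,
                0, 2, 1,
                1, 1, 1,
                0, 0, 4),
              nrow = 6, byrow = TRUE,
              dimnames = list(NULL, c("AAAC", "AAAG", "AAAT")))

bin1_rows <- ecs_for_gene(ec_map, tx_gene, "BIN1")
bin1_rows  # 1 2 4, i.e. ECs {0}, {1} and {0,1}
colSums(tcc[bin1_rows, , drop = FALSE])  # per-cell sum over BIN1-only ECs
```

To keep only single-transcript ECs instead, as in the original post, one could additionally require `lengths(tx_idx) == 1` inside the helper.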
Thanks.
Best regards, Rajesh
From: Laura Luebbert
Sent: Monday, April 15, 2024 9:43 AM
To: pachterlab/gget_examples
Cc: Rajesh-HKU; Author
Subject: Re: [pachterlab/gget_examples] Kallisto gene vs tcc counts (Issue #6)
Hi Rajesh, thank you for reaching out. I'm going to move this issue to the kallisto repository (https://github.com/pachterlab/kallisto).
Could you please tell us more about what you did (also provide the exact commands that were run and the kallisto version)?
Best,
Laura
Dear team, I ran kallisto on a set of FASTQ files and compared the gene counts with the TCC counts for the same dataset. I find large discrepancies between the two: in the vast majority of cases, the gene count is 0 even when the TCC count is ≥ 1. I have only considered TCC counts for equivalence classes that contain exactly one transcript of the gene. Could you please help me understand why this may be happening?
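Because the gene and TCC matrices come from two separate `kb count` runs, it may be worth checking that the cross-tabulation pairs up the same cells. A minimal sketch, assuming both per-cell totals are named by cell barcode (with BUSpaRse's `read_count_output()` the barcodes are expected to be the matrix column names, but treat this as an assumption). The barcodes and counts below are made up.

```r
# Hypothetical per-cell totals from the two runs, named by cell barcode.
totalgene <- c(AAAC = 1, AAAG = 0, AAAT = 2, AACC = 0)
totalbin1 <- c(AAAT = 2, AAAC = 1, AACC = 1, AAGG = 3)

# Keep only barcodes present in both runs and index both vectors by name,
# so each cell's gene count is paired with its own TCC total.
common <- intersect(names(totalgene), names(totalbin1))
tab <- table(gene = totalgene[common], tcc = totalbin1[common])
print(tab)
```

Indexing by barcode rather than by position means a difference in column order or in the set of detected barcodes between the two runs cannot silently shift counts onto the wrong cell.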
Thanks.
Best regards, Rajesh