mritchielab / FLAMES

A framework for performing single-cell and bulk read full-length analysis of mutations and splicing.
https://mritchielab.github.io/FLAMES/
GNU General Public License v3.0

do_gene_quantification fails in sc_long_pipeline #50

Open beeesal opened 1 month ago

beeesal commented 1 month ago

Hey! This is a great pipeline. I have been using the sc_long_pipeline function, and the gene quantification step fails. However, when I set do_gene_quantification = FALSE in my configuration file, this step is skipped and the subsequent steps work.

This is the error that I get:

10:02:37 Tue Oct 22 2024 Start gene quantification and UMI deduplication
10:02:37 Tue Oct 22 2024 quantify genes
Found genome alignment file(s): align2genome.bam
No protocol specified
No protocol specified
Assigning reads to genes...
Processed: 0%| | 1/20974 [00:00<3:14:58, 1.79gene_group/s]
Error in value[[3L]](cond) :
  Error when quantifying genes:
Traceback (most recent call last):
  File "path to/FLAMES/python/count_gene.py", line 486, in quantification
    quantify_gene(in_bam, annotation, n_process)
  File "path to/FLAMES/python/count_gene.py", line 216, in quantify_gene
    gene_count_mat, dedup_read_lst_sub, umi_list_sub = future.result()
ValueError: too many values to unpack (expected 3)

Also, when I set bambu_isoform_identification = TRUE in my config file, I get output files. But when I set bambu_isoform_identification = FALSE and run the function, it fails with the following error:

Error: _Map_base::at

ChangqingW commented 1 month ago

I believe the ValueError has been fixed on the GitHub devel branch. Could you run BiocManager::install("mritchielab/FLAMES") and see if the error persists? You might also want to install the devel versions of basilisk and basilisk.utils, either through Bioconductor or with devtools::install_github('Bioconductor/basilisk.utils') and devtools::install_github('Bioconductor/basilisk').
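For readers unfamiliar with this message: a "too many values to unpack (expected 3)" error on a `future.result()` line usually means the worker function returned more values than the caller unpacks, which can happen when the two sides of the code are out of sync. The sketch below reproduces that failure mode in isolation. It is not the FLAMES code: `worker_old`, `worker_new`, and `collect` are hypothetical names, the returned data is made up, and a `ThreadPoolExecutor` is assumed here only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def worker_old(gene_group):
    # Hypothetical worker that returns exactly three values.
    return {"GENE_A": 1}, ["read1"], ["UMI1"]


def worker_new(gene_group):
    # Hypothetical updated worker that returns a fourth value.
    return {"GENE_A": 1}, ["read1"], ["UMI1"], {"extra": True}


def collect(worker):
    # The caller unpacks three values, mirroring the line in the traceback.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(worker, "group0")
        gene_count_mat, dedup_read_lst_sub, umi_list_sub = future.result()
    return gene_count_mat, dedup_read_lst_sub, umi_list_sub


if __name__ == "__main__":
    print(collect(worker_old))  # unpacks cleanly
    try:
        collect(worker_new)
    except ValueError as err:
        # The worker returned four values but the caller expects three.
        print("ValueError:", err)
```

If this is what is happening, the usual remedy is to make sure the installed package is a single consistent version, so that the worker and the caller agree on the number of returned values.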

beeesal commented 4 weeks ago

I tried these steps and am encountering the same value error.

Found genome alignment file(s): align2genome.bam
Error in value[[3L]](cond) :
  Error when quantifying genes:
Traceback (most recent call last):
  File "/home/R/x86_64-pc-linux-gnu-library/4.4/reticulate/python/rpytools/loader.py", line 122, in _find_and_load_hook
    return _run_hook(name, _hook)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/R/x86_64-pc-linux-gnu-library/4.4/reticulate/python/rpytools/loader.py", line 96, in _run_hook
    module = hook()
             ^^^^^^
  File "/home/R/x86_64-pc-linux-gnu-library/4.4/reticulate/python/rpytools/loader.py", line 120, in _hook
    return _find_and_load(name, import_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/R/x86_64-pc-linux-gnu-library/4.4/FLAMES/python/count_gene.py", line 8, in <module>
    import pandas as pd
  File "/home/R/x86_64-pc-linux-gnu-library/4.4/reticulate/python/rpytools/loader.py", line 122, in _find_and_load_hook
    return _run_hook(name, _hook)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/R/x86_64-pc-linux-gnu-library

ChangqingW commented 4 weeks ago

Could you post the full error message? The last line does not look complete.

beeesal commented 3 weeks ago

I agree. I tried running this a couple of times, but every time I get the same error, and it ends the way shown above.

ChangqingW commented 3 weeks ago

The Error: _Map_base::at is a known error we are trying to fix.

For the reticulate error, could you run traceback() after encountering the error and post the output? And sessionInfo() might also help.

beeesal commented 2 weeks ago

Thanks for your prompt response.

Posted the error below.

Also, transcript quantification (oarfish) is taking too long to run (24+ hours), and I have to terminate it. I tried it on a subset of my data, so I'm working with a FASTQ file of 600 MB and a BAM file of around the same size. I did not have this issue with the previous version of FLAMES I used. The isoforms generated by this pipeline are fine; it would be great if the accompanying gene counts and transcript counts were generated too. Thanks :)

traceback()
7: stop("Error when quantifying genes:\n", py_error_message)
6: value[[3L]](cond)
5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
4: tryCatchList(expr, classes, parentenv, handlers)
3: tryCatch({
       basiliskRun(env = flames_env, fun = function(annotation, outdir, pipeline, n_process, infq, samples, random_seed) {
           python_path <- system.file("python", package = "FLAMES")
           count <- reticulate::import_from_path("count_gene", python_path)
           count$quantification(annotation, outdir, pipeline, n_process, infq = infq, sample_names = samples, random_seed = random_seed)
       }, annotation = annotation, outdir = outdir, pipeline = pipeline, n_process = n_process, infq = infq, samples = samples, random_seed = random_seed)
   }, error = function(e) {
       py_error <- reticulate::py_last_error()
       if (!is.null(py_error)) {
           py_error_message <- py_error$message
           cat(annotation, outdir, pipeline, n_process, infq, samples, random_seed)
           stop("Error when quantifying genes:\n", py_error_message)
       } else {
           stop("Error when quantifying genes:\n", e$message) ...
2: quantify_gene(annotation, outdir, infq, config$pipeline_parameters$threads, pipeline = "sc_single_sample", random_seed = random_seed)
1: sc_long_pipeline(annotation = gtf_anno_genome, fastq = ffq, outdir = "/bioinformatics/sample_files/", genome_fa = fasta_genome, minimap2 = NULL, k8 = "/home/k8-0.2.4/k8-Linux", config_file = config_path)

youyupei commented 2 weeks ago

Hi @beeesal, are you sure you have the latest version of FLAMES? I think we updated the gene quantification very recently. Reinstalling FLAMES via BiocManager::install("mritchielab/FLAMES") might help.