mritchielab / FLAMES

A framework for performing single-cell and bulk read full-length analysis of mutations and splicing.
https://mritchielab.github.io/FLAMES/
GNU General Public License v3.0

no transcript id #27

Closed nick-youngblut closed 7 months ago

nick-youngblut commented 7 months ago

I'm getting the following error:

#### Aligning reads to genome using minimap2
02:20:30 AM Thu Apr 04 2024 minimap2_align
Warning: error in running command/home/rstudio/miniconda3/bin/paftools.js:1714: Error: No transcript_id
        if (id == null) throw Error("No transcript_id");
                        ^
Error: No transcript_id
    at Error (<anonymous>)
    at paf_gff2bed (/home/rstudio/miniconda3/bin/paftools.js:1714:25)
    at main (/home/rstudio/miniconda3/bin/paftools.js:3695:29)
    at /home/rstudio/miniconda3/bin/paftools.js:3721:1

My relevant code:

config_file = FLAMES::create_config(
  work_dir, type = "sc_3end", do_barcode_demultiplex = FALSE
)

sce = sc_long_pipeline(
    fastq = fastq_input,
    genome_fa = ref_fna_input,
    annotation = ref_annot_input,
    outdir = work_dir,
    config = config_file,
    minimap2_dir = minimap2_path,
    expect_cell_number = 8000
)

For the reference/annotation files, I'm using:

The full input parameters:

#### Input parameters:
{
  "pipeline_parameters": {
    "seed": [2022],
    "threads": [1],
    "do_barcode_demultiplex": [false],
    "do_gene_quantification": [true],
    "do_genome_alignment": [true],
    "do_isoform_identification": [true],
    "bambu_isoform_identification": [false],
    "do_read_realignment": [true],
    "do_transcript_quantification": [true]
  },
  "barcode_parameters": {
    "max_bc_editdistance": [2],
    "max_flank_editdistance": [8],
    "pattern": {
      "primer": ["CTACACGACGCTCTTCCGATCT"],
      "BC": ["NNNNNNNNNNNNNNNN"],
      "UMI": ["NNNNNNNNNNNN"],
      "polyT": ["TTTTTTTTT"]
    },
    "TSO_seq": ["CCCATGTACTCTGCGTTGATACCACTGCTT"],
    "TSO_prime": [3],
    "full_length_only": [false]
  },
  "isoform_parameters": {
    "generate_raw_isoform": [false],
    "max_dist": [10],
    "max_ts_dist": [100],
    "max_splice_match_dist": [10],
    "min_fl_exon_len": [40],
    "max_site_per_splice": [3],
    "min_sup_cnt": [5],
    "min_cnt_pct": [0.001],
    "min_sup_pct": [0.2],
    "bambu_trust_reference": [true],
    "strand_specific": [0],
    "remove_incomp_reads": [4],
    "downsample_ratio": [1]
  },
  "alignment_parameters": {
    "use_junctions": [true],
    "no_flank": [false]
  },
  "realign_parameters": {
    "use_annotation": [true]
  },
  "transcript_counting": {
    "min_tr_coverage": [0.4],
    "min_read_coverage": [0.4]
  }
} 

I'm using FLAMES 1.9.2.

Any idea on what is causing the error with paftools?

ChangqingW commented 7 months ago

I have run into this a couple of times before. I had to use the latest script from minimap2's repo (https://raw.githubusercontent.com/lh3/minimap2/master/misc/paftools.js) and use GTF instead of GFF. I am working on including the js script in FLAMES and addressing the minimap2 folder issue you brought up, but for now you might want to either A. run the alignment manually (saving it as align2genome.bam under the output folder will make FLAMES skip alignment) or B. make a folder that contains the (softlinked) minimap2 binary and the latest js script.
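Option B could be scripted roughly as follows. This is only a sketch: `stage_minimap2_dir` is a hypothetical helper, not a FLAMES function, and it assumes FLAMES picks up `paftools.js` from the same folder as the minimap2 binary passed via `minimap2_dir`, which the comment above implies but does not spell out. The paths in the usage lines are placeholders.

```r
# Hypothetical helper (not part of FLAMES): stage a folder that holds a
# symlinked minimap2 binary and a copy of a paftools.js script.
stage_minimap2_dir <- function(minimap2_bin, paftools_js, dest_dir) {
  stopifnot(file.exists(minimap2_bin), file.exists(paftools_js))
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)

  # Symlink the existing binary rather than copying it.
  link_path <- file.path(dest_dir, "minimap2")
  if (!file.exists(link_path)) {
    file.symlink(normalizePath(minimap2_bin), link_path)
  }

  # Copy the script and make sure it is executable.
  js_path <- file.path(dest_dir, "paftools.js")
  file.copy(paftools_js, js_path, overwrite = TRUE)
  Sys.chmod(js_path, mode = "0755")

  invisible(normalizePath(dest_dir))
}

# Possible usage (placeholder paths; the download needs network access):
# js <- tempfile(fileext = ".js")
# download.file(
#   "https://raw.githubusercontent.com/lh3/minimap2/master/misc/paftools.js", js
# )
# minimap2_path <- stage_minimap2_dir("/path/to/minimap2", js, "flames_minimap2")
```

The returned folder would then be passed as `minimap2_dir` in the `sc_long_pipeline()` call.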

nick-youngblut commented 7 months ago

Thanks @ChangqingW for the info!

> and use GTF instead of GFF

Why was the GTF needed instead of the GFF? Is this info in the docs? Maybe it would be good to include such info in the error message for if (id == null) throw Error("No transcript_id"); (actually catch the error and write out a useful message)?

I should note that although the if (id == null) throw Error("No transcript_id"); message occurs rather early in the pipeline, the pipeline continues to run for quite a while before it actually fails. It would help if the pipeline failed fast.
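Until the pipeline fails fast on its own, a pre-flight check on the annotation file can surface the problem before any alignment starts. The sketch below is an assumption-laden approximation: `find_exons_missing_transcript_id` is a hypothetical helper, and it only looks for the literal text `transcript_id` in the attribute column of `exon` records, which mirrors the symptom in the error above rather than the exact parsing logic in paftools.js. It expects a tab-separated annotation file with nine columns.

```r
# Hypothetical pre-flight check (not part of FLAMES or minimap2):
# return the exon records whose attribute column lacks "transcript_id".
find_exons_missing_transcript_id <- function(annotation_path, n_max = -1L) {
  lines <- readLines(annotation_path, n = n_max)
  # Drop blank lines and comment/header lines.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)

  is_exon <- vapply(
    fields,
    function(f) length(f) >= 9 && f[3] == "exon",
    logical(1)
  )
  has_id <- vapply(
    fields,
    function(f) length(f) >= 9 && grepl("transcript_id", f[9], fixed = TRUE),
    logical(1)
  )
  lines[is_exon & !has_id]
}

# Possible usage before calling sc_long_pipeline():
# bad <- find_exons_missing_transcript_id(ref_annot_input)
# if (length(bad) > 0) {
#   stop(length(bad), " exon records lack transcript_id; try a GTF annotation.")
# }
```

Reading the whole file is simple but can be slow for large annotations; `n_max` limits the check to the first lines when a quick sanity test is enough.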

ChangqingW commented 7 months ago

This is a known issue in minimap2: https://github.com/lh3/minimap2/issues/422#issuecomment-500848428. I have no plans to modify the js scripts from minimap2, but yes, maybe I can catch the error in R and suggest using GTF.
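One way the "catch it in R" idea might look from the user's side is a wrapper that turns the early warning into an immediate error with a hint. This is a sketch, not the FLAMES implementation: `with_transcript_id_hint` is a hypothetical name, and it assumes the paftools.js message reaches R as a warning condition whose text contains "No transcript_id". If the text only goes to stderr, the `pattern` argument would need to match whatever R actually reports.

```r
# Hypothetical wrapper (not part of FLAMES): promote a matching warning
# to an error that suggests using a GTF annotation.
with_transcript_id_hint <- function(expr, pattern = "No transcript_id") {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        stop(
          "paftools.js reported '", pattern, "'. ",
          "The annotation may lack transcript_id attributes; ",
          "try a GTF annotation instead of GFF.",
          call. = FALSE
        )
      }
      # Non-matching warnings fall through and are handled as usual.
    }
  )
}

# Possible usage:
# sce <- with_transcript_id_hint(
#   sc_long_pipeline(fastq = fastq_input, genome_fa = ref_fna_input,
#                    annotation = ref_annot_input, outdir = work_dir,
#                    config = config_file, minimap2_dir = minimap2_path)
# )
```

Because the handler calls `stop()` as soon as the warning is raised, the wrapped call ends there instead of running on to a later failure.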