roblanf / minion_qc

Quality control for MinION sequencing data
MIT License

Compatibility with dorado summary #65

Open fanavarro opened 4 months ago

fanavarro commented 4 months ago

Hi there, I am having problems when trying to run minion_qc 1.4.2 with the sequencing summary generated by dorado (with the dorado summary command).

Normally, when basecalling is performed on the device during sequencing, the resulting sequencing summary works fine with minion_qc. It has the following columns:

filename_fastq  filename_fast5  filename_pod5   parent_read_id  read_id run_id  channel mux     minknow_events  start_time      duration        passes_filtering        template_start  num_events_template     template_duration       sequence_length_template        mean_qscore_template    strand_score_template   median_template mad_template    pore_type       experiment_id   sample_id       end_reason

However, when I basecall on my own computer after sequencing, the sequencing summary file I get from the dorado summary command contains fewer columns:

filename        read_id run_id  channel mux     start_time      duration        template_start  template_duration       sequence_length_template        mean_qscore_template    barcode

A dorado user already asked about these differences here, but the dorado team said that they do not plan to implement the summary in a way that matches the sequencer output.

My question is whether the sequencing summary returned by dorado summary could be manipulated so that it can be used as input to minion_qc. I can try to write a script to do it, but I would need to know which columns minion_qc requires, as I do not know whether all the information it needs can be extracted from that summary file.

Thanks in advance, Fran
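As a starting point for the column question, the two headers quoted above can be compared directly. The following is a minimal sketch that only reports which columns differ between the two listings. It does not say which of them minion_qc actually reads, and the helper name `compare_columns` is made up for illustration.

```python
# Column names copied from the two headers quoted in this issue.
MINKNOW_COLUMNS = """
filename_fastq filename_fast5 filename_pod5 parent_read_id read_id run_id
channel mux minknow_events start_time duration passes_filtering
template_start num_events_template template_duration
sequence_length_template mean_qscore_template strand_score_template
median_template mad_template pore_type experiment_id sample_id end_reason
""".split()

DORADO_COLUMNS = """
filename read_id run_id channel mux start_time duration template_start
template_duration sequence_length_template mean_qscore_template barcode
""".split()


def compare_columns(reference, candidate):
    """Return (missing, extra): columns only in reference, columns only in candidate.

    Hypothetical helper; order of each result follows the input order.
    """
    reference_set = set(reference)
    candidate_set = set(candidate)
    missing = [name for name in reference if name not in candidate_set]
    extra = [name for name in candidate if name not in reference_set]
    return missing, extra


missing, extra = compare_columns(MINKNOW_COLUMNS, DORADO_COLUMNS)
print("Absent from the dorado summary:", ", ".join(missing))
print("Only in the dorado summary:", ", ".join(extra))
```

Of the 24 device-side columns, 14 are absent from the dorado summary, including num_events_template and passes_filtering. Which of those minion_qc depends on would still have to be checked against MinIONQC.R itself.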

mpnelsen commented 1 month ago

I'm having the same problem (and get the same error message reported in #66) when using a dorado-derived summary file (attached). It looks like maybe there were plans to eliminate num_events_template in issue #51? Thanks much for any help on how to use this with dorado summary files. -Matt

(base) xxxxx-MacBook-Pro-3:~ xxxxx$ Rscript MinIONQC.R -i /Library/MinKNOW/data/duplex_splitduplex_summary.txt -o /Library/MinKNOW/data/QC-Reports-simplex
INFO [2024-08-12 14:36:03] Loading input file: /Library/MinKNOW/data/duplex_splitduplex_summary.txt
INFO [2024-08-12 14:36:03] MinION flowcell detected
Error in $<-:
! Assigned data as.numeric(as.character(d$num_events_template)) must be compatible with existing data.
✖ Existing data has 2523 rows.
✖ Assigned data has 0 rows.
ℹ Only vectors of size 1 are recycled.
Caused by error in vectbl_recycle_rhs_rows():
! Can't recycle input of size 0 to size 2523.
Backtrace:
     ▆
  1. ├─global single.flowcell(input.file, output.dir, q)
  2. │ └─global load_summary(input.file, min.q = c(-Inf, q))
  3. │   ├─base::$<-(*tmp*, "num_events_template", value = <dbl>)
  4. │   └─tibble:::$<-.tbl_df(*tmp*, "num_events_template", value = <dbl>)
  5. │     └─tibble:::tbl_subassign(...)
  6. │       └─tibble:::vectbl_recycle_rhs_rows(value, fast_nrow(xo), i_arg = NULL, value_arg, call)
  7. │         ├─base::withCallingHandlers(...)
  8. │         └─vctrs::vec_recycle(value[[j]], nrow)
  9. └─vctrs:::stop_recycle_incompatible_size(...)
 10.   └─vctrs:::stop_vctrs(...)
 11.     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)
Execution halted

duplex_splitduplex_summary.txt

mpnelsen commented 1 month ago

I'm realizing that it works if I add in a num_events_template column filled with zeroes...
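The workaround described above can be scripted. The sketch below appends a zero-filled num_events_template column to a tab-separated dorado summary. It assumes the file is plain TSV with a header row. The function name `add_placeholder_column` and the file paths in the usage comment are hypothetical. Only this one column is reported in the thread as needed, so whether other missing columns matter for a given run is not confirmed here, and because the zeros are placeholders, any output derived from event counts would not be meaningful.

```python
import csv


def add_placeholder_column(in_path, out_path, column="num_events_template", fill="0"):
    """Copy a TSV summary to out_path, appending `column` filled with `fill`.

    Hypothetical helper. Returns True if the column was added, False if the
    input already had it (in which case the file is copied unchanged).
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")

        header = next(reader, None)
        if header is None:
            raise ValueError(f"{in_path} is empty; expected a header row")

        if column in header:
            # Nothing to add: pass the rows through as they are.
            writer.writerow(header)
            writer.writerows(row for row in reader if row)
            return False

        writer.writerow(header + [column])
        for row in reader:
            if row:  # skip blank lines
                writer.writerow(row + [fill])
        return True


# Example usage (paths are placeholders, not taken from the thread):
# add_placeholder_column("dorado_summary.txt", "dorado_summary_patched.txt")
```

Writing to a separate output file keeps the original summary intact, so the patched copy can be passed to MinIONQC.R with -i while the dorado output stays unmodified.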