Rmarkdown report fails: Detected 7 column names but the data has 5 columns #72

hoelzer commented 4 months ago

Hey,

I am using the pipeline on SARS-CoV-2 Capture-Seq data (Illumina paired-end).

nextflow run rki-mf1/covPipe2 -r v0.5.1 --fastq fq_list.csv --list --kraken --update -profile slurm,singularity --output results-covpipe2-martin

Everything runs fine, but then the report step fails:

[a1/556128] process > summary_report:rmarkdown_report (1)                     [100%] 1 of 1, failed: 1 ✘

Here is the full error print:

Error executing process > 'summary_report:rmarkdown_report (1)'

Caused by:
  Process `summary_report:rmarkdown_report (1)` terminated with an error exit status (1)

Command executed:

  cp -L summary_report.Rmd report.Rmd
  Rscript -e "rmarkdown::render('report.Rmd', params=list(mode='paired', fastp_table_stats='read_stats.csv', fastp_table_stats_filter='read_stats_filter.csv', kraken_table='species_filtering.csv', flagstat_table='mapping_stats.csv', fragment_size_table='fragment_sizes.csv', fragment_size_median_table='fragment_sizes_median.csv', coverage_table='coverage_table.csv', positive='positive_samples.csv', negative='negative_samples.csv', sample_cov='coverage_samples.csv', president_results='president_results.tsv', pangolin_results='pangolin_results.csv', nextclade_results='nextclade_results.tsv', nextclade_version='nextclade 3.3.1',  nextclade_dataset_info='sars-cov-2, 2024-04-15--15-08-22Z', sc2rf_results='sc2rf_results.csv', vois_results='none', cns_min_cov='20', run_id='none', pipeline_version='https://github.com/rki-mf1/covPipe2 - v0.5.1 [453eb8fca67179cfc6a21bfdb23aab8248e758de]'), output_file='report.html')"

Command exit status:
  1

Command output:

    |
    |                                                                      |   0%
    |
    |.                                                                     |   2%
    ordinary text without R code

    |
    |...                                                                   |   4%
  label: setup (with options)
  List of 1
   $ include: logi FALSE

    |
    |....                                                                  |   6%
    ordinary text without R code

    |
    |......                                                                |   8%
  label: get_cmd_line_parameters

Command error:

  processing file: report.Rmd
  Quitting from lines 86-108 (report.Rmd)
  Error in setnames(x, value) :
    Can't assign 5 names to a 7 column data.table
  Calls: <Anonymous> ... eval -> eval -> names<- -> names<-.data.table -> setnames
  In addition: Warning message:
  In FUN(X[[i]], ...) :
    Detected 7 column names but the data has 5 columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this warning.
  Execution halted

Could this be caused by changes to the pangolin/nextclade output?

Thanks!

(PS: It would be great if I could get this run finished by Monday next week, so that I can use the data for a report.)

anfarr commented 4 months ago

Hey, I had the same error. See the issue I opened for reference (#71). As far as I can tell, the cause is not pangolin/nextclade but the output of sc2rf. Splitting that output by comma yields seven columns, but the Rmd script assigns only five column names: https://github.com/rki-mf1/CoVpipe2/blob/453eb8fca67179cfc6a21bfdb23aab8248e758de/bin/summary_report.Rmd#L104

As a temporary fix, you could replace line 104 of summary_report.Rmd with colnames(dt.sc2rf_results)[1:5] <- c('sample','examples','intermissions','breakpoints','regions') in your local installation of the pipeline.
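To confirm this kind of header/data mismatch on your own run, it can help to count the comma-separated fields per line. The following is only a minimal sketch: the file sample_sc2rf.csv, its column names, its values, and the helper count_fields are all made up for illustration and do not reproduce real sc2rf output.

```shell
#!/bin/sh
# Hypothetical sample that mimics the mismatch: 7 header fields, 5 data fields.
# Column names and values are invented for illustration, not real sc2rf output.
cat > sample_sc2rf.csv <<'EOF'
c1,c2,c3,c4,c5,c6,c7
s1,a,b,c,d
s2,e,f,g,h
EOF

# count_fields (hypothetical helper): report how many lines have each field count.
# Caveat: a plain comma split miscounts quoted fields that contain commas.
count_fields() {
  awk -F',' '{ n[NF]++ } END { for (k in n) print k " fields: " n[k] " line(s)" }' "$1" | sort -n
}

count_fields sample_sc2rf.csv
```

If more than one field count is reported, the header and the data rows disagree. To check a real run, point count_fields at the sc2rf_results.csv file in the work directory of the failed rmarkdown_report process.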

I am not sure where the regression occurred. I compared sc2rf output files from the last few months and the format did not change. At first glance I can't find a commit that might be the cause. Unfortunately, I do not have time right now to investigate further.

Anyway, the result files should still be present in your specified publishing directory under ./Report/single_tables.

Best regards,
Anton

hoelzer commented 4 months ago

Thanks @anfarr!

With that change, the pipeline at least ran through and produced the final HTML report.

@MarieLataretu I can also submit a PR with that change, but I'm not sure it's the best way to fix this. Please feel free to do something else and reject my PR ;)

hoelzer commented 4 months ago

See https://github.com/rki-mf1/CoVpipe2/pull/73

MarieLataretu commented 4 months ago

Hi all, thanks for reporting, @hoelzer, @anfarr!

Could you please test the branch MarieLataretu/issue72?

nextflow pull rki-mf1/CoVpipe2
nextflow run rki-mf1/CoVpipe2 -r MarieLataretu/issue72 ...

hoelzer commented 4 months ago

Hey @MarieLataretu, thanks! I tested the issue72 branch and it worked!

MarieLataretu commented 4 months ago

Nice, I'll prepare the release then!
