jts / ncov-tools

Small collection of tools for performing quality control on coronavirus sequencing data and genomes
MIT License
47 stars 16 forks source link

Plotting scripts may crash if no metadata (Ct values) available #108

Open dfornika opened 7 months ago

dfornika commented 7 months ago

We have an automated system that pulls metadata (collection dates and Ct values) for all samples prior to running artic & ncov-tools. We've recently run into a situation where no samples had metadata available (all NA values for date and ct in the metadata file).

The error we've run into is occuring in the plot_qc_sequencing.R script, where the merged dataframe here::

https://github.com/jts/ncov-tools/blob/7a19778c644594fe17ce4fc703560e97b149aa60/workflow/scripts/plot/plot_qc_sequencing.R#L5

...is empty, resulting in this error message:

Error: Faceting variables must have at least one value
Backtrace:
     █
  1. └─global::plot_depth_by_amplicon_and_ct(df, metadata, args$output)
  2.   └─ggplot2::ggsave(outname, width = 15, height = 10)
  3.     ├─grid::grid.draw(plot)
  4.     └─ggplot2:::grid.draw.ggplot(plot)
  5.       ├─base::print(x)
  6.       └─ggplot2:::print.ggplot(x)
  7.         ├─ggplot2::ggplot_build(x)
  8.         └─ggplot2:::ggplot_build.ggplot(x)
  9.           └─layout$setup(data, plot$data, plot$plot_env)
 10.             └─ggplot2:::f(..., self = self)
 11.               └─self$facet$compute_layout(data, self$facet_params)
 12.                 └─ggplot2:::f(...)
 13.                   ├─ggplot2:::unrowname(...)
 14.                   │ └─base::is.data.frame(x)
 15.                   └─ggplot2::combine_vars(data, params$plot_env, vars, drop = params$drop)
Execution halted

We haven't confirmed, but a similar error may arise in the other plotting functions in the script that use the metadata.

rdeborja commented 7 months ago

@dfornika Any chance you can attach some example files:

dfornika commented 7 months ago

I can generate some on Monday, but I think any normal inputs, along with a metadata.csv file that has all NA values for date and ct should trigger the error.

rdeborja commented 7 months ago

Thanks @dfornika. Unfortunately I no longer have access to prior data as I have moved on from my role when I originally helped on the project.

dfornika commented 7 months ago

Ok no problem, I'll dig something up as soon as I can.

dfornika commented 7 months ago

I had to give them .txt extensions so they'd be accepted here for upload.

I grabbed a couple of SRA samples, downsampled them to ~25x coverage and ran them through our artic pipeline, then through ncov-tools v1.9.1.

I might need to follow up with the merged.qc.csv file. I think that file may be generated downstream of the error we're seeing.

metadata.txt SRR27382851-1-25x.amplicon_depth.txt SRR27382851-1-25x.per_base_coverage.txt SRR27382852-1-25x.amplicon_depth.txt SRR27382852-1-25x.per_base_coverage.txt

dfornika commented 7 months ago

Hmm I don't see a merged.qc.csv in our outputs. It looks like that's being generated from the artic qc.csv file(?) I don't think we provide that as an input.

DarianHole commented 7 months ago

We've encountered this bug before and the way we got past it was just to add in one integer value to the ct column, usually for a control sample. Not the best method but it gets past the error

From our docs:

If there are any blank cells in the metadata table, replace them with an NA value. If the entire ct column consists of NA, this will cause problems for ncov-tools. In this case, either try to pull the correct Ct values from the database or use a placeholder Ct value such as 0 or 100

dfornika commented 7 months ago

That sounds like a reasonable work-around. On our side, we discovered that there was an upstream issue in our process that pulls metadata from our database for use in this pipeline that was causing us to miss most of our Ct and collection date values for some recent runs. We've got that issue fixed now, so we're less likely to run into this in the near future.

But the fix seems fairly simple so it would be nice to have the issue resolved so it won't arise again, if possible.

rdeborja commented 7 months ago

@dfornika @DarianHole I created a branch which automatically addresses the metadata ct issue. When the metadata is imported, if all the CT values are NA, it automatically fills them with 0: https://github.com/jts/ncov-tools/blob/3028cc610830c13e0db1a1bce464e0feeb382fdf/workflow/scripts/plot/plot_qc_sequencing.R#L125

I also addressed issue #110 and added N as a key in the base dictionary: https://github.com/jts/ncov-tools/blob/3028cc610830c13e0db1a1bce464e0feeb382fdf/workflow/scripts/format_pileup.py#L15

I noticed you addressed it on your end, but hopefully this helps. Let me know if it works out and I can merge the code and do a release.