umccr / cwl-ica

A collection of cwl-ica workflows along with a user guide for the commands to use and contributions guide
MIT License

interop: imaging_table #492

Closed by pdiakumis 4 months ago

pdiakumis commented 6 months ago

Hello, would it be possible to add imaging_table to the interop commands at https://github.com/umccr/cwl-ica/blob/main/tools/illumina-interop/1.3.1/illumina-interop__1.3.1.cwl#L50? See https://illumina.github.io/interop/imaging_table.html for the documentation.

interop_imaging_table /path/to/run > results_2.csv

The output file is around 400 MB uncompressed, down to about 45 MB when compressed (based on a single test run). It has over a million rows, since there is a row for each Lane, Tile and Cycle combination. Runtime is around 5 minutes.
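Given those sizes, it may be worth compressing the table as it is written rather than afterwards. A minimal sketch, assuming gzip is the compressor (the figures above do not say which tool was used) and that interop_imaging_table writes the table to stdout as in the command above. The function name dump_imaging_table and the INTEROP_IMAGING_TABLE override variable are hypothetical, introduced only for illustration:

```shell
# Hypothetical helper: stream the imaging table straight into gzip so the
# uncompressed CSV never has to be written to disk.
# Usage: dump_imaging_table /path/to/run results_2.csv.gz
dump_imaging_table() {
  run_dir="$1"
  out_gz="$2"
  # INTEROP_IMAGING_TABLE is an illustrative override for the binary name;
  # it defaults to the interop_imaging_table command used above.
  "${INTEROP_IMAGING_TABLE:-interop_imaging_table}" "$run_dir" | gzip -c > "$out_gz"
}
```

In plain sh the exit status of the pipeline is that of gzip, so a failure of interop_imaging_table itself would go unnoticed; under bash, `set -o pipefail` before the call would surface it.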

It includes various metrics used to generate plots like those in the Imaging tab of SAV (Sequencing Analysis Viewer), which the wet lab team uses. Example:

d |> glimpse()
Rows: 1,190,592
Columns: 49
$ Lane                         <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
$ Tile                         <chr> "1101", "1101", "1101", "1101", "1101", "1101", "1101", "1101", "1101", "1101", "1101", …
$ Cycle                        <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "…
$ Read                         <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
$ `Cycle Within Read`          <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "…
$ `Density(k/mm2)`             <dbl> 2961.3, 2961.3, 2961.3, 2961.3, 2961.3, 2961.3, 2961.3, 2961.3, 2961.3, 2961.3, 2961.3, …
$ `Density Pf(k/mm2)`          <dbl> 2534.7, 2534.7, 2534.7, 2534.7, 2534.7, 2534.7, 2534.7, 2534.7, 2534.7, 2534.7, 2534.7, …
$ `Cluster Count (k)`          <dbl> 4091.9, 4091.9, 4091.9, 4091.9, 4091.9, 4091.9, 4091.9, 4091.9, 4091.9, 4091.9, 4091.9, …
$ `Cluster Count Pf (k)`       <dbl> 3502.5, 3502.5, 3502.5, 3502.5, 3502.5, 3502.5, 3502.5, 3502.5, 3502.5, 3502.5, 3502.5, …
$ `% Pass Filter`              <dbl> 85.6, 85.6, 85.6, 85.6, 85.6, 85.6, 85.6, 85.6, 85.6, 85.6, 85.6, 85.6, 85.6, 85.6, 85.6…
$ `% Aligned`                  <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN…
$ `Legacy Phasing Rate`        <dbl> 0.118, 0.118, 0.118, 0.118, 0.118, 0.118, 0.118, 0.118, 0.118, 0.118, 0.118, 0.118, 0.11…
$ `Legacy Prephasing Rate`     <dbl> 0.042, 0.042, 0.042, 0.042, 0.042, 0.042, 0.042, 0.042, 0.042, 0.042, 0.042, 0.042, 0.04…
$ `Error Rate`                 <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN…
$ `%>= Q20`                    <dbl> 97.53, 98.52, 98.67, 98.93, 98.97, 98.93, 98.90, 98.97, 98.93, 99.00, 98.89, 99.09, 98.8…
$ `%>= Q30`                    <dbl> 93.30, 95.89, 95.96, 95.99, 96.29, 96.38, 95.98, 95.82, 96.08, 96.33, 95.88, 95.99, 95.8…
$ P90_RED                      <dbl> 1033, 1035, 1003, 988, 980, 976, 972, 966, 962, 958, 956, 957, 943, 949, 943, 942, 937, …
$ P90_GREEN                    <dbl> 670, 678, 697, 700, 695, 696, 688, 686, 688, 689, 679, 680, 668, 672, 671, 678, 669, 662…
$ `% No Calls`                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ `% Base_A`                   <dbl> 31.8, 32.3, 31.1, 30.4, 29.3, 30.0, 29.2, 29.7, 29.3, 30.7, 29.6, 29.5, 30.2, 30.5, 30.3…
$ `% Base_C`                   <dbl> 20.5, 18.2, 19.1, 20.5, 19.4, 19.5, 19.0, 19.6, 20.1, 18.7, 19.6, 19.9, 19.9, 19.4, 19.9…
$ `% Base_G`                   <dbl> 16.5, 19.6, 20.0, 19.3, 19.7, 20.2, 21.3, 20.3, 19.8, 19.7, 19.7, 20.1, 20.0, 19.6, 20.0…
$ `% Base_T`                   <dbl> 31.2, 29.9, 29.9, 29.8, 31.6, 30.3, 30.4, 30.3, 30.8, 30.9, 31.1, 30.5, 29.9, 30.5, 29.8…
$ Fwhm_RED                     <dbl> 1.75, 1.75, 1.76, 1.76, 1.76, 1.74, 1.74, 1.77, 1.74, 1.74, 1.76, 1.76, 1.75, 1.73, 1.73…
$ Fwhm_GREEN                   <dbl> 1.57, 1.61, 1.57, 1.58, 1.58, 1.55, 1.58, 1.57, 1.60, 1.59, 1.59, 1.58, 1.62, 1.61, 1.59…
$ Corrected_A                  <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN…
$ Corrected_C                  <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN…
$ Corrected_G                  <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN…
$ Corrected_T                  <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN…
$ Called_A                     <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN…
$ Called_C                     <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN…
$ Called_G                     <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN…
$ Called_T                     <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN…
$ `Signal To Noise`            <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN…
$ `Phasing Weight`             <dbl> 1.25, 2.00, 1.00, 1.25, 0.75, 2.00, 1.00, 1.75, 1.25, 1.50, 2.00, 2.75, 1.50, 3.25, 3.00…
$ `Prephasing Weight`          <dbl> 2.25, 0.25, 0.75, 0.25, 0.25, 0.25, 0.25, 1.00, 0.50, 1.00, 1.00, 0.75, 1.50, 2.25, 1.00…
$ `Phasing Slope`              <dbl> 0.099, 0.099, 0.099, 0.099, 0.099, 0.099, 0.099, 0.099, 0.099, 0.099, 0.099, 0.099, 0.09…
$ `Phasing Offset`             <dbl> 1.098, 1.098, 1.098, 1.098, 1.098, 1.098, 1.098, 1.098, 1.098, 1.098, 1.098, 1.098, 1.09…
$ `Prephasing Slope`           <dbl> 0.064, 0.064, 0.064, 0.064, 0.064, 0.064, 0.064, 0.064, 0.064, 0.064, 0.064, 0.064, 0.06…
$ `Prephasing Offset`          <dbl> 0.333, 0.333, 0.333, 0.333, 0.333, 0.333, 0.333, 0.333, 0.333, 0.333, 0.333, 0.333, 0.33…
$ `Minimum Contrast_RED`       <dbl> 230, 220, 212, 208, 203, 205, 203, 203, 202, 202, 201, 202, 203, 202, 202, 202, 201, 202…
$ `Minimum Contrast_GREEN`     <dbl> 212, 212, 210, 207, 208, 206, 208, 206, 208, 207, 207, 207, 209, 209, 209, 208, 209, 209…
$ `Maximum Contrast_RED`       <dbl> 502, 484, 460, 456, 448, 446, 442, 446, 439, 441, 439, 437, 437, 435, 438, 433, 433, 434…
$ `Maximum Contrast_GREEN`     <dbl> 415, 416, 411, 409, 405, 404, 404, 401, 404, 404, 403, 402, 403, 404, 406, 403, 403, 400…
$ Surface                      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ Swath                        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ `Tile Number`                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ `Cluster Count Occupied (k)` <dbl> 3962.4, 3962.4, 3962.4, 3962.4, 3962.4, 3962.4, 3962.4, 3962.4, 3962.4, 3962.4, 3962.4, …
$ `% Occupied`                 <dbl> 96.8, 96.8, 96.8, 96.8, 96.8, 96.8, 96.8, 96.8, 96.8, 96.8, 96.8, 96.8, 96.8, 96.8, 96.8…

For instance, we can get this plot of % Occupied vs. % Pass Filter:

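# One point per distinct Lane / % Occupied / % Pass Filter combination
d |>
  dplyr::distinct(Lane, `% Occupied`, `% Pass Filter`) |>
  ggplot2::ggplot(ggplot2::aes(x = `% Occupied`, y = `% Pass Filter`)) +
  ggplot2::geom_point(ggplot2::aes(colour = Lane), alpha = 0.6) +
  # Both axes are percentages, so fix them to 0-100
  ggplot2::xlim(0, 100) +
  ggplot2::ylim(0, 100) +
  ggplot2::theme_bw()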

[Image: scatter plot of % Occupied vs. % Pass Filter, coloured by Lane]

cc. @mhlunimelb