nextflow-io / nf-hack17

Nextflow hackathon 2017 projects

Project 4: HTML tracing report #4

Open ewels opened 7 years ago

ewels commented 7 years ago

HTML tracing report

The Nextflow tracing reports contain loads of information, but it can be a bit of a pain to dig through them. It would be nice to be able to generate HTML output (like the timeline reports) with tables that can be sorted and plots visualising the numbers.

Data:

Any example pipeline + data should be suitable.

Computing resources:

Everything can be run on local computers.

Project Lead:

Phil Ewels (@ewels).

pditommaso commented 6 years ago

Please use this fork html-trace for a future pull request.

lucacozzuto commented 6 years ago

I'll join.

ewels commented 6 years ago

Ideas from our initial discussion:

Project plan:

  1. Look into duplicating trace report with JSON output
  2. Start mocking up an HTML report that uses JSON data

Phil

pditommaso commented 6 years ago

I think what would be very useful here is a fully customisable report based on an external template file. Something similar is already implemented for the nextflow log command. What do you think?

lucacozzuto commented 6 years ago

The idea here is to have everything formatted as JSON, so that it can be fed to a system that embeds it in an email or website based on a template file. So the output should contain everything.

mfoll commented 6 years ago

Information on the log command: https://groups.google.com/forum/#!topic/nextflow/YdSANW4GGQA

And:

$ nextflow log -h
Print executions log and runtime info
Usage: log [options] 
  Options:
    -after
       Show log entries for runs executed after the specified one
    -before
       Show log entries for runs executed before the specified one
    -but
       Show log entries of all runs except the specified one
    -f, -fields
       Comma separated list of fields to include in the printed log -- Use the
       `-l` option to show the list of available fields
    -F, -filter
       Filter log entries by a custom expression e.g. process =~ /foo.*/ &&
       status == 'COMPLETED'
    -h, -help
       Print the command usage
       Default: false
    -l, -list-fields
       Show all available fields
       Default: false
    -q, -quiet
       Show only run names
       Default: false
    -s
       Character used to separate column values
       Default:     
    -t, -template
       Text template used to each record in the log
pditommaso commented 6 years ago

I agree, but what I'm suggesting is to use a configurable template, so that you can produce this report as JSON, HTML, or whatever format the user provides as a template file.

lucacozzuto commented 6 years ago

Another nice feature would be to have version information for the tools used in the report. Paolo suggested keeping that information in special environment variables (also in the containers).

ewels commented 6 years ago

Quick snippet to dump a NF trace file to JSON format, so that I can start playing around with building HTML front-ends: https://gist.github.com/ewels/3233339ae9726695f691c2273bdcfd2f

@pditommaso - I agree that a templating system would be nice :) So I'll see if I can start writing an HTML template using the above JSON data.
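
For anyone following along without the gist, here is a minimal sketch of the same idea: read a tab-separated trace file and dump it as JSON. This is not the code from the gist. The function names and the `{"trace": [...]}` layout are assumptions made for illustration, and every value is kept as a string rather than converted to a number.

```python
import csv
import io
import json


def trace_to_records(trace_text):
    """Parse tab-separated trace text into a list of dicts, one per task."""
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    return [dict(row) for row in reader]


def trace_to_json(trace_text):
    """Serialise the parsed trace as a JSON object with a single "trace" key."""
    return json.dumps({"trace": trace_to_records(trace_text)}, indent=2)


# Two illustrative rows in the trace layout (tab-separated, header first).
EXAMPLE_TRACE = (
    "task_id\thash\tnative_id\tname\tstatus\texit\n"
    "1\tf7/3df5af\t27990\tsayHello (2)\tCOMPLETED\t0\n"
    "4\t0a/1e39f9\t27991\tsayHello (1)\tCOMPLETED\t0\n"
)

if __name__ == "__main__":
    print(trace_to_json(EXAMPLE_TRACE))
```

To use it on a real run, read the trace file into a string and pass it to `trace_to_json`.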

lucacozzuto commented 6 years ago

$ nextflow log -l
  attempt
  complete
  container
  duration
  env
  exit
  hash
  log
  module
  name
  native_id
  pcpu
  peak_rss
  peak_vmem
  pmem
  process
  queue
  rchar
  read_bytes
  realtime
  rss
  scratch
  script
  start
  status
  stderr
  stdout
  submit
  syscr
  syscw
  tag
  task_id
  vmem
  wchar
  workdir
  write_bytes
mfoll commented 6 years ago

https://www.nextflow.io/docs/latest/tracing.html#execution-report

edgano commented 6 years ago

We can use the -t option to specify the template for the log output:

nextflow log goofy_kilby -t my-template.md > execution-report.md

ewels commented 6 years ago

So it looks like the template is repeated once per process. @pditommaso - is there a way to have template content both before and after this loop, so that the loop happens inside the template instead? (eg. to have an HTML page with a table where each row is a process, instead of one HTML file per process).

mfoll commented 6 years ago

Be careful: the example on the Google group here doesn't work, as $folder is not a valid variable.

ewels commented 6 years ago

Example static JSON file to play with whilst we work out how to handle workflow-level logging:

NGI-RNAseq_trace.json.zip (updated to remove \n)

mfoll commented 6 years ago

@pditommaso in the trace file, the name column should be split into two fields, name and tag. Currently it looks like this:

task_id hash    native_id   name    status  exit    submit  duration    realtime    %cpu    rss vmem    rchar   wchar
1   f7/3df5af   27990   sayHello (2)    COMPLETED   0   2017-09-14 18:11:19.246 161ms   143ms   -   -   -   -   -
4   0a/1e39f9   27991   sayHello (1)    COMPLETED   0   2017-09-14 18:11:19.251 174ms   159ms   -   -   -   -   -

should be:

task_id hash    native_id   name    tag status  exit    submit  duration    realtime    %cpu    rss vmem    rchar   wchar
1   f7/3df5af   27990   sayHello    2   COMPLETED   0   2017-09-14 18:11:19.246 161ms   143ms   -   -   -   -   -
4   0a/1e39f9   27991   sayHello    1   COMPLETED   0   2017-09-14 18:11:19.251 174ms   159ms   -   -   -   -   -
ewels commented 6 years ago

(or even process_name to be completely clear...)
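
Until the trace file has separate columns, the split could be done when parsing. A rough sketch, assuming the name always has the form "process (tag)" with an optional tag; `split_name` is a hypothetical helper, not part of Nextflow:

```python
import re

# Matches "process_name (tag)"; the " (tag)" suffix is optional.
NAME_TAG = re.compile(r"^(?P<process>.+?)(?:\s+\((?P<tag>.*)\))?$")


def split_name(name):
    """Return (process, tag); tag is None when there is no "(...)" suffix."""
    match = NAME_TAG.match(name.strip())
    if match is None:  # e.g. an empty name
        return name, None
    return match.group("process"), match.group("tag")


if __name__ == "__main__":
    for example in ["sayHello (2)", "trim_galore (SRR4238359_subsamp)", "sayHello"]:
        print(split_name(example))
```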

ewels commented 6 years ago

Quick idea for a per-task JSON template that would work when concatenated:

data["trace"][$hash] = {
    "status": "CACHED", 
    "hash": "54/244723", 
    "name": "trim_galore (SRR4238359_subsamp)", 
    "task_id": 8.0, 
    "realtime": "20.5s", 
    "%cpu": "117.6%", 
    "submit": "2017-07-05 16:36:17.003", 
    "vmem": "2.5 GB", 
    "native_id": 15645.0, 
    "exit": 0.0, 
    "duration": "22.5s", 
    "wchar": 0.0, 
    "rchar": 0.0, 
    "rss": "147.4 MB"
};
ewels commented 6 years ago

NB: The above gives output that will be easy to parse using Javascript, but it won't be valid JSON. So we still need to talk to @pditommaso about having a template that can wrap around the tasks loop.
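
To illustrate what "wrapping around the tasks loop" would buy us, here is a small sketch in which the per-task template is rendered inside a single header and footer, so the result is valid JSON. It uses Python's `string.Template` purely as a stand-in for the Nextflow template mechanism; the template text, the field names used and `render_report` are all assumptions made for this example.

```python
import json
from string import Template

# Hypothetical per-task template: one JSON object member per task, keyed by hash.
# Values are substituted already JSON-encoded, so no quotes appear here.
TASK_TEMPLATE = Template('$hash: {"name": $name, "status": $status}')


def render_report(tasks):
    """Render every task with the template, then wrap the loop in one JSON object."""
    rendered = []
    for task in tasks:
        # JSON-encode each value so quotes and backslashes stay valid.
        encoded = {key: json.dumps(value) for key, value in task.items()}
        rendered.append(TASK_TEMPLATE.substitute(encoded))
    # The header and footer are emitted once, around the per-task loop.
    return '{"trace": {\n' + ",\n".join(rendered) + "\n}}"


if __name__ == "__main__":
    example_tasks = [
        {"hash": "54/244723", "name": "trim_galore (SRR4238359_subsamp)", "status": "CACHED"},
        {"hash": "f7/3df5af", "name": "sayHello (2)", "status": "COMPLETED"},
    ]
    print(render_report(example_tasks))
```

The same shape would work for HTML: a page header, one table row per task, then a footer.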

ewels commented 6 years ago

Day 2 - we have some new tasks:

  1. Continue building static HTML report
  2. Add additional fields to trace / nextflow log output
    • For example, hostname, requested_cpu, requested_memory, requested_time
  3. Move nextflow log templating functionality into runtime trace function
  4. Make template system work per-run instead of per-task
    • Access to workflow-level variables, such as workflow introspection (command used to launch pipeline etc).
  5. Look into ability to add custom run-level data
    • For example, use a function to add a key:value object that can end up in the nextflow output
    • Use case: custom code to find software versions, report this in main NF report.
  6. Make DAG available to HTML report output
  7. Add a new function specific to finding software version numbers
    • Call with:
      • Nice name for tool (eg. FastQC)
      • Command to get version (eg. fastqc --version)
      • Regex to parse the version number (eg. /FastQC v([\d\.]+)/)
    • Can be called multiple times for multiple tools
    • Saves parsed version numbers to a standardised location, available to other parts of NF
    • Option to tie in so that version numbers can be obtained in other ways (eg. environment variables)
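
A rough sketch of how the version-finding function in task 7 might behave, written in Python only to show the shape of the call (the real thing would live in Nextflow). `record_version` and the `VERSIONS` registry are hypothetical names, and the version string in the demo is made up so that the example runs without FastQC installed.

```python
import re
import subprocess
import sys

# Hypothetical standardised location: tool name -> parsed version string.
VERSIONS = {}


def record_version(name, command, pattern):
    """Run `command`, search its output with `pattern`, store capture group 1."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=False)
        # Some tools print their version to stderr, so search both streams.
        match = re.search(pattern, result.stdout + result.stderr)
    except FileNotFoundError:
        match = None  # the tool is not installed
    VERSIONS[name] = match.group(1) if match else None
    return VERSIONS[name]


if __name__ == "__main__":
    # Stand-in for `fastqc --version`; prints an illustrative version line.
    fake_fastqc = [sys.executable, "-c", "print('FastQC v0.11.5')"]
    print(record_version("FastQC", fake_fastqc, r"FastQC v([\d\.]+)"))
```

Calling it once per tool fills the registry, which other parts of the report could then read.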
lucacozzuto commented 6 years ago

In particular, having the command line and parameter values recorded somewhere (as a way to keep analysis metadata) would be great.

Question: how to keep the tool versions in the log?

Hammarn commented 6 years ago

Link to the TraceFileObserver

ewels commented 6 years ago

Plan for the report: copy the TraceFileObserver functionality and:

ewels commented 6 years ago

Ok, I opened a WIP pull-request to the html-trace nextflow branch: https://github.com/nextflow-io/nextflow/pull/456

Basic version of -with-report seems to be working now! 🎉

ewels commented 6 years ago

Idea: Instead of doing basic string switches in groovy to insert the JSON / other data, we could use the Groovy templating system. For example, see how I've used this in the e-mail stuff I've done here:

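// engine was not defined in the snippet as posted; GStringTemplateEngine is assumed here. report_fields is a Map.
def engine = new groovy.text.GStringTemplateEngine()
def hfile = new File("template.html")
def html_template = engine.createTemplate(hfile).make(report_fields)
def html_output = html_template.toString()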

This moves more of the logic into the template, rather than core nextflow code. This in turn would make the template file much more flexible for others to customise.