Closed by nsheff 10 months ago
In reviewing old issues, I have another thought... not only the actual reported results, but also the status of the run could fall under this same spec. Or is it a different spec?
The analog is the flag system. Pypiper is setting flags, and looper reads them with 'looper check'.
With a separate formal tool for that, we would outsource the status to that alternative system. It seems to fit, since this summary tool is also used to watch progress (as you summarize, things get reported). We want to know when the job is complete, for example.
This becomes pipestat. It is a Python package with a CLI, which operates like:
pipestat stat_name stat_type value
e.g.:
pipestat Aligned_reads numeric 3000000
It can also be called from Python:
import pipestat
psm = pipestat.PipeStatManager(database_connection_or_path)
psm.write("Aligned_reads", "numeric", 3000000)
We document a CLI and a Python API. Pypiper uses the Python API; any shell pipeline could use pipestat through its CLI.
pipestat summarize can create a table summarizing the reported results. It's the table function of looper, independent of looper. It just needs a list of the samples. Where does it get that list? Well, it can just take a list of files, or a database connection. Looper can just manage that list of files or database connection.
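As a rough illustration of how one reporting core could sit behind both interfaces, here is a minimal sketch. It is not pipestat's implementation: `ResultReporter`, the JSON results file, the `CASTERS` table, and the extra leading results-file argument to `main` are all assumptions made for the example.

```python
import json
from pathlib import Path

# Hypothetical type names; the thread only mentions "numeric" and "file".
CASTERS = {"numeric": float, "file": str}


class ResultReporter:
    """Toy stand-in for the proposed manager object (not pipestat's real API)."""

    def __init__(self, path):
        self.path = Path(path)

    def write(self, stat_name, stat_type, value):
        """Record one result in the JSON file and return all results so far."""
        if stat_type not in CASTERS:
            raise ValueError(f"unknown stat type: {stat_type}")
        results = json.loads(self.path.read_text()) if self.path.exists() else {}
        results[stat_name] = {"type": stat_type, "value": CASTERS[stat_type](value)}
        self.path.write_text(json.dumps(results, indent=2))
        return results


def main(argv):
    """CLI-shaped entry point: <results_file> stat_name stat_type value."""
    path, stat_name, stat_type, value = argv
    return ResultReporter(path).write(stat_name, stat_type, value)
```

A shell pipeline would go through `main` with string arguments, while a Python pipeline would keep one `ResultReporter` around and call `write` directly.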
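A sketch of what building that table from a list of files might look like, under stated assumptions: each sample has one JSON results file mapping a statistic name to an object with a `value` key, and the caller (looper, say) supplies a sample-to-path mapping. The function name `summarize` and the file layout are hypothetical, not pipestat's actual format.

```python
import csv
import io
import json


def summarize(result_files):
    """Build a tab-separated sample-by-statistic table.

    result_files: dict of {sample_name: path_to_json} (assumed layout).
    """
    rows = {}
    for sample, path in result_files.items():
        with open(path) as handle:
            rows[sample] = {k: v["value"] for k, v in json.load(handle).items()}
    # Union of all statistic names, so samples may report different sets.
    columns = sorted({name for stats in rows.values() for name in stats})
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample"] + columns)
    for sample in sorted(rows):
        # Missing statistics are left as empty cells.
        writer.writerow([sample] + [rows[sample].get(c, "") for c in columns])
    return buffer.getvalue()
```

Swapping the file mapping for a database query would leave the table-building part unchanged, which is the point of keeping the summarizer independent of looper.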
A few remaining questions:
-c pipestat_sample.yaml? env var $PIPESTAT? The Python API can handle this with the persistent object.
A use case of the CLI for pipestat: https://github.com/pepkit/hello_looper/issues/3
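If both the `-c` option and the `$PIPESTAT` variable were supported, one possible precedence is "explicit flag first, environment second". The sketch below only illustrates that ordering; the function name `resolve_config`, the `--config` long option, and the error message are assumptions, not a settled design.

```python
import argparse
import os


def resolve_config(argv, environ=os.environ):
    """Return the results target: -c wins, then $PIPESTAT, else exit."""
    parser = argparse.ArgumentParser(prog="pipestat-sketch")
    parser.add_argument("-c", "--config", default=None)
    # Ignore the remaining arguments (stat_name, stat_type, value).
    args, _ = parser.parse_known_args(argv)
    path = args.config or environ.get("PIPESTAT")
    if path is None:
        raise SystemExit("no results target: pass -c or set $PIPESTAT")
    return path
```

With this ordering a pipeline author can export `PIPESTAT` once per job and still override it for a single call.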
Another related issue: Right now, looper reads flags output from pypiper, but expects these to be in a particular location.
The refgenie build process puts them in a subfolder of the canonical outfolder, to separate the build logs from the pipeline results that go into the archive. Because of this, looper can't find the flag, and doesn't know which jobs are complete.
So, there needs to be:
This issue had been previously raised as pepkit/pipestat#34
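To make the flag-location problem concrete, here is a small sketch of a lookup that does not assume a fixed folder. It assumes flag files are named `<pipeline_name>_<status>.flag` with the three statuses listed in `STATUSES`; `find_status` is a hypothetical helper, not looper's code, and a recursive search is only a stopgap compared with a formal status record.

```python
from pathlib import Path

# Assumed flag suffixes for this example.
STATUSES = ("completed", "running", "failed")


def find_status(outfolder, pipeline_name):
    """Return the statuses whose flag file exists anywhere under outfolder."""
    found = []
    for status in STATUSES:
        pattern = f"{pipeline_name}_{status}.flag"
        # rglob searches subfolders too, so a flag in a logs subfolder is found.
        if any(Path(outfolder).rglob(pattern)):
            found.append(status)
    return found
```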
After completing pepkit/looper#238, the summarizer will be simplified to only the built-in summarizer (no more custom summarizers). This is because we now have runp for project-level pipelines, which replace the need for the custom summarizers, which were basically project-level pipelines.
At this point, we should:
Types and their function could be:
HTML results would fall under 'file' I guess.