dcr_pipeline.py serves as the entry point for the three key scripts currently used in the Chain lab (Decombinator, Collapsinator, and CDR3translator).
These scripts have been repackaged into functions that require inputargs, a dictionary created by args() in dcr_utilities, and (in the case of Collapsinator and CDR3translator) a data object: a list output by the previous function in the chain (decombinator() -> collapsinator() -> cdr3translator()).
The reasons for this change are:
- Skip I/O steps between pipeline elements.
- Lay the groundwork for adopting a more Pythonic, object-oriented approach in future changes.
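The function chain described above can be sketched as follows. The stubs below only mirror the interfaces implied by the description; the argument order and return values are assumptions, and the real implementations live in the package itself:

```python
# Hypothetical sketch of the refactored pipeline chaining.
# Function names come from the description; signatures and the
# contents of inputargs/data are illustrative assumptions only.

def args():
    # In the real code (dcr_utilities) this builds the shared
    # input-argument dictionary from command-line options.
    return {"fastq": "sample.fastq", "chain": "b"}

def decombinator(inputargs):
    # Assigns V/J genes to reads; returns a list of rearrangements (stubbed).
    return [{"v": "TRBV19", "j": "TRBJ2-1", "seq": "ACGT"}]

def collapsinator(data, inputargs):
    # Error- and frequency-corrects the decombined data (stubbed).
    return data

def cdr3translator(data, inputargs):
    # Translates rearrangements into CDR3 sequences (stubbed).
    return data

inputargs = args()
data = decombinator(inputargs)
data = collapsinator(data, inputargs)
data = cdr3translator(data, inputargs)
```

Because inputargs is shared across every step, each function reads only the options relevant to it, and no intermediate files are written between steps.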
As this change breaks users' existing scripts, a major version increment will occur upon completion of this refactor. At present, the modules display a message notifying users of the change if run directly from the terminal.
The pipeline is now run step-by-step by executing dcr_pipeline.py from the shell, with any arguments specified (input arguments are shared between all pipeline steps).
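For example, a hypothetical invocation might look like the following (the flag names are assumptions modelled on Decombinator's existing options and are not confirmed for dcr_pipeline.py; consult the package's argument parser for the actual flags):

```shell
# Run the full pipeline on a fastq file for the beta chain.
# -fq: input fastq file (assumed flag); -c: chain (assumed flag).
python dcr_pipeline.py -fq sample.fastq -c b
```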
This script loads a fastq file, processes the data through the entire pipeline, and writes the result out (via write_out()) in the AIRR community .tsv format.
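To illustrate the output format (not the actual write_out() implementation), the following minimal sketch writes rearrangement records as an AIRR-style tab-separated table; the column subset and the example values are chosen for illustration only:

```python
import csv
import io

# Illustrative subset of AIRR Rearrangement columns; the real
# write_out() output may include additional fields.
fields = ["sequence_id", "v_call", "j_call", "junction_aa", "duplicate_count"]

# A single made-up record for demonstration purposes.
records = [
    {"sequence_id": "seq1", "v_call": "TRBV19*01", "j_call": "TRBJ2-1*01",
     "junction_aa": "CASSIRSSYEQYF", "duplicate_count": 3},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t")
writer.writeheader()
writer.writerows(records)
tsv = buf.getvalue()
```

Writing to the AIRR community standard means the output can be consumed directly by downstream AIRR-seq tooling without format conversion.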