innate2adaptive / Decombinator

Decombinator v4: fast, error-correcting analysis of TCR repertoires
https://innate2adaptive.github.io/Decombinator/
MIT License

Functionalise Decombinator Scripts #31

Closed. MVCowley closed this 6 months ago.

MVCowley commented 6 months ago

dcr_pipeline.py acts as the main entry point for the three key scripts currently used in the Chain lab: Decombinator, Collapsinator, and CDR3translator. These scripts have been repackaged as functions. Each takes inputargs, a dictionary created by args() in dcr_utilities. Collapsinator and CDR3translator also take a data object, which is the list returned by the previous function in the chain (decombinator() -> collapsinator() -> cdr3translator()). The reasons for this change are:

  1. Skip I/O steps between pipeline elements.
  2. Lay the groundwork for adopting a more Pythonic, object-oriented approach in future changes.
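To illustrate the calling convention described above, here is a minimal sketch of how the chained functions fit together. It assumes only what is stated in this PR: every step takes the shared inputargs dictionary, and the later steps also take the list returned by the previous step. The function bodies, the record fields, the gene names, and the argument names in this args() stand-in are hypothetical placeholders, not the real Decombinator logic.

```python
import argparse


def args(argv=None):
    # Hypothetical stand-in for dcr_utilities.args(): parse the command line
    # once and return a plain dict that every pipeline step shares.
    parser = argparse.ArgumentParser()
    parser.add_argument("-fq", dest="fastq")
    parser.add_argument("-c", dest="chain")
    return vars(parser.parse_args(argv))


def decombinator(inputargs):
    # Placeholder: the real step reads the FASTQ named in inputargs and
    # assigns V/J genes; here we just return fixed, hypothetical records.
    return [
        {"v": "TRAV1", "j": "TRAJ1", "barcode": "AAA"},
        {"v": "TRAV1", "j": "TRAJ1", "barcode": "AAA"},
        {"v": "TRAV2", "j": "TRAJ3", "barcode": "CCC"},
    ]


def collapsinator(inputargs, data):
    # Placeholder: count identical records. The real step's barcode handling
    # and error correction are not reproduced here.
    counts = {}
    for rec in data:
        key = (rec["v"], rec["j"], rec["barcode"])
        counts[key] = counts.get(key, 0) + 1
    return [{"v": v, "j": j, "count": n} for (v, j, _), n in counts.items()]


def cdr3translator(inputargs, data):
    # Placeholder: tag each record with the chain from the shared arguments
    # instead of translating a real CDR3.
    return [dict(rec, chain=inputargs["chain"]) for rec in data]


def run_pipeline(argv=None):
    inputargs = args(argv)
    # Each step's output list goes straight into the next step in memory,
    # so no intermediate files are written.
    data = decombinator(inputargs)
    data = collapsinator(inputargs, data)
    return cdr3translator(inputargs, data)


if __name__ == "__main__":
    for row in run_pipeline(["-fq", "some_fastq_file.fq.gz", "-c", "a"]):
        print(row)
```

The point of the design is visible in run_pipeline(): the arguments are parsed once, and data moves between steps as Python objects rather than through files on disk.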

Because this change breaks users' existing scripts, a major version increment will follow once the refactor is complete. In the meantime, the modules display a message notifying users of the change if they are run directly from the terminal.

The pipeline is now run step by step by calling dcr_pipeline.py from the shell with any required arguments (input arguments are shared between pipeline steps), e.g.:

python dcr_pipeline.py -fq some_fastq_file.fq.gz -br R2 -bl 42 -c a -ol M13 -dz

This loads the FASTQ file, processes the data through the entire pipeline, and writes the output (via write_out()) in the AIRR-seq community .tsv format.
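The final step can be pictured as a small writer that turns the in-memory list into a tab-separated file. The sketch below is hypothetical. It does not reproduce the real write_out() signature, and it uses only a handful of AIRR Rearrangement column names (sequence_id, v_call, j_call, junction_aa, duplicate_count) for illustration. The columns Decombinator actually emits may differ.

```python
import csv
import io

# A small subset of AIRR Rearrangement fields, chosen for illustration only.
AIRR_COLUMNS = ["sequence_id", "v_call", "j_call", "junction_aa", "duplicate_count"]


def write_out(data, handle):
    # Hypothetical writer: a header row, then one tab-separated row per record.
    # Missing fields become empty strings; unknown keys are ignored.
    writer = csv.DictWriter(
        handle,
        fieldnames=AIRR_COLUMNS,
        delimiter="\t",
        restval="",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    for i, rec in enumerate(data, start=1):
        # Number rows as a simple placeholder sequence_id.
        writer.writerow({"sequence_id": str(i), **rec})


if __name__ == "__main__":
    buf = io.StringIO()
    write_out(
        [{"v_call": "TRAV1", "j_call": "TRAJ1", "junction_aa": "CAVSF", "duplicate_count": 2}],
        buf,
    )
    print(buf.getvalue(), end="")
```

When writing to disk rather than to a StringIO buffer, open the file with newline="" as the csv module documentation recommends, e.g. with open("out.tsv", "w", newline="") as fh: write_out(data, fh).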