This will replace TagByWindow (see #424); the experimental/beta name is ComputePairEndWindowStats. The requirements for the implementation are the following:
[x] New program group for mapped read sources (SAM/BAM/CRAM)
[x] Help formatter class to be shared between all ReadTools CLPs (required so that GATK walker-based tools print settings in the same way).
[x] StatFunction abstraction that takes either a single-end (SingleReadStatFunction) or a pair-end (PairEndReadStatFunction) read as input and computes the statistic. The pair-end function needs a way to compute the statistic over a running iterator, where the two reads of a pair are added in two steps.
[x] ProperStatWindowCalculator implementation, which holds the information for a window and has an add method for a read. It should also hold the running statistics.
[x] ProperStatWindowEngine implementation, which holds all the windows to be computed and is the class responsible for adding the reads to each window calculator. It should handle unmapped reads properly (a source of errors in TagByWindow).
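The calculator/engine split and the two-step pair addition described above could look roughly like the following. This is only a minimal sketch, not the actual ReadTools code: `Read`, `WindowCalculator`, `WindowEngine` and the mean mate distance statistic are hypothetical stand-ins chosen for illustration, and a real implementation would work on GATK reads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowStatsSketch {

    /** Hypothetical minimal read; a stand-in for the read type used by the tool. */
    static final class Read {
        final String name;
        final String contig;
        final int start;
        final boolean mapped;

        Read(String name, String contig, int start, boolean mapped) {
            this.name = name;
            this.contig = contig;
            this.start = start;
            this.mapped = mapped;
        }
    }

    /** Holds one window and its running statistic (hypothetical: mean distance between mates). */
    static final class WindowCalculator {
        private final String contig;
        private final int start;
        private final int end;
        // First mates seen in this window, waiting for their second mate.
        private final Map<String, Read> pending = new HashMap<>();
        private int completePairs = 0;
        private long distanceSum = 0;

        WindowCalculator(String contig, int start, int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }

        private boolean contains(Read read) {
            return read.mapped && read.contig.equals(contig)
                    && read.start >= start && read.start <= end;
        }

        /** Adds a read: step 1 caches the first mate, step 2 completes the pair. */
        void add(Read read) {
            if (!contains(read)) {
                return;
            }
            Read mate = pending.remove(read.name);
            if (mate == null) {
                pending.put(read.name, read);
            } else {
                completePairs++;
                distanceSum += Math.abs(read.start - mate.start);
            }
        }

        int completePairs() {
            return completePairs;
        }

        double meanMateDistance() {
            return completePairs == 0 ? Double.NaN : (double) distanceSum / completePairs;
        }
    }

    /** Holds all windows and dispatches each read to them, skipping unmapped reads. */
    static final class WindowEngine {
        private final List<WindowCalculator> windows = new ArrayList<>();
        private int skippedUnmapped = 0;

        void addWindow(WindowCalculator window) {
            windows.add(window);
        }

        void addRead(Read read) {
            if (!read.mapped) {
                skippedUnmapped++; // unmapped reads have no reliable position
                return;
            }
            for (WindowCalculator window : windows) {
                window.add(read);
            }
        }

        int skippedUnmapped() {
            return skippedUnmapped;
        }
    }
}
```

In this sketch the engine is the only place that decides what to do with unmapped reads, so the window calculators never see a read without a position.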
Finally, ComputePairEndWindowStats will be implemented as follows:
[x] ReadTools' version of ReadWalker, holding an engine instance to be used in apply. In addition, the tool should override the
[x] Some hardcoded statistics (needed to produce results for the project that I am working on and to run integration tests).
[x] Default read filters: mapped (a source of errors in previous implementations) and primary read (non-primary reads can break the assumption of only two reads per pair). We should document that removing these filters is not recommended.
[ ] Transformer for qualities.
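As an illustration of why the two default filters matter, here is a minimal sketch that combines a mapped filter and a primary filter. It deliberately does not use GATK's read filter classes: `Read`, `MAPPED`, `PRIMARY` and `keptNames` are hypothetical names, and "primary" is modeled as a single boolean flag.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DefaultFiltersSketch {

    /** Hypothetical minimal read carrying only the flags the filters need. */
    static final class Read {
        final String name;
        final boolean mapped;
        final boolean primary;

        Read(String name, boolean mapped, boolean primary) {
            this.name = name;
            this.mapped = mapped;
            this.primary = primary;
        }
    }

    // Unmapped reads have no window to be assigned to.
    static final Predicate<Read> MAPPED = read -> read.mapped;
    // Non-primary alignments would give more than two records per pair.
    static final Predicate<Read> PRIMARY = read -> read.primary;
    // Both filters are applied by default in this sketch.
    static final Predicate<Read> DEFAULT_FILTER = MAPPED.and(PRIMARY);

    /** Returns the names of the reads that pass the default filters, in input order. */
    static List<String> keptNames(List<Read> reads) {
        return reads.stream()
                .filter(DEFAULT_FILTER)
                .map(read -> read.name)
                .collect(Collectors.toList());
    }
}
```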
Improvements after the experimental/beta tool:
[ ] Convert the stats to a plugin with no default, so that at least one must be provided. This will allow adding more stats. Requires a GATK version that uses the latest Barclay.
[x] New walker class derived from GATK's ReadWalker and from ReadToolsProgram, to share configuration and settings. This should have a pre-read transformer by default, namely the one that converts the quality to standard.
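For reference, a quality-standardizing pre-read transformer could do something like the following. This is a sketch under assumptions: it takes "standard" to mean Phred+33 (Sanger) and the input to be Phred+64 (Illumina), and `illuminaToSanger` is a hypothetical helper, not an existing ReadTools or GATK method.

```java
public class QualityTransformSketch {

    static final int SANGER_OFFSET = 33;   // assumed "standard" encoding
    static final int ILLUMINA_OFFSET = 64; // assumed input encoding

    /** Re-encodes a Phred+64 quality string as Phred+33. */
    static String illuminaToSanger(String qualities) {
        StringBuilder converted = new StringBuilder(qualities.length());
        for (int i = 0; i < qualities.length(); i++) {
            int phred = qualities.charAt(i) - ILLUMINA_OFFSET;
            if (phred < 0) {
                // A character below the offset means the input was not Phred+64.
                throw new IllegalArgumentException(
                        "Quality character is not valid Phred+64: " + qualities.charAt(i));
            }
            converted.append((char) (phred + SANGER_OFFSET));
        }
        return converted.toString();
    }
}
```

Applying the conversion before the reads reach the walker means that any quality-based statistic only has to deal with one encoding.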