The first main change in this package is continued refactoring, specifically treating related data as a composite object where possible. As a result, all GenBank data used within PACVr is now contained in a single object, both versions of the coverage data (the output of `GenomicAlignments::coverage()` and the derivative from `CovCalc()`) are fields of the `coverage` object, and input parameters related to plotting are contained in `plotSpecs`. I am applying some object-oriented principles, such as validating the `PACVr.complete()` parameters in both `getAnalysisSpecs()` and `getPlotSpecs()`. However, the use cases for these objects are fairly simple, partly because only one instance of each exists during the execution of `PACVr.complete()`, so I have not yet defined them in one of R's object-oriented systems such as S3.
While running some test data, specifically an annotated GenBank file for NC_009143, I noticed that a feature can carry multiple qualifiers of the same name and that this case was not handled. I originally considered combining the values into a list. Because some of the package's filtering operations rely on regex matching, the values are concatenated into a single character string instead.
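To illustrate the concatenation approach, here is a minimal sketch. The helper name `collapseQualifiers()`, its arguments, and the `"; "` separator are assumptions made for this example and are not taken from the PACVr code.

```r
# Hypothetical helper (not PACVr's actual function): collapse qualifiers that
# share a name into one character string per name.
collapseQualifiers <- function(qualNames, qualValues, sep = "; ") {
  stopifnot(length(qualNames) == length(qualValues))
  # Group values by qualifier name; the factor levels keep first-seen order
  grouped <- tapply(
    qualValues,
    factor(qualNames, levels = unique(qualNames)),
    paste,
    collapse = sep
  )
  # Return a plain named character vector
  stats::setNames(as.character(grouped), names(grouped))
}

# Illustrative input: one feature with two "note" qualifiers
quals <- collapseQualifiers(
  c("gene", "note", "note"),
  c("rps12", "trans-spliced", "exon 1")
)

# One string per qualifier keeps regex-based filtering a single grepl() call
hasTransSplicing <- grepl("trans-spliced", quals[["note"]])
```

With a list, the same filter would need to iterate over each element, which is the trade-off that favors a single string here.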
Experience with other test data led to two further changes: `note` qualifiers are no longer required for standard coverage analysis, and for `verbose` analysis the sample name in the BAM file may match either the `VERSION` or the `ACCESSION` of the GenBank file.
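The relaxed sample-name check could look roughly like the sketch below. The function name `sampleMatchesRecord()` and the identifiers passed to it are placeholders invented for illustration; the actual check in PACVr may be structured differently.

```r
# Hypothetical check (not PACVr's actual function): accept a BAM sample name
# that equals either the ACCESSION or the VERSION of the GenBank record.
sampleMatchesRecord <- function(sampleName, accession, version) {
  sampleName %in% c(accession, version)
}

# Placeholder identifiers, not real records
matchByVersion   <- sampleMatchesRecord("XX_000001.2", "XX_000001", "XX_000001.2")
matchByAccession <- sampleMatchesRecord("XX_000001", "XX_000001", "XX_000001.2")
noMatch          <- sampleMatchesRecord("YY_999999", "XX_000001", "XX_000001.2")
```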
When the IR presence test finds no matching features, this no longer raises an error. The failure is reported, and execution continues using the unpartitioned source as a single region to analyze.
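The fallback behavior can be sketched as follows. The function `getRegionsOrFallback()`, the regex used to detect IR features, the column names, and the region label `"Unpartitioned"` are all assumptions made for this example.

```r
# Hypothetical sketch (not PACVr's actual code): report a failed IR test and
# fall back to a single region covering the whole source.
getRegionsOrFallback <- function(features, seqLength) {
  # Assumed matching rule: feature names such as "IRa"/"IRb" or "inverted repeat"
  isIR <- grepl("^IR[ab]?$|inverted repeat", features$name, ignore.case = TRUE)
  if (!any(isIR)) {
    message("No inverted repeat features found; analyzing the unpartitioned source.")
    fallback <- data.frame(
      name = "Unpartitioned",
      start = 1L,
      end = as.integer(seqLength)
    )
    return(list(irFound = FALSE, regions = fallback))
  }
  list(irFound = TRUE, regions = features[isIR, , drop = FALSE])
}

# Illustrative feature table without any IR annotation
features <- data.frame(
  name = c("rbcL", "psbA"),
  start = c(100L, 900L),
  end = c(500L, 1400L)
)
result <- getRegionsOrFallback(features, seqLength = 150000L)
```

Returning a flag alongside the regions lets downstream steps skip IR-specific processing without treating the missing annotation as fatal.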
The `output` parameter is now validated more thoroughly, and PNG file support has been added.
Within `getCovDepth()`, `lowCoverage` in the summary table has been renamed `lowCovWin_abs`, and a new statistic, `lowCovWin_relToRegionLen`, has been added; it corresponds to `lowCovWin_abs`/`regionLen`. The verbose output file `_coverage.summary.regions.tsv` also gains an additional row, `Complete_genome`, whose `regionLen` and `lowCovWin_abs` values are the sums over the individual regions. Note that the latter is just the sum of the low-coverage counts obtained when each region is considered separately, not the low-coverage count obtained when considering the entire source. This aligns with the indicated requirements, but I wanted to mention it because the two values typically differ. By contrast, since `evenness` is based on the raw coverage data for sequences, this depth statistic for `Complete_genome` does use the coverage mean of the entire source in its calculation.
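As a rough illustration of how the new column and the `Complete_genome` row relate to the per-region values, consider the sketch below. The region names and all numbers are made up, and computing the ratio for the `Complete_genome` row from the two sums is an assumption of this example rather than something confirmed by the package code.

```r
# Made-up per-region values, for illustration only
regions <- data.frame(
  region = c("LSC", "IRb", "SSC", "IRa"),
  regionLen = c(85000L, 26000L, 18000L, 26000L),
  lowCovWin_abs = c(12L, 3L, 5L, 3L)
)

# New statistic: low-coverage windows relative to region length
regions$lowCovWin_relToRegionLen <- regions$lowCovWin_abs / regions$regionLen

# Complete_genome row: sums of the per-region values, not a recount of
# low-coverage windows over the entire source
total <- data.frame(
  region = "Complete_genome",
  regionLen = sum(regions$regionLen),
  lowCovWin_abs = sum(regions$lowCovWin_abs)
)
# Assumption: the ratio for this row is derived from the two sums
total$lowCovWin_relToRegionLen <- total$lowCovWin_abs / total$regionLen

summaryTable <- rbind(regions, total)
```

Because each region applies its own low-coverage threshold, recounting windows against a single genome-wide threshold would generally give a different number than the summed value shown here.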