Closed alephnull7 closed 5 months ago
Hi Greg, Your recent edits are excellent! Yes, breaking printCovValsAsTable into smaller, more readable functions is a great, which will make debugging – should it ever become necessary – much easier. Also, your interpretation about the region information (i.e., the quadripartite structure info) is correct: it is only used for referencing, naming, and – especially – grouping the coverage values.
The functions
verboseInformation
andwriteTables
have been renamedprintCovStats
andprintCovValsAsTable
, respectively.Due to previous work I had done, specifically with the creation and use of the source sequence as
quadripRegions
when no region partitioning is done, the extension ofprintCovValsAsTable
to include such cases fit in with the way data was already being passed around. To better understand the processes performed in the function and facilitate better maintenance in the future, I refactored and broke upprintCovValsAsTable
into smaller helper functions, and in the process migrated all of these components ofprintCovStats
into a new fileverboseInformation.R
. As of now, this is the only function called fromPACVr.verboseInformation
besidescheckIREquality
(currently located inIROperations.R
), but due to the purpose and amount of code involved, it seemed necessary. On a similar note, a long overdue migration of code related to transformingread.gb
data into forms used byPACVr
has occurred, and is now housed inread.gb2PACVr.R
. In addition to the above refactoring ofprintCovValsAsTable
, minor changes to the code involved were done, in the service of having the outputted coverage data's column names relate to what "regions" are being analyzed. As of now, this handles the case of quadripartite regions, where the terminology "Chromosome" is used, and when the entire source is used, the term "Source" is applied. When testing these changes toprintCovValsAsTable
and the above two cases, apart from the changes in column names and the names used for the sequences, the only difference I could see was in thelowCoverage
column forir_regions
. This aligns with my understanding of the coverage analysis done, that except for the assignment ofir_regions$lowCoverage
usingcov_regions
, the region data appears to be used for only referencing, naming, and grouping. That is, it seems that is the only place where the respective subset a sequence is a part of is taken into account or could be taken into account. As a result, the threshold used forir_regions$lowCoverage
is different when the subset is the entire source, compared to what the threshold is when the subset is the quadripartite region the sequence is a part of.Edit: Due to factoring the creation of the verbose files directory into a separate function
getVerbosePath
, a distinction betweenprintCovStats
andprintCovValsAsTable
no longer made sense, so the helper functionprintCovValsAsTable
no longer exists.