michaelgruenstaeudl / PACVr

Plastome Assembly Coverage Visualization in R

Version 1.0.10 #38

Closed alephnull7 closed 3 months ago

alephnull7 commented 4 months ago

The primary change is the conversion of analysisSpecs, gbkData, and plotSpecs into proper objects defined with the R6 object-oriented system. Among other things, this allows these objects to be passed by reference, which turned out to be very useful for analysisSpecs, whose fields can now be updated dynamically. The implementation, summarized in README.md, attempts to fulfill what the user specified for IRCheck but will "simplify" the analysis, currently by resetting IRCheck to either NA or 0, typically when it encounters data that make the requested "advanced" analysis impossible. This includes a reset of IRCheck to NA when the IR alignment checks in checkIREquality() yield a positive mismatch count. In this specific case, the quadripartite partitioning of the genome could previously still be visualized, even though the warning message "Proceeding with coverage depth visualization, but without quadripartite genome structure ..." implied that such visualization might be inaccurate or invalid. Essentially, PACVr.complete() will try to perform whatever subset of the originally specified analysis can be done.
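To illustrate the pass-by-reference behavior described above, here is a minimal sketch, not the actual PACVr code. The class `AnalysisSpecs`, its `simplify()` method, and the helper `handleMismatches()` are hypothetical stand-ins, and the initial `IRCheck` value of 1 is an arbitrary placeholder. It assumes the R6 package is installed.

```r
library(R6)  # assumes the R6 package is installed

# Hypothetical stand-in for analysisSpecs; the real class has more fields.
AnalysisSpecs <- R6Class("AnalysisSpecs",
  public = list(
    IRCheck = NULL,
    initialize = function(IRCheck = NA) {
      self$IRCheck <- IRCheck
    },
    # Fall back to a simpler analysis by resetting IRCheck.
    simplify = function(value = NA) {
      self$IRCheck <- value
      invisible(self)
    }
  )
)

# Hypothetical helper: it modifies the caller's object, because R6 objects
# are environments and are not copied when passed to a function.
handleMismatches <- function(specs, mismatchCount) {
  if (mismatchCount > 0) {
    specs$simplify(NA)
  }
  invisible(NULL)
}

specs <- AnalysisSpecs$new(IRCheck = 1)  # 1 is an arbitrary placeholder value
handleMismatches(specs, mismatchCount = 3)
is.na(specs$IRCheck)  # TRUE: the reset is visible outside the function
```

With an ordinary list in place of the R6 object, the helper would modify a copy and the caller's `IRCheck` would stay unchanged unless the list were returned and reassigned.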

Where possible, artifacts written by PACVr.compileCovStats() have been combined. Specifically, the file <sampleName>_coverage.summary.regions has been renamed to <sampleName>_summary.regions and now contains additional summary statistics for the Complete_genome, which correspond to the item previously written by checkIREquality(). As a result, the count of ambiguous nucleotides is always included in <sampleName>_summary.regions, under the name N_count, and when synteny testing occurs, the number of IR mismatches is included as IR_mismatches.
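As a rough sketch of how such a combined summary row could be assembled (not the actual PACVr.compileCovStats() code): the helper `summarizeGenome()` and the `Region` column are hypothetical, and the sketch assumes that only the character N counts as an ambiguous nucleotide. Only the names N_count, IR_mismatches, and Complete_genome come from the description above.

```r
# Hypothetical helper; everything except the names N_count, IR_mismatches,
# and Complete_genome is illustrative.
summarizeGenome <- function(seqString, irMismatches = NULL) {
  bases <- strsplit(toupper(seqString), "")[[1]]
  row <- data.frame(
    Region = "Complete_genome",
    N_count = sum(bases == "N"),  # assumes only "N" is counted as ambiguous
    stringsAsFactors = FALSE
  )
  # Add the mismatch column only when synteny testing was performed.
  if (!is.null(irMismatches)) {
    row$IR_mismatches <- irMismatches
  }
  row
}

summarizeGenome("ACGTNNACGN")                    # N_count is 3
summarizeGenome("ACGTNNACGN", irMismatches = 2)  # also has IR_mismatches
```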

The GitHub Actions workflows are updated so that they all run when changes to CHANGELOG.md are part of a push, with the creation of a package release contingent on the successful completion of steps such as R CMD check. Other updates to those files are efforts to delay the deprecation of components used in them.

alephnull7 commented 4 months ago

@michaelgruenstaeudl it is my understanding that even though checkIREquality() has the warning "Proceeding with coverage depth visualization, but without quadripartite genome structure ...", the results of the function have had no impact on the resulting visualization - at least since I've been a contributor. With my current changes in the pull request, a positive count of IR mismatches results in the behavior indicated by that warning. However, as I'm basing this change solely on a log statement, it is entirely possible that this warning reflects a deprecated requirement of the package, and that no change in visualization should actually result. On the other hand, if those changes are appropriate, it would probably also make sense for checkIREquality() to always be called following GenerateIRSynteny(), with these results written to file when tabularCovStats = TRUE. In either case, additional changes related to checkIREquality() are required.