Closed: kavj closed this pull request 3 years ago.
Merging #39 into master will not change coverage. The diff coverage is n/a.
@@           Coverage Diff           @@
##           master     #39   +/-   ##
======================================
  Coverage   91.69%  91.69%
======================================
  Files          29      29
  Lines        2155    2155
======================================
  Hits         1976    1976
  Misses        179     179
Powered by Codecov. Last update cf9c4f5...9067d47.
I might suggest squashing commits and rebasing wrt master.
Which I can help with if needed.
> I might suggest squashing commits and rebasing wrt master.
I think I messed up the rebase last time. I would have preferred to land this on an experimental branch or in an experimental code section, one where tests would still run, but it's not clear that such a section exists. The change is meant to remove a source of observed cancellation issues.
A follow-up to this will add detection of bad data regions. That allows the problem to be sub-tiled across normal regions, where every data window in a region admits a normalized representation.
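The region-detection step could be sketched roughly as follows. This is a hypothetical helper, not code from this PR: it scans for maximal contiguous runs of finite samples that are long enough to hold at least one full window, which is the precondition for a region to admit normalized windows.

```python
import numpy as np

def finite_regions(ts, w):
    """Return (start, stop) pairs of maximal contiguous finite runs of ts
    that are long enough to contain at least one full window of length w."""
    finite = np.isfinite(ts)
    padded = np.concatenate(([False], finite, [False]))
    # Run boundaries are the positions where the padded mask flips value.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, stops = edges[0::2], edges[1::2]
    keep = (stops - starts) >= w
    return list(zip(starts[keep], stops[keep]))
```

For example, `finite_regions(np.array([1., 2., np.nan, 3., 4., 5.]), 2)` yields the two tileable regions `[(0, 2), (3, 6)]`, skipping the NaN.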
I may also suggest an update to the mean and inverse norm computations to further avoid propagating rounding error across windows. It's quite difficult to obtain optimal reliability with look-ahead methods.
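One way to avoid carrying rounding error across windows, sketched here as an illustration rather than as this PR's implementation, is to compute each window's mean and inverse centered norm independently instead of by sliding updates. This trades O(n·w) work for the guarantee that no window's error depends on its neighbors:

```python
import numpy as np

def windowed_mean_inv_norm(ts, w):
    """Per-window mean and inverse centered norm, computed independently
    for each window so rounding error never propagates between windows."""
    n = ts.shape[0] - w + 1
    mu = np.empty(n)
    inv_norm = np.empty(n)
    for i in range(n):
        win = ts[i:i + w]
        mu[i] = win.mean()  # NumPy uses pairwise summation internally
        inv_norm[i] = 1.0 / np.linalg.norm(win - mu[i])
    return mu, inv_norm
```

A streaming or look-ahead scheme would be faster, but as the comment notes, keeping it reliable is the hard part; the independent form is the reference to compare against.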
@kavj Have you taken the time to implement the "ab-join" and parallel logic? Where does this code stand overall?
> @kavj Have you taken the time to implement the "ab-join" and parallel logic? Where does this code stand overall?
This was on the master branch. I'm not sure that's the place for it.
Ideally this would be factored out to work with this section and the streaming section.
This removes the use of the twisted factorization ac - bd = (1/2)((a + b)(c - d) + (a - b)(c + d)) from the difference formulas. While that form requires slightly less memory access in the case of a self join, it appears to fail in cases containing missing data. Further, the reduction step can sometimes make it difficult to tell when it first diverges to a meaningful degree.
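As a standalone sanity check (a sketch, not this repository's code): the right-hand side of that factorization expands to a·c − b·d, and a single NaN input propagates straight through it, which is why a running reduction built on this form never recovers past a missing sample:

```python
import math

def twisted(a, b, c, d):
    # Expanding: 0.5 * (2*a*c - 2*b*d) = a*c - b*d.
    return 0.5 * ((a + b) * (c - d) + (a - b) * (c + d))

# Matches the direct form for ordinary inputs.
assert math.isclose(twisted(2.0, 3.0, 5.0, 7.0), 2.0 * 5.0 - 3.0 * 7.0)

# A single missing sample (NaN) contaminates both products, so a running
# reduction built on this form carries the NaN into later windows.
assert math.isnan(twisted(float("nan"), 3.0, 5.0, 7.0))
```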
I will probably suggest a strategy for restarting calculations that border missing data regions at a later time, but even without that, I haven't observed complete failure in this case. I suspect the previous formulation sometimes added the product of an underflowing value and a large value, which is problematic here.
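The underflow concern can be shown in isolation (a toy example, not taken from the diff): a subnormal intermediate loses most of its significand, so multiplying it back up by a large value produces a result whose relative error is far worse than machine epsilon:

```python
# 5e-324 is rounded to the nearest subnormal double (~4.94e-324),
# discarding almost all of its significand before the multiply happens.
tiny = 5e-324
big = 1e300
prod = tiny * big                      # mathematically 5e-24
rel_err = abs(prod - 5e-24) / 5e-24    # ~1e-2, vs ~2.2e-16 for normals
```

Adding such a product into a running sum silently injects an error several orders of magnitude above rounding noise, which matches the suspected failure mode.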