sxs-collaboration / sxs

Python code for manipulating data from the SXS collaboration
MIT License

LVC conversion: Are error outputs needed? #9

Closed (moble closed 4 years ago)

moble commented 4 years ago

The errors produced by romspline and stored in the output LVC format files do not seem useful for our purposes; they look like they're more relevant to studying the greedy algorithm itself. Storing them increases the size of each LVC-format file by almost 50%.

Each time series in LVC format is currently stored as a group containing five datasets:

  1. X, representing the sample times (aka knots of the spline)
  2. Y, representing the data values at those times
  3. deg, giving the degree of the spline used in the greedy algorithm
  4. tol, giving the target tolerance allowed by the greedy algorithm
  5. errors

This last dataset represents the largest error that the greedy algorithm encountered at each step as it was progressively choosing more and more knots to use for the spline. For example, the first element in any given errors dataset is the largest error when approximating the entire data series in question by a spline with just 6 knots; the second element gives the largest error with 7 knots. I don't see any connection between these quantities and anything we actually care about with the final result, which will typically use hundreds to thousands of knots. In particular, the points with those largest errors are then added to the set of knots, so that the error for the final result at those points should actually be 0.0 in all cases.
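To make the argument above concrete, here is a minimal sketch of a greedy knot-selection loop of the kind described. It is not romspline's code: it swaps in piecewise-linear interpolation for the degree-5 spline, so it starts from 2 knots rather than 6, and `linear_interp` and `greedy_knots` are hypothetical names chosen for illustration. It only shows what an `errors` list of this kind records, and why the final residual at every chosen knot is exactly 0.0.

```python
import math
from bisect import bisect_right, insort


def linear_interp(t, y, knots, i):
    """Evaluate the piecewise-linear interpolant through the knots at sample i."""
    j = bisect_right(knots, i) - 1
    lo = knots[j]
    if lo == i:
        return y[i]  # sample i is itself a knot, so it is reproduced exactly
    hi = knots[j + 1]
    w = (t[i] - t[lo]) / (t[hi] - t[lo])
    return y[lo] + w * (y[hi] - y[lo])


def greedy_knots(t, y, tol):
    """Add the worst-approximated sample as a knot until the max error <= tol.

    Returns the sorted knot indices and the largest error seen at each step,
    which is the role the `errors` dataset plays in the discussion above.
    """
    knots = [0, len(t) - 1]  # assumption: seed with the two endpoints
    errors = []
    while True:
        residuals = [abs(y[i] - linear_interp(t, y, knots, i))
                     for i in range(len(t))]
        worst = max(range(len(t)), key=residuals.__getitem__)
        errors.append(residuals[worst])
        if residuals[worst] <= tol:
            return knots, errors
        insort(knots, worst)  # the worst point becomes a knot


# Illustrative data only: a sine wave sampled at 1001 points
t = [0.01 * i for i in range(1001)]
y = [math.sin(ti) for ti in t]
knots, errors = greedy_knots(t, y, tol=1e-3)

# Residual of the final interpolant, evaluated at the knots themselves
final_residuals = [abs(y[i] - linear_interp(t, y, knots, i)) for i in knots]
print(len(knots), len(errors), errors[0], errors[-1], max(final_residuals))
```

In this sketch the early entries of `errors` describe very coarse approximations that are thrown away, and only the last entry says anything about the final result, which the tolerance already bounds.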

The romspline.ReducedOrderSpline.write function that we use to write these files has a slim option that, if set to True, skips writing this errors dataset. Can we do so? Does anyone use those datasets? This would reduce our LVC file sizes by almost a third, and might simplify certain other things that I think might need to happen.
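The "almost 50%" and "almost a third" figures are the same change seen from two sides, which a rough count makes clear. The sketch below assumes that errors holds one entry per greedy step starting from 6 knots, that X, Y, and errors use the same element size, and that deg, tol, HDF5 overhead, and compression are negligible. None of this is measured from real files, and `errors_size_fractions` is a hypothetical helper.

```python
def errors_size_fractions(n_knots, n_seed=6):
    """Estimate how much the errors dataset changes the size of one time series.

    Assumes one errors entry per greedy step from n_seed knots up to n_knots,
    and equal element sizes for X, Y, and errors.
    """
    n_errors = n_knots - n_seed + 1
    kept = 2 * n_knots                        # X and Y, n_knots entries each
    increase = n_errors / kept                # growth caused by storing errors
    reduction = n_errors / (kept + n_errors)  # savings from dropping errors
    return increase, reduction


for n in (100, 1000, 10000):
    inc, red = errors_size_fractions(n)
    print(f"{n:>6d} knots: +{inc:.1%} with errors, -{red:.1%} without")
```

Under these assumptions the increase stays just below one half and the reduction just below one third, with both approaching those limits as the number of knots grows.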

moble commented 4 years ago

On a vacuum phone call, we agreed that this field is no longer needed. I think as a matter of safety, I might actually keep this around, in the form of an array of size (1,) and just containing a copy of the tolerance. On second thought, I think it will be safer to not include anything. If anyone actually uses this, they were probably using it wrong anyway, and we can just tell them to use the tol field instead.

moble commented 4 years ago

Closed by #10