iand675 / datasketches-haskell


invariant violated #2

Open robinp opened 1 year ago

robinp commented 1 year ago

Hello - I use prometheus-client, which uses data-sketches for its Summary metric.

I periodically run into the error "invariant violated: lastWeight does not equal raSize". I am currently running a patched data-sketches to get more details, and the output samples are:

Original code: https://github.com/iand675/datasketches-haskell/blob/f327321a33f6084e61dc2e4a1dec3211683bc747/data-sketches-core/src/DataSketches/Quantiles/RelativeErrorQuantile/Internal/Auxiliary.hs#L106

Modified:

createCumulativeWeights :: PrimMonad m => MReqAuxiliary (PrimState m) -> m ()
createCumulativeWeights this = do
  weights <- getWeights this
  let size = MUVector.length weights
  let accumulateM i weight = do
        when (i > 0) $ do
          prevWeight <- MUVector.read weights (i - 1)
          MUVector.unsafeWrite weights i (weight + prevWeight)
  forI_ weights (\i -> MUVector.read weights i >>= \x -> accumulateM i x)
  lastWeight <- MUVector.read weights (size - 1)
  when (lastWeight /= mraSize this) $ do
    -- modified to include more detail
    error $ "invariant violated: lastWeight does not equal raSize: "
      <> show lastWeight <> ", " <> show (mraSize this)
      <> " (size: " <> show size <> ")"
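For readers following along, here is a minimal pure-list sketch of what the loop above appears to compute, assuming forI_ visits indices in ascending order. The names cumulativeWeights and invariantHolds are made up for illustration and are not part of data-sketches; expectedTotal stands in for whatever value mraSize holds.

```haskell
import Data.Word (Word64)

-- | Turn per-item weights into running totals, e.g. [1,2,4] becomes [1,3,7].
-- This mirrors the in-place loop only under the assumption that indices
-- are visited in ascending order.
cumulativeWeights :: [Word64] -> [Word64]
cumulativeWeights = scanl1 (+)

-- | The invariant restated on lists: the last running total must equal
-- the expected total (the value the original reads via mraSize).
-- An empty input is treated as valid only when the expected total is 0;
-- that choice is an assumption of this sketch.
invariantHolds :: Word64 -> [Word64] -> Bool
invariantHolds expectedTotal ws =
  case cumulativeWeights ws of
    [] -> expectedTotal == 0
    cs -> last cs == expectedTotal
```

In this model the check can only fail if the weights and the expected total were not derived from the same snapshot of the sketch, which may be a useful angle when deciding what state to dump.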

This is how prometheus-client uses the sketch: https://github.com/fimad/prometheus-haskell/blob/631f8f44ee710673f7ba88d2cda346f3592b0f7b/prometheus-client/src/Prometheus/Metric/Summary.hs#L71 (see observe for updates and collectSummary for reading the quantiles). I run a slightly patched prometheus-client as well, but I don't think I modified this logic.

The quantiles used are typically the defaultQuantiles at https://github.com/fimad/prometheus-haskell/blob/631f8f44ee710673f7ba88d2cda346f3592b0f7b/prometheus-client/src/Prometheus/Metric/Summary.hs#L112 (I added (1.0, 0) to the mix, but I think I remember this occurring even before that).

Do you have any insight on how to proceed based on this? It happens relatively rarely, about once per day. I wonder how best to test this, or what state I could dump that would help. Thank you.

robinp commented 10 months ago

For reference, https://github.com/apache/datasketches-java/blob/master/src/main/java/org/apache/datasketches/req/ReqSketchSortedView.java seems to be the Java counterpart. Nothing very odd sticks out directly.

The first step is very likely to add a property-based test exercising the insert + getQuantile functionality.
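A rough, dependency-free sketch of what such a test could look like follows. Everything here is an assumption for illustration: SketchOps is a hypothetical record to be filled in with the real data-sketches insert and quantile calls, exactOps is a stand-in that keeps every value, and the hand-rolled generator would normally be replaced by QuickCheck or Hedgehog to get shrinking.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Control.Monad (filterM)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.List (sort)

-- | Hypothetical interface: the operations the test needs from a sketch.
data SketchOps s = SketchOps
  { newSketch     :: IO s
  , insertValue   :: s -> Double -> IO ()
  , queryQuantile :: s -> Double -> IO Double
  }

-- | One step of a generated test run.
data Op = Insert Double | Query Double

-- | Small linear congruential generator, so only base is required.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- | Deterministically generate n operations from a seed. Roughly one in
-- four is a quantile query; values and ranks lie in [0, 1).
genOps :: Int -> Int -> [Op]
genOps seed n = take n (go (nextSeed seed))
  where
    go s =
      let s' = nextSeed s
          x  = fromIntegral ((s' `div` 256) `mod` 10000) / 10000
          op = if (s `div` 65536) `mod` 4 == 0 then Query x else Insert x
      in op : go (nextSeed s')

-- | Run all operations; report the exception text if any step throws.
runOps :: SketchOps s -> [Op] -> IO (Either String ())
runOps ops steps = do
  s <- newSketch ops
  r <- try (mapM_ (step s) steps)
  pure $ case r of
    Left e   -> Left (show (e :: SomeException))
    Right () -> Right ()
  where
    step s (Insert x) = insertValue ops s x
    step s (Query q)  = queryQuantile ops s q >>= \v -> evaluate v >> pure ()

-- | Property: interleaved inserts and queries never throw. One insert
-- comes first so that no query hits an empty sketch.
propNoException :: SketchOps s -> Int -> IO Bool
propNoException ops seed =
  either (const False) (const True)
    <$> runOps ops (Insert 0.5 : genOps seed 500)

-- | Return the seeds for which the property fails.
failingSeeds :: SketchOps s -> [Int] -> IO [Int]
failingSeeds ops = filterM (fmap not . propNoException ops)

-- | Stand-in implementation that stores every value (not a real sketch).
exactOps :: SketchOps (IORef [Double])
exactOps = SketchOps
  { newSketch     = newIORef []
  , insertValue   = \ref x -> modifyIORef' ref (x :)
  , queryQuantile = \ref q -> do
      xs <- sort <$> readIORef ref
      let n = length xs
          i = min (n - 1) (floor (q * fromIntegral n))
      pure (if n == 0 then 0 / 0 else xs !! i)
  }
```

Calling failingSeeds with a SketchOps value wired to the real sketch and a range of seeds would give reproducible failing inputs, if the violation can be triggered from a single thread at all. A natural second property would compare each returned quantile against exactOps within the sketch's error bound.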