ntamas / plfit

Fitting power-law distributions to empirical data, according to the method of Clauset, Shalizi and Newman
GNU General Public License v2.0
47 stars 17 forks source link

Potential out-of-bounds access in Zeta calculation #37

Closed szhorvat closed 2 years ago

szhorvat commented 2 years ago

There seems to be an out-of-bounds access issue here:

https://github.com/ntamas/plfit/blob/master/src/hzeta.c#L228

I'll copy the relevant code here:

            for(j=1;j<=HSL_SF_HZETA_EULERMACLAURIN_SERIES_ORDER;++j) {
                delta=hsl_sf_hzeta_eulermaclaurin_series_coeffs[j]*ratio;
                ans+=(terms[n++]=delta);
                scp*=++tscp;
                scp*=++tscp;
                pcp*=sqr_inv_qshift;
                ratio=scp*pcp;
                if ((fabs(delta/ans)) < (0.5*GSL_DBL_EPSILON)) break;
                }

            ans=0.0; while (n) ans+=terms[--n];
            mjr=hsl_sf_hzeta_eulermaclaurin_series_majorantratios[j]*ratio;

When the for loop runs to the very end (without break ever being triggered), the value of j will be HSL_SF_HZETA_EULERMACLAURIN_SERIES_ORDER + 1. This causes an out-of-bounds access on the last quoted line above.

This came up in the error I quote at the end of this comment:

https://github.com/igraph/igraph/issues/1864#issuecomment-983847407

Pinging @jgmbenoit since it seems this code was contributed by you.

ntamas commented 2 years ago

Yes, it was contributed by @jgmbenoit and it is a complete black box to me so I'm waiting for feedback.

jgmbenoit commented 2 years ago

I will have a look this week-end. If I remember well HSL_SF_HZETA_EULERMACLAURIN_SERIES_ORDER is meant to be large enough to not be reached.

szhorvat commented 2 years ago

If I remember well HSL_SF_HZETA_EULERMACLAURIN_SERIES_ORDER is meant to be large enough to not be reached.

It seems it can be reached under some conditions, although I guess under these conditions it may not be possible to get an accurate result anyway?

It would be good to protect against this, as reading over the end of the array can have nasty consequences.

jgmbenoit commented 2 years ago

I agree. Do you have data samples that actually reach the conditions ? I will have a look tomorrow.

szhorvat commented 2 years ago

I'd need more time to figure out what value was being passed in before the fix, but I'll note that using s = NaN or s = infinity in hsl_sf_hzeta(s, 1) triggers the out-of-bounds access.

Of course these are bad values, but it'd be good not to crash if they somehow happen to arise.

jgmbenoit commented 2 years ago

The hsl_sf_*zeta* functions follow the policy used in the GSL library, namely they follow their gsl_*_zeta* counterpart as concerns the input values. I guess it is a good idea to stick to GSL policy. This means that these bad values might be treated upstream in the calling code. Note that the initial *zeta* functions were imported from GSL. Amazingly the GSL code does not bother to check the EULERMACLAURIN_SERIES_ORDER: it is assumed not to be reached. Real/practical values would be nice.

szhorvat commented 2 years ago

Is there any good reason NOT to guard against running over the end of the array? There is already error reporting functionality in place and it's a trivial change to make.

No, I don't have an example input other than Inf/NaN that triggers this, but it's not clear that it doesn't exist either (I don't think I have time to do an exhaustive search). Also, it's conceivable that the caller somehow ends up inadvertently generating an Inf/NaN and the consequence should not be a blind crash. It should at least be an informative error.

This fix would also shut up the static analysers which always go crazy on this when I used them on igraph.

jgmbenoit commented 2 years ago

Actually there is no reason at all. The change can be made safely. I guess I was confused with the rest of the code which is far less trivial. Otherwise, you may check that the caller gives a non inf/nan value. I would consider the numerical functions safe.

jgmbenoit commented 2 years ago

I have just read the code. Actually there are no possible out-of-bounds ad the array hsl_sf_hzeta_eulermaclaurin_series_coeffs and hsl_sf_hzeta_eulermaclaurin_series_coeffs have HSL_SF_HZETA_EULERMACLAURIN_SERIES_ORDER+1 elements. It is not a SIZE but and a series ORDER.

szhorvat commented 2 years ago

I showed how the out-of-bounds happens in the top post. I opened this issue because I saw an actual crash, and the out-of-bounds is also detected both by Address Sanitizer, as well as Clang Static Analyzer.

When the for loop runs to the very end (without break ever being triggered), the value of j will be HSL_SF_HZETA_EULERMACLAURIN_SERIES_ORDER + 1. This causes an out-of-bounds access on the last quoted line above.

Notice the <= in the for loop termination condition.

jgmbenoit commented 2 years ago

Okay, I see now.

jgmbenoit commented 2 years ago

Just to be sure: do you have a record of the crash ?

jgmbenoit commented 2 years ago

I am on my way to submit a patch.

szhorvat commented 2 years ago

I linked one from the top post: https://github.com/igraph/igraph/issues/1864#issuecomment-983847407 This is likely due to passing in a NaN or Inf. It was due to a now-fixed bug in plfit.

jgmbenoit commented 2 years ago

Okay. I need some rest apparently. Whatever. I have just submitted a fix: #40 Let me know if it is fine.

ntamas commented 2 years ago

Closed in #40