MAINT: optimize._chandrupatla: refactor based on original paper

I did some benchmarking as in https://github.com/scipy/scipy/issues/7242#issuecomment-1365495412. To invert argus.cdf, _chandrupatla takes about twice as long as brent for scalars (~3ms VS ~1.5 ms), but consequently, it starts to become faster for arrays with more than one element. For large arrays, it can be over 350x faster.

```python3 import numpy as np from scipy import stats from scipy.optimize._zeros_py import _chandrupatla dist = stats.argus(1) ps = np.linspace(0.005, 0.995, 100000) # %timeit -n1 -r1 dist.ppf(ps) # 1min 21s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each) [optimize.brentq(lambda x: dist.cdf(x) - p, 0.001, 0.999) for p in ps] # 1min 31s ± 386 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) _chandrupatla(lambda x: dist.cdf(x) - ps, 0.001, 0.999) # 262 ms ± 3.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # being careful to get the same accuracy %timeit _chandrupatla(lambda x: dist.cdf(x) - ps, 0.001, 0.999, xatol=0, xrtol=0, fatol=1e-15) # 253 ms ± 4.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) optimize.brentq(lambda x: dist.cdf(x) - ps[0], 0.001, 0.999) # 1.44 ms ± 8.68 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) _chandrupatla(lambda x: dist.cdf(x) - ps[0], 0.001, 0.999) # 2.95 ms ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ```

For comparison, newton is about as fast as brent for scalars, and twice as fast as _chandrupatla for large arrays, but it's unreliable. For the problem above and with a guess of 0.5, it fails to converge in about 5% of cases.

mdhaber / scipy

MAINT: optimize._chandrupatla: refactor based on original paper #102