justinsalamon / scaper

A library for soundscape synthesis and augmentation
BSD 3-Clause "New" or "Revised" License
382 stars 56 forks source link

Scaper breaks with truncnorm in scipy==1.5.1 (latest) #107

Closed pseeth closed 4 years ago

pseeth commented 4 years ago

In the latest Scipy version, Scaper is broken due to an underlying change in the type returned by scipy.stats.truncnorm.rvs. This used to return a float and now returns an array for some reason. Here is a quick example:

❯ pip install -U scipy==1.4.0
Requirement already up-to-date: scipy==1.4.0 in /Users/prem/miniconda3/lib/python3.7/site-packages (1.4.0)
Requirement already satisfied, skipping upgrade: numpy>=1.13.3 in /Users/prem/miniconda3/lib/python3.7/site-packages (from scipy==1.4.0) (1.19.0)
❯ python
Python 3.7.7 (default, May  6 2020, 04:59:01)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import scaper
>>> scaper.util._sample_trunc_norm(1, .1, 0, 5, 0)
1.0122664227574725
>>>
❯ pip install -U scipy
Collecting scipy
  Using cached scipy-1.5.1-cp37-cp37m-macosx_10_9_x86_64.whl (28.7 MB)
Requirement already satisfied, skipping upgrade: numpy>=1.14.5 in /Users/prem/miniconda3/lib/python3.7/site-packages (from scipy) (1.19.0)
Installing collected packages: scipy
  Attempting uninstall: scipy
    Found existing installation: scipy 1.4.0
    Uninstalling scipy-1.4.0:
      Successfully uninstalled scipy-1.4.0
Successfully installed scipy-1.5.1
❯ python
Python 3.7.7 (default, May  6 2020, 04:59:01)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import scaper
>>> scaper.util._sample_trunc_norm(1, .1, 0, 5, 0)
array([1.01226642])
>>>

This causes a lot of downstream errors in formatting in Sox, in the tests that check if things are a real number, etc. Great test suite! The fix should be simple. Convert the output of _sample_trunc_norm in Scaper to return a scalar. I believe this was caused by vectorizing the truncnorm function, though as far as I can tell, the change in return type is somewhat undocumented...

https://github.com/scipy/scipy/issues/11299

I believe the fix should look like this:

def _sample_trunc_norm(mu, sigma, trunc_min, trunc_max, random_state):
    '''
    Return a random value sampled from a truncated normal distribution with
    mean ```mu``` and standard deviation ```sigma``` whose values are limited
    between ```trunc_min``` and ```trunc_max```.

    Parameters
    ----------
    mu : float
        The mean of the truncated normal distribution
    sig : float
        The standard deviation of the truncated normal distribution
    trunc_min : float
        The minimum value allowed for the distribution (lower boundary)
    trunc_max : float
        The maximum value allowed for the distribution (upper boundary)
    random_state : mtrand.RandomState
        RandomState object used to sample from this distribution.

    Returns
    -------
    value : float
        A random value sampled from the truncated normal distribution defined
        by ```mu```, ```sigma```, ```trunc_min``` and ```trunc_max```.

    '''

    # By default truncnorm expects a (lower boundary) and b (upper boundary)
    # values for a standard normal distribution (mu=0, sigma=1), so we need
    # to recompute a and b given the user specified parameters.
    a, b = (trunc_min - mu) / float(sigma), (trunc_max - mu) / float(sigma)
    # old line
    # return scipy.stats.truncnorm.rvs(a, b, mu, sigma, random_state=random_state) 
    # new line
    return scipy.stats.truncnorm.rvs(a, b, mu, sigma, random_state=random_state).item()

.item coerces the array to a scalar. This works locally for me.

justinsalamon commented 4 years ago

Per offline discussion with @pseeth, we need to make sure the fix is backwards compatible with scipy 1.4, i.e. if truncnorm.rvs returns a scalar, the code should still work.

Updated proposed solution: cast to array and then back to scalar

return np.array(scipy.stats.truncnorm.rvs(a, b, mu, sigma, random_state=random_state)).item()
pseeth commented 4 years ago

The linked issue remarks that there may be a speed hit in Scipy 1.5.1. So I just ran the test suite locally and checked the speed.

1.5.1 -> ran in 150.55 seconds
1.4.0 -> ran in 145.13 seconds

Nothing remarkable there. Just logging it here for posterity.

pseeth commented 4 years ago

Updated locally with coercing to array and then back to scalar. Passes tests in 1.4.0 as well as 1.5.1.

.item() exists back to at least numpy 1.13: https://numpy.org/doc/1.13/reference/generated/numpy.ndarray.item.html?highlight=item#numpy.ndarray.item. So we're good there.

Making the PR!

pseeth commented 4 years ago

Closed by #108