justinsalamon / scaper

A library for soundscape synthesis and augmentation
BSD 3-Clause "New" or "Revised" License

Manage co-occurrence of events #142

Open turpaultn opened 3 years ago

turpaultn commented 3 years ago

Would it be possible to manage the co-occurrence of events?

The approach I used to generate the DESED dataset relies on the parameter "p" of np.random.choice to assign probabilities ("probas"). It is quite simple: everything depends only on the first event sampled, which determines the co_occur_params dictionary to use (each dictionary is specific to one event class):

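```python
import numpy as np
# _check_random_state is scaper's private helper (scaper/util.py) that turns an
# int seed or a RandomState into a RandomState; assumed available in your version.
from scaper.util import _check_random_state


def choose_cooccurence_class(co_occur_params, random_state=None):
    """ Choose another class given a dictionary of parameters (from an already specified class).
    Args:
        co_occur_params: dict, define the parameters of co-occurrence of classes
            Example of co_occur_params dictionary::
                {
                  "max_events": 13,
                  "classes": [
                    "Alarm_bell_ringing",
                    "Dog"
                  ],
                  "probas": [
                    0.7,
                    0.3
                  ]
                }
            classes and probas are aligned element-wise (probas[i] is the
            probability of classes[i]); probas must sum to 1, as required by
            the p argument of np.random.choice.
        random_state: int, RandomState object, or None to use NumPy's global generator
    Returns:
        str, the class name.
    """
    if random_state is not None:
        # Seeded path: reproducible draws from a dedicated RandomState.
        random_state = _check_random_state(random_state)
        chosen_class = random_state.choice(co_occur_params['classes'], p=co_occur_params['probas'])
    else:
        # Unseeded path: NumPy's global random generator.
        chosen_class = np.random.choice(co_occur_params['classes'], p=co_occur_params['probas'])
    return chosen_class
```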

(max_events is used to draw a random number of events for the soundscape, again depending on the class of the first event sampled. Not ideal, but easy to implement and at least class-dependent.)
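A minimal sketch of the whole per-soundscape procedure described above. It assumes the first class is drawn from its own distribution (first_probas) and that the number of events is uniform on [1, max_events]; neither detail is specified in this thread, and CO_OCCUR_PARAMS and sample_event_classes are hypothetical names:

```python
import numpy as np

# Hypothetical co_occur_params, one dict per possible *first* event class,
# in the same format as the docstring example (probas sum to 1).
CO_OCCUR_PARAMS = {
    "Alarm_bell_ringing": {
        "max_events": 13,
        "classes": ["Alarm_bell_ringing", "Dog"],
        "probas": [0.7, 0.3],
    },
    "Dog": {
        "max_events": 5,
        "classes": ["Dog", "Alarm_bell_ringing"],
        "probas": [0.8, 0.2],
    },
}


def sample_event_classes(params_by_class, first_classes, first_probas, rng):
    """Return the list of event classes for one soundscape.

    The first class alone selects the co_occur_params used for everything
    else: how many events there are and how the remaining classes are drawn.
    """
    first = str(rng.choice(first_classes, p=first_probas))
    params = params_by_class[first]
    # Assumption: the event count is uniform on [1, max_events] (high is exclusive).
    n_events = rng.randint(1, params["max_events"] + 1)
    # Every remaining event is an independent draw from the first event's table.
    others = rng.choice(params["classes"], size=n_events - 1, p=params["probas"])
    return [first] + [str(c) for c in others]


rng = np.random.RandomState(0)
print(sample_event_classes(CO_OCCUR_PARAMS, ["Alarm_bell_ringing", "Dog"], [0.5, 0.5], rng))
```

Note that after the first draw, events are independent of each other; only the first class carries any "co-occurrence" information.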

This code is very simplistic. A better goal would be a more principled co-occurrence sampling scheme (n-grams, or other ideas borrowed from text generation with language models, I guess?). What do you think?
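As one illustration of the n-gram idea, here is a sketch of a bigram (first-order Markov) sampler, where each event's class depends on the previous event's class rather than only on the first one. It assumes events are ordered by onset time and that the transition table is estimated from annotated sequences with add-alpha smoothing; the function names and the toy sequences are made up for illustration:

```python
import numpy as np

CLASSES = ["Alarm_bell_ringing", "Dog", "Speech"]


def estimate_transitions(sequences, classes, alpha=1.0):
    """Bigram table P(next class | previous class) from annotated sequences.

    alpha is add-alpha smoothing, so unseen pairs keep a small probability.
    """
    index = {c: i for i, c in enumerate(classes)}
    counts = np.full((len(classes), len(classes)), alpha)
    for seq in sequences:
        for prev, nxt in zip(seq[:-1], seq[1:]):
            counts[index[prev], index[nxt]] += 1
    # Normalize each row so it is a probability distribution over next classes.
    return counts / counts.sum(axis=1, keepdims=True)


def sample_bigram_sequence(n_events, initial_probas, transitions, classes, rng):
    """Sample n_events classes, each conditioned on the previous one."""
    idx = rng.choice(len(classes), p=initial_probas)
    seq = [classes[idx]]
    for _ in range(n_events - 1):
        idx = rng.choice(len(classes), p=transitions[idx])
        seq.append(classes[idx])
    return seq


# Toy "annotated" sequences, invented for this example (not from DESED).
toy = [
    ["Alarm_bell_ringing", "Alarm_bell_ringing", "Speech"],
    ["Dog", "Dog", "Alarm_bell_ringing"],
    ["Speech", "Speech", "Dog"],
]
T = estimate_transitions(toy, CLASSES)
rng = np.random.RandomState(0)
print(sample_bigram_sequence(5, [1 / 3, 1 / 3, 1 / 3], T, CLASSES, rng))
```

Longer contexts (trigrams, or a small language model over class tokens) follow the same pattern, at the cost of needing more annotated data to estimate the table.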

justinsalamon commented 3 years ago

Cheers @turpaultn !

We could definitely add support for non-uniform discrete sampling, e.g. via a new choose_weighted distribution tuple.
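A sketch of what sampling from such a tuple could look like, written as a standalone helper rather than scaper's actual internals. The ("choose_weighted", values, weights) layout and sample_distribution are hypothetical; only "const" and a simplified "choose" mirror scaper's existing distribution tuples:

```python
import numpy as np


def sample_distribution(dist, rng):
    """Draw one value from a scaper-style distribution tuple (hypothetical helper).

    Supported: ("const", value), ("choose", values) and the proposed
    ("choose_weighted", values, weights).
    """
    name = dist[0]
    if name == "const":
        return dist[1]
    if name == "choose":
        values = list(dist[1])
        if not values:
            # Empty-list semantics are not reproduced in this sketch.
            raise ValueError("choose needs a non-empty list of values")
        return values[rng.randint(len(values))]
    if name == "choose_weighted":
        values = list(dist[1])
        weights = np.asarray(dist[2], dtype=float)
        if len(values) == 0 or len(values) != len(weights):
            raise ValueError("choose_weighted needs one weight per value")
        if np.any(weights < 0) or weights.sum() <= 0:
            raise ValueError("weights must be non-negative with a positive sum")
        # Normalize so raw weights like [70, 30] work as well as [0.7, 0.3].
        probas = weights / weights.sum()
        return values[rng.choice(len(values), p=probas)]
    raise ValueError(f"unsupported distribution: {name!r}")


rng = np.random.RandomState(0)
label = sample_distribution(("choose_weighted", ["Alarm_bell_ringing", "Dog"], [70, 30]), rng)
```

Normalizing the weights (instead of requiring them to sum to 1) means the [70, 30] style from the first post would work unchanged.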

IIUC in the example above you're providing the probability for each event being chosen, and then choosing one of these events, but that's not the same as co-occurrence probabilities, right? That is, there's a difference between saying:

  1. Choose between alarm/dog with prob .7/.3
  2. Give me a soundscape where alarm and dog co-occur with probability X.

My understanding from today's meeting was that the team is interested in the latter, but maybe I misunderstood?
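To make the distinction concrete: under statement 1, the co-occurrence rate is only a by-product of the per-event probabilities and of the number of events. If each of n events is independently an alarm with probability p (and a dog otherwise), both classes appear with probability 1 - p^n - (1 - p)^n, i.e. 1 - 0.49 - 0.09 = 0.42 for p = .7 and n = 2, and that value changes whenever n changes. Statement 2 instead fixes the co-occurrence probability directly. A small sketch of both (all function names are hypothetical):

```python
import numpy as np


def p_both_present(p_alarm, n_events):
    """Exact P(alarm and dog both appear) when each of n_events is an
    independent alarm/dog draw with P(alarm) = p_alarm (statement 1)."""
    return 1.0 - p_alarm ** n_events - (1.0 - p_alarm) ** n_events


def simulate_both_present(p_alarm, n_events, n_trials, rng):
    """Monte Carlo estimate of the same quantity."""
    is_alarm = rng.random_sample((n_trials, n_events)) < p_alarm
    both = is_alarm.any(axis=1) & (~is_alarm).any(axis=1)
    return both.mean()


def sample_with_cooccurrence(x, p_alarm_alone, rng):
    """Statement 2: choose the co-occurrence probability x directly.

    With probability x both classes are present; otherwise only one of them,
    the alarm with probability p_alarm_alone.
    """
    if rng.random_sample() < x:
        return {"Alarm_bell_ringing", "Dog"}
    return {"Alarm_bell_ringing"} if rng.random_sample() < p_alarm_alone else {"Dog"}


rng = np.random.RandomState(0)
for n in (2, 5, 13):
    print(n, p_both_present(0.7, n), simulate_both_present(0.7, n, 100000, rng))
```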

Regardless, it looks like we'd need something like choose_weighted to support Gibbs or related types of sampling methods?
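For reference, a sketch of Gibbs sampling over class presence. The model is a pairwise (Ising-style) one chosen here for illustration, not something proposed in this thread: each class i has a bias b_i and each pair a symmetric coupling W_ij, so a binary presence vector z has probability proportional to exp(b . z + 0.5 * z^T W z). Each Gibbs step resamples one z_i from its conditional, which is a two-way weighted choice, i.e. the kind of primitive choose_weighted would provide. BIAS, COUPLING and the function names are hypothetical:

```python
import numpy as np

CLASSES = ["Alarm_bell_ringing", "Dog", "Speech"]
# Hypothetical parameters: BIAS[i] > 0 favours class i being present;
# COUPLING[i, j] > 0 makes i and j more likely to co-occur, < 0 less likely.
# COUPLING is symmetric with a zero diagonal.
BIAS = np.array([-0.5, -1.0, 0.0])
COUPLING = np.array([
    [0.0, 1.5, -0.5],
    [1.5, 0.0, 0.0],
    [-0.5, 0.0, 0.0],
])


def gibbs_sweep(z, bias, coupling, rng):
    """One Gibbs sweep: resample each z_i from P(z_i | all other z_j), in place."""
    for i in range(len(z)):
        # Conditional log-odds of z_i = 1 (the self-term is excluded).
        logit = bias[i] + coupling[i] @ z - coupling[i, i] * z[i]
        p_on = 1.0 / (1.0 + np.exp(-logit))
        # A weighted choice between "absent" (0) and "present" (1).
        z[i] = rng.choice(2, p=[1.0 - p_on, p_on])
    return z


def sample_presence(bias, coupling, n_samples, rng, burn_in=500):
    """Run the chain and return n_samples presence vectors, one per sweep."""
    z = np.zeros(len(bias), dtype=int)
    for _ in range(burn_in):
        gibbs_sweep(z, bias, coupling, rng)
    samples = np.empty((n_samples, len(bias)), dtype=int)
    for t in range(n_samples):
        samples[t] = gibbs_sweep(z, bias, coupling, rng)
    return samples


rng = np.random.RandomState(0)
z = sample_presence(BIAS, COUPLING, 1, rng)[0]
print([c for c, on in zip(CLASSES, z) if on])
```

Successive samples from one chain are correlated; drawing each soundscape's classes from a separate burned-in chain (or thinning) avoids that.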

turpaultn commented 3 years ago

Cool !

Well, I understand it's not clear, since I only posted this little piece of code.

But the algorithm is like this:

The idea was that if an alarm (a "beep") appears, for example, there is a good chance you will hear another one. As I said, it is simple, but at least it let us get a class balance closer to that of the real set without spending too much time.

> Regardless, it looks like we'd need something like choose_weighted to support Gibbs or related types of sampling methods?

I agree.