[Open] turpaultn opened this issue 3 years ago
Cheers @turpaultn!
We could definitely add support for non-uniform discrete sampling, e.g. via a new choose_weighted
distribution tuple.
IIUC, in the example above you're providing the probability of each event being chosen and then choosing one of those events, but that's not the same as co-occurrence probabilities, right? That is, saying "each event is chosen independently with some probability" is different from saying "these events tend to occur together".
My understanding from today's meeting was that the team is interested in the latter, but maybe I misunderstood?
Regardless, it looks like we'd need something like choose_weighted
to support Gibbs or related types of sampling methods?
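For illustration, such a tuple could be handled like this (a sketch only — the choose_weighted name and the helper below are hypothetical, not an existing API):

```python
import numpy as np

def sample_distribution(dist_tuple, rng=None):
    """Sample a value from a distribution tuple.

    Supports a hypothetical ("choose_weighted", values, probabilities)
    form alongside the uniform ("choose", values) form.
    """
    rng = np.random.default_rng(rng)
    name = dist_tuple[0]
    if name == "choose":
        # Uniform discrete sampling over the provided values.
        values = dist_tuple[1]
        return values[rng.integers(len(values))]
    if name == "choose_weighted":
        # Non-uniform discrete sampling: values paired with probabilities.
        values, probs = dist_tuple[1], dist_tuple[2]
        if not np.isclose(sum(probs), 1.0):
            raise ValueError("probabilities must sum to 1")
        return values[rng.choice(len(values), p=probs)]
    raise ValueError(f"unsupported distribution: {name}")

# "Alarm_bell_ringing" is drawn ~70% of the time.
label = sample_distribution(
    ("choose_weighted", ["Alarm_bell_ringing", "Dog", "Speech"], [0.7, 0.2, 0.1])
)
```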
Cool!
I realize it's not very clear, since I only posted that little piece of code.
But the algorithm is like this:
The idea was that if, for example, an alarm ("bip") appears, there is a good chance you will hear another one. As I said, it is simple, but at least we were able to get a class balance closer to the real set without spending too much time on it.
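Roughly, that heuristic could be sketched like this (names such as co_occur_prob and the numbers are illustrative stand-ins, not the actual generation code):

```python
import numpy as np

# Hypothetical conditional probabilities: once a first event is drawn,
# how likely each class is to co-occur with it in the same soundscape.
co_occur_prob = {
    "Alarm_bell_ringing": {"Alarm_bell_ringing": 0.6, "Speech": 0.3, "Dog": 0.1},
    "Speech": {"Speech": 0.5, "Dog": 0.3, "Alarm_bell_ringing": 0.2},
}

def sample_co_occurring(first_event, n_extra, rng=None):
    """Draw n_extra additional events conditioned on the first event."""
    rng = np.random.default_rng(rng)
    probs = co_occur_prob[first_event]
    labels = list(probs)
    p = np.array([probs[label] for label in labels])
    p = p / p.sum()  # normalize, in case the table is unnormalized
    return list(rng.choice(labels, size=n_extra, p=p))

# If an alarm appears first, additional alarms are the most likely draw.
extras = sample_co_occurring("Alarm_bell_ringing", n_extra=3, rng=0)
```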
> Regardless, it looks like we'd need something like choose_weighted to support Gibbs or related types of sampling methods?
I agree.
Would it be possible to manage the co-occurrence of events?
The idea I used to generate the DESED dataset was to use the parameter "p" of np.random.choice to have "probas", so it is quite simple and everything is managed depending only on the first event sampled (which defines the co_occur_params dictionary to use, because it is specific to an event).
(The max_events parameter is used to determine a random "number of events" in the soundscape, again depending on the class of the first event sampled, so it is not very good, but it was easy to make and at least class-dependent.) This is very simplistic code, but a goal could be to have better co-occurrence sampling (n-grams or other ideas inspired by text generation with language models, I guess?). What do you think?
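Putting it together, the kind of pipeline I mean could be sketched like this (a toy sketch; co_occur_params, the class names, and the numbers are illustrative, not the real DESED generation code):

```python
import numpy as np

# Illustrative parameters, keyed by the first event sampled.
co_occur_params = {
    "Alarm_bell_ringing": {
        "classes": ["Alarm_bell_ringing", "Speech", "Dog"],
        "p": [0.6, 0.3, 0.1],  # "probas" passed to np.random choice
        "max_events": 4,       # class-dependent cap on soundscape size
    },
    "Speech": {
        "classes": ["Speech", "Dog"],
        "p": [0.7, 0.3],
        "max_events": 2,
    },
}

def sample_soundscape_events(first_event, rng=None):
    """Sample the list of event labels for one soundscape.

    Everything depends only on the first event: its entry in
    co_occur_params gives both the co-occurrence probas and max_events.
    """
    rng = np.random.default_rng(rng)
    params = co_occur_params[first_event]
    # Random number of extra events, bounded by the class-dependent max.
    n_extra = rng.integers(0, params["max_events"])
    extras = rng.choice(params["classes"], size=n_extra, p=params["p"])
    return [first_event] + list(extras)

events = sample_soundscape_events("Alarm_bell_ringing", rng=0)
```

A bigram/n-gram version would condition each new event on the previously sampled one rather than only on the first, which is the direction I had in mind.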