google / codeworld

Educational computer programming environment using Haskell
http://code.world
Apache License 2.0

Sound #47

Open cdsmith opened 9 years ago

cdsmith commented 9 years ago

We could support programmatic generation of sound by using a function Number -> Number (time to amplitude sample). The sampling frequency would need to be FAR higher than the animation frequency, so we'd batch it and use a buffer.
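As a concrete sketch, such a function might describe a pure tone (the name tone here is just for illustration):

tone :: Number -> Number
tone(t) = sin(2 * pi * 440 * t)  -- a 440 Hz sine wave (concert A)

At a 44100 Hz sample rate this would be evaluated 44,100 times per second, which is why the samples would have to be batched into a buffer.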

Problems:

  - Batching ties the audio sample rate to the frame rate, which is bad.
  - Another way would be to maintain a different World type for sound, but this is too complicated.

Definitely could use some thought.

cdsmith commented 9 years ago

Another proposal, hatched from a discussion with someone last week. Build in a fixed list of sound forms (like data Timbre = Beep | Buzz | Piano | Guitar | ...), and define data Sound = Sound(Timbre, Amplitude, Pitch). Then define sounds :: World -> [Sound]. For a Timbre with no attack or decay, this will work easily. For something like Piano or Guitar with an attack/decay, we'd make a rule that if exactly the same Sound occurs in consecutive frames, the previous sound keeps playing.

This still doesn't let a student just turn a list of notes into background music... but it's not too far off. You could say:

sounds(t)  -- treating the world as just the elapsed time t, for this example
  | k <= 1    = [ Sound(Piano, 1, bFlat) ]
  | k <= 2    = [ Sound(Piano, 0.7, cSharp) ]
  | otherwise = []
  where k = remainder(t, 3)
cdsmith commented 7 years ago

I had a great discussion about this with a few people today. We had the following insights:

  1. To avoid the combinatorial explosion going on with features, we should consider a switch to a new API based on an abstract Application type. This is a separate proposal at #492, but is a prerequisite for this feature.

  2. The key insight here is that sounds are normally not so much a function of the state, as they are of the edge, or change, in state. So rather than wanting a function like state -> Sound, we actually want functions like (state, Event) -> Sound and (state, Time) -> Sound, analogous to the step and event handler functions on state that exist now. Once a sound is triggered, it plays to completion on its own. This is a little weird, but playing around with a few real-world uses of state -> Sound consistently led to just adding a sound-effect queue to the state, so that it was possible to add effects from the step and event functions, and trigger them exactly once.

  3. There are combinators on sounds, just as there are combinators on pictures. Combinators can do things like adjust the pitch, volume, or duration, invert the pitch, and even combine sounds by playing them either in sequence, or in parallel. Then the combined sound can be triggered entirely as one effect.
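For illustration, such combinators might look like this (every name below is an assumption, not a settled API):

louder     :: (Sound, Number) -> Sound  -- scale the volume
transposed :: (Sound, Number) -> Sound  -- shift the pitch
stretched  :: (Sound, Number) -> Sound  -- scale the duration
inverted   :: Sound -> Sound            -- invert the pitch
sequential :: [Sound] -> Sound          -- play one after another
parallel   :: [Sound] -> Sound          -- play at the same time

A whole phrase like parallel([piano(c), piano(e), piano(g)]) could then be triggered as a single effect.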

cdsmith commented 6 years ago

https://www.youtube.com/watch?v=Vj8h5w4yfW8 and https://github.com/lnfiniteMonkeys/TimeLines are great inspirations here. Forgetting the abstraction layer, it might be good enough to just control frequency and amplitude, and maybe some of the other synth parameters seen in that video, per unit time. Since controlling frequency directly means the Fourier transform has, in effect, already been applied, the control functions can certainly be sampled far less frequently than a raw amplitude signal.
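As a rough sketch of that reduced interface (all names assumed), each synth parameter would be its own slowly varying function of time, sampled at well below audio rate:

frequency :: Number -> Number
frequency(t) = 440 + 110 * sin(t)  -- a slow glide around A4

amplitude :: Number -> Number
amplitude(t) = max(1 - t / 5, 0)   -- fade out over five seconds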

3noch commented 5 years ago

What about playing arbitrary audio files (public URLs of course)? I have a student who would like this feature in any form.

cdsmith commented 5 years ago

I've still not found an API that I think really fits. Here's what would work, though. I want to change the core of codeworld-api to be based on reflex, rather than a hand-coded event loop. Once that happens, we'll have a sufficiently abstract API that adding something like sound will be easy to do. That gets sound into codeworld-api, at the cost that you have to use a full FRP implementation instead of one of the simplified API variants.

Adding sound to codeworld-base is harder, because I'm just not convinced that any of the API options are good enough to commit to.

3noch commented 5 years ago

No sound at all is also an API commitment. ;)

With audio files it would be nice to have events for "audio finished playing." I'm guessing that's where you see the value of reflex. Lack of IE support isn't an issue in my mind. No audio on IE is no worse than it is now.

cdsmith commented 5 years ago

So leaving aside implementation, what would be your ideal API for codeworld-base programs with sounds?

The best I can come up with is something like:

data Sound
type Pitch = ...  -- What?  Number like 440?  Text like "A4"?  Abstract type with yet more API?

bell :: Pitch -> Sound
beep :: Pitch -> Sound
piano :: Pitch -> Sound
click :: Sound
...
soundAtURL :: Text -> Sound  -- maybe?

loud :: Sound -> Sound
quiet :: Sound -> Sound
long :: Sound -> Sound
short :: Sound -> Sound
withDuration :: (Sound, Number) -> Sound  -- ugly name!
transposed :: (Sound, Number) -> Sound

chord :: [Sound] -> Sound
melody :: [Sound] -> Sound

someNewEntryPoint ::
    ([Number] -> state, (state, Event) -> (state, [Sound]), state -> Picture) -> Program

This doesn't deliver events when the sound stops, or allow a sound to be cancelled, or anything like that. It's purely fire-and-forget.
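For concreteness, a game's event handler under this sketch might read as follows, where hitsAlien and removeAlien are hypothetical helpers and a plain Number stands in for Pitch:

-- hitsAlien, removeAlien, and a State type are assumed, not proposed API.
change :: (State, Event) -> (State, [Sound])
change(state, PointerPress(p))
  | hitsAlien(state, p) = (removeAlien(state, p), [ loud(bell(440)) ])
  | otherwise           = (state, [])
change(state, _) = (state, [])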

There's a lot I like about this API. The notion of sounds as first-class composable values, and the melody and chord functions, have a kind of beauty to them. But it's also pretty complicated. Having finally reduced the number of non-deprecated entry points to five, we would now be adding three more (plain, multiplayer, and debug variants), and then potentially starting a new deprecation cycle for the non-sound APIs.

What I like even less is that it mixes presentation and logic in the change function. (But it doesn't work to put the sounds in the picture function, because sounds need to be triggered at specific times, in concert with how the state is changing.) And what I like least of all is that when playing around with programming in this model, you spend a LOT of time on plumbing to concatenate all those sound lists from the various functions called by change.

Do you have a better alternative for codeworld-base?

In any case, Ryan and I have plans to work on a Reflex core for codeworld-api this coming weekend at the New York cohack. At that point, I think it's reasonable to at least add sound in the base FRP model. Then we can figure out how far it makes sense to add to the simpler levels.

3noch commented 5 years ago

That sounds awesome. I hope it goes well!

The API you propose does seem elegant and yet entry-point explosion would not be fun.

I've thought about somehow including sound into Picture as you say. I'm not 100% sold that sound really does need to be event-based. I can sort of imagine a purely state-based API for it:

sound :: Sound -> Picture
audioURL :: URL -> AudioClip -- would be nice if URL could be statically determined so audio could be loaded before program starts.
audioClip :: AudioClip -> Sound
duration :: AudioClip -> Number
sounds :: [Sound] -> Sound

data MySound = Start | Explode | GameOver
type SoundState = [(MySound, Number)]
data State = State
  { stateSoundState :: SoundState
  , stateTime :: Number
  , stateMore :: MoreThings
  }

mySoundToClip :: MySound -> AudioClip

step :: Number -> State -> State
step t state = state { stateSoundState = soundState, stateTime = t }
  where
    soundState = filter (\(clip, startedAt) -> t - startedAt < duration (mySoundToClip clip))
                        (stateSoundState state)

change :: Event -> State -> State
change _event state
  | bombExploded (stateMore state) =  -- bombExploded :: MoreThings -> Bool is assumed
      state { stateSoundState = [(Explode, stateTime state)] }
  | otherwise = state

Obviously a lot of book-keeping going on...but that could all be abstracted away with normal Prelude functions.

3noch commented 5 years ago

Oh, sorry, I forgot to clarify that this would assume, as you suggested, that the same AudioClip in back-to-back frames would mean "keep playing". AudioClips would produce silence after completion.

This still doesn't allow things like pause, though. I feel like that could be accomplished somehow...

3noch commented 5 years ago

This relies rather heavily on some notion of equality on AudioClip. That is easy. But extending that to Sound would be much harder. In that case, it seems it would make more sense to build on top of some user-defined "key" which specifies equality (I hinted at this with the MySound type already). Then you could create complex sequences of tones, notes, audio clips, etc., and work with them based on this same notion.

cdsmith commented 5 years ago

@3noch Sorry, but I don't understand your proposed API. Which parts of it would be provided by CodeWorld, and which are part of the student code? I assume you're not proposing that the standard library would define something called MySound...

3noch commented 5 years ago

I could have been clearer. I'm building off of what you've already proposed, more-or-less:

-- Integrates sounds into the existing `Picture` type. Anything that can be drawn can also produce sound.
sound :: Sound -> Picture

-- Do whatever you like with tones, timbres, etc. as you've proposed.

-- Extend to include audio clips:
audioURL :: URL -> AudioClip
audioClip :: AudioClip -> Sound
duration :: AudioClip -> Number

draw :: State -> Picture
draw state = circle(1) & sound(sequence(repeat(audioClip(audioURL("http://wikipedia.com/example.mp3")))))

Obviously the question then becomes: how would you actually use this? That's where I tried to show an example of how you might build a working application.

For audio clips, the underlying engine would keep a list of all the clips that were drawn and do a "diff" between frames to decide which ones should be stopped and which ones should keep playing. For more abstract sounds, you could introduce some way to key a sound so that the diffing algorithm can decide whether it's the same sound as in the last frame.
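A minimal sketch of that per-frame diff, assuming AudioClip has an Eq instance:

import Data.List ((\\))

-- Clips drawn this frame versus last frame.
diffClips :: Eq clip => [clip] -> [clip] -> ([clip], [clip])
diffClips previous current =
  ( current \\ previous    -- newly drawn: start playing
  , previous \\ current )  -- no longer drawn: stop playing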

3noch commented 5 years ago

I'm not thrilled with this, of course, because of questions like: what does translated(sound(...), 1, 1) mean? It would most likely need to have no effect, but it's still weird. If Picture were made into Output and split into two pieces, you could do this more naturally, at the cost of extra complication for backwards compatibility.

3noch commented 5 years ago

The "key"ing idea is really that the engine would keep [(String, Sound)] and diff these between frames. When a new key (fst) arrives, you start a new channel, and begin playing snd on it. When the key is gone, you stop the sound. This would allow you to easily sequence sounds and play the whole sequence at some point in time.

3noch commented 5 years ago

I'm not attached to this. It's just an idea to see if we can avoid the entry-point bloat...

cdsmith commented 5 years ago

Okay, I understand the idea. I agree that it seems workable, but I'm worried that in the end, this is a case where backward compatibility is not the right goal, because it will make the abstractions wrong. I'd rather break everyone's programs tomorrow than spend the next 5 years explaining to students why sound is part of a Picture value. As far as avoiding entry point bloat, if we abandon backward compatibility, there are always options like #492 instead.

I discussed this with both Donya Quick and Leon Smith at Compose today. We all ended up agreeing that in the common case, sound is triggered based on the edge, not the level (i.e., by a change in state, not the current state). We saw a few reasonable designs:

  1. change :: (state, Event) -> (state, [Sound]). This is the obvious simple change, but it mixes logic and presentation.
  2. change :: (state, Event) -> state and sound :: (state, Event) -> [Sound]. On its face, more appealing. However, in practice, producing the list of sounds is closely tied to determining the new state. For instance, when you realize the bomb hit the alien, you want to simultaneously remove the alien and trigger the explosion sound. You don't want to do one without the other: if the alien is removed without triggering the sound, you won't know to trigger the sound later; but if the sound is triggered without removing the alien, you won't know NOT to trigger the sound again next frame.
  3. change :: (state, Event) -> state and sound :: (state, state) -> [Sound]. The theory here is that if sounds are a property of the change in state, then the transition could be represented by passing both states. However, this loses information about WHY the change happened. i.e., it's not enough to know the alien was there and is now gone. You need to know whether the alien was hit by a bomb, or just wandered off the screen. That might depend on the event, not just the state.
  4. change :: (state, Event) -> state and sound :: (state, Event, state) -> [Sound]. This is ultimately just a redundant parameter. If you have the state and event, then you can already apply change to determine the next state. So this is just an optimization, and doesn't help the semantics.
  5. change :: (state, Event) -> state and sound :: state -> (state, [Sound]). Here, the idea is that sound does a fetch-and-clear, and it's the responsibility of the student to coordinate when to trigger sounds and how to update state so they aren't triggered twice. In practice, this might mean that many programs will maintain a buffer of pending sounds in their state type, but that's not the end of the world.

I think options 1 and 5 are both not-awful.
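For instance, option 5's fetch-and-clear might look like this in practice (the State fields and explosionSound are hypothetical):

data State = State { pendingSounds :: [Sound], score :: Number }

-- The event handler queues the effect at the moment the state changes;
-- explosionSound :: Sound is assumed to exist.
change :: (State, Event) -> State
change(state, PointerPress(_)) =
  state { score = score(state) + 1
        , pendingSounds = explosionSound : pendingSounds(state) }
change(state, _) = state

-- sound fetches the queued effects and clears them, so each one is
-- triggered exactly once:
sound :: State -> (State, [Sound])
sound(state) = (state { pendingSounds = [] }, pendingSounds(state))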

The one thing that's made substantially harder by this model is playing background music that changes for each level, or something like that. Maybe that's okay... there are work-arounds, but per-level background music isn't really the most important use case.

3noch commented 5 years ago

> The one thing that's made substantially harder by this model is playing background music that changes for each level, or something like that.

For reasons like this, I'm actually still convinced that fundamentally basing sound off events is the wrong move. You can always write state-transition detection in your own code. It's kind of a pain, yes, but you can do it. And what's more, you can write functions that do it for you. If you do that, then sound is only a function of state, just like the screen. I think of "sound" as "presentation." It's just another way for the user to experience the state.

> because it will make the abstractions wrong

This is undeniably true. Adding sound :: Sound -> Picture is just asking for pain. What I think would be better, as I mentioned in passing, is building a new type called Output (better name please!) that contains both a Picture and a Sound. This breaks backwards compatibility, of course, but in a somewhat minimal way. For example: instead of animationOf(\t -> circle(t)) you'd have animationOf(\t -> output(circle(t), noSound)).
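A minimal sketch of that split, with every name an assumption:

data Sound = NoSound  -- placeholder; the real type would be much richer
data Output = Output(Picture, Sound)

output :: (Picture, Sound) -> Output
output(p, s) = Output(p, s)

noSound :: Sound
noSound = NoSound

-- Entry points would then accept Output wherever they now take Picture:
animationOf :: (Number -> Output) -> Program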

3noch commented 5 years ago

If we had full FRP, then using events would be fine, because you could define your input event to be derived from Dynamic State in addition to other things. But without this, I fear that using events will force you to do awkward things to create sounds that aren't event-based. I suppose you could rely on the TimePassing event, but that would be "hacky." And keeping your event-based sounds and your state in sync is still going to require book-keeping, just in the other direction.

If you only used state, here's how you could write the "bomb hit alien" example:

step :: State -> Number -> State
step state t = dropSoundsThatAreOver t $ detectHits (stateAliens state) state
  where
    detectHits [] state' = state'
    detectHits (alien : aliens) state'
      | bombTouching alien = detectHits aliens state'
          { stateAliens = filter (/= alien) (stateAliens state')
          , stateSounds = (bombSound, t + duration bombSound) : stateSounds state'
          }
      | otherwise = detectHits aliens state'

EDIT: Fixed some logic, probably still not quite right...

Here, (bombSound, t + duration(bombSound)) means "a sound and the time at which it should be removed," and dropSoundsThatAreOver t is a function that just removes sounds whose end time is before t.

The way to make this easy is to write abstractions around it so that students don't have to rewrite this basic "has a sound stopped?"-style logic every time.
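Here is one possible shape for those helpers (a sketch, generalizing the earlier SoundState to plain Sounds and assuming duration :: Sound -> Number):

type SoundState = [(Sound, Number)]  -- each sound paired with its end time

-- Queue a sound now; it plays until its end time passes.
playSound :: Sound -> Number -> SoundState -> SoundState
playSound s now queue = (s, now + duration s) : queue

-- Remove every sound whose end time has passed.
dropSoundsThatAreOver :: Number -> SoundState -> SoundState
dropSoundsThatAreOver now = filter (\(_, endsAt) -> now < endsAt)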

Perhaps the complexity of the abstractions themselves is too high for the target audience of Code.World. I can definitely sympathize with that. But even then, hard-coding the choice seems even less ideal. You could simply bind the existing entry points to functions that provide the abstractions automatically, while more advanced students use the underlying abstractions directly.

3noch commented 5 years ago

Another mild advantage is that things like animationOf still work almost exactly the same but can also produce sound, even though there is no state and there are no events.