mvcisback / SSLVC

Sound Source Localization using Visual Cues

Project Proposal #1

Closed mvcisback closed 10 years ago

mvcisback commented 10 years ago

Figured I'd open up an issue for discussing details about the project proposal as well as brainstorming ideas.

mvcisback commented 10 years ago

Some notes from today's meeting:

Meetings will be Thursdays at 2:30pm (we should have about 5-6 more meetings)

Motivating problem

Scenarios

  1. Stationary speaker with no background movement
  2. Non-stationary person speaking with no background movement
  3. Stationary speaker with background movement
  4. 2 Stationary speakers with no background movement
  5. 2 Non-stationary speakers with no background movement

Milestone 1: Focus on scenario 1

Detecting Speech Frames

Given a frequency range R characteristic of speech, speech frames correspond to activation in the spectrogram within R.
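A minimal sketch of this idea in numpy — flag frames whose spectrogram energy inside R stands out. The band limits, FFT size, and threshold here are placeholder assumptions, not values we've agreed on:

```python
import numpy as np

def speech_frames(x, sr, band=(300.0, 3400.0), n_fft=512, thresh=2.0):
    """Flag frames whose energy inside `band` exceeds `thresh` times the
    median band energy over all frames (a crude voice-activity detector)."""
    hop = n_fft // 2
    n_frames = (len(x) - n_fft) // hop + 1
    window = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    energy = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + n_fft] * window
        energy[i] = (np.abs(np.fft.rfft(frame)) ** 2)[in_band].sum()
    return energy > thresh * np.median(energy)
```

A median-relative threshold sidesteps having to calibrate absolute levels per recording, which seems safer for scenario 1.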

Detecting Speech in Images

Removing Brightness Fluctuations

Space concerns

If done naively, the time series is far too large to process.

  1. Compress images using external software
    • ffmpeg or movie maker
  2. Then analyze the video
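For step 1, a helper that builds (but does not run) an ffmpeg invocation to downscale resolution and frame rate — usually enough to make the image time series tractable. The filter settings and filenames are illustrative assumptions:

```python
def ffmpeg_downscale_cmd(src, dst, width=320, fps=10):
    """Build an ffmpeg command that shrinks each frame to `width` pixels
    wide (`-2` keeps the aspect ratio with an even height) and drops the
    frame rate to `fps`."""
    return ["ffmpeg", "-y", "-i", src,
            "-vf", f"scale={width}:-2,fps={fps}", dst]
```

The resulting list can be handed to `subprocess.run` once we settle on actual parameters.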

Milestone 2

  1. Learn models correlating sound with type of image
    • Could be done live using history of video
      • Look for sounds that are more correlated with certain movements based on history of video
    • Preprocess dictionary of sounds.
  2. Classify current environment based on models
    • What is the probability that, given this sound, a particular set of features emitted it?
    • The dictionary will give likelihoods and priors (we're doing a video conference, so it's likely that the source is a human)
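Step 2 is essentially Bayes' rule: P(source | sound) ∝ P(sound | source) · P(source). A minimal sketch — the source names and numbers are made up for illustration:

```python
def posterior(likelihoods, priors):
    """P(source | sound) ∝ P(sound | source) * P(source), normalized."""
    joint = {s: likelihoods[s] * priors[s] for s in priors}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}

# Hypothetical dictionary entries: in a video conference the prior on a
# human source is high, so an ambiguous sound still leans "human".
priors = {"human": 0.8, "keyboard": 0.2}
likelihoods = {"human": 0.5, "keyboard": 0.5}  # sound fits both equally
```

Here `posterior(likelihoods, priors)["human"]` comes out to 0.8: with equal likelihoods, the prior decides.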
mvcisback commented 10 years ago

Was it just me, or did today's lecture seem particularly useful for our problem (although that seems to happen every week)?

Particularly, PLCA seems to let us do an NMF-like factorization while easily including priors about what we think we should be looking for (as in Milestone 2).
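For reference, a toy 2-factor PLCA sketch in numpy, using EM-style multiplicative updates (effectively KL-NMF in probabilistic dress). This is my own minimal sketch, not tuned for real spectrograms; priors could be folded in by reweighting P(f|z) between iterations:

```python
import numpy as np

def plca(V, n_z, n_iter=500, seed=0):
    """Fit nonnegative V (freq x time) as P(f,t) = sum_z P(f|z) P(z,t)."""
    rng = np.random.default_rng(seed)
    P = V / V.sum()                    # treat the spectrogram as P(f,t)
    F, T = P.shape
    Pf_z = rng.random((F, n_z))
    Pf_z /= Pf_z.sum(axis=0)           # columns are P(f|z)
    Pzt = rng.random((n_z, T))
    Pzt /= Pzt.sum()                   # joint P(z,t)
    for _ in range(n_iter):
        ratio = P / np.maximum(Pf_z @ Pzt, 1e-12)
        Pf_z *= ratio @ Pzt.T          # update spectral bases
        Pf_z /= np.maximum(Pf_z.sum(axis=0), 1e-12)
        ratio = P / np.maximum(Pf_z @ Pzt, 1e-12)
        Pzt *= Pf_z.T @ ratio          # update activations
        Pzt /= Pzt.sum()
    return Pf_z, Pzt
```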

ghost commented 10 years ago

I was actually going to say the same thing. It does seem a bit complicated, though. We might want to change the cases of the project: instead of making the scene more complicated, apply more sophisticated methods to the same scene.

mvcisback commented 10 years ago

@ramili perhaps we should always try all 5 scenarios, with the knowledge that the simpler techniques will fail, with an emphasis on the 1st scenario? Once we have the code to test it once, it shouldn't be hard to test it on the other scenarios. Analysis time does increase, but I think it's manageable given we are 3 people.

ghost commented 10 years ago

Sure, that works too. Since we are three people, after we've worked out the first scenario, we can have someone working on the new cases and someone else applying new methods?

mvcisback commented 10 years ago

That works. I think it'll build out pretty easily that way.

mvcisback commented 10 years ago

@ramili posted in #10

So, we need to write up an actual proposal and submit it with the next homework. If you guys each want to write up a draft of what you think we should be doing with the project, I'll try to put them together and submit it with the next homework. It would be nice if each person had a different responsibility than the others, but that can wait until we have a proposal. I'd like to apply spatial responses to the results of our localization, just so we have something nice to see and "hear" as well.

P.S. I hope I'm not spamming your inbox. I wonder if there is a way to control how many notifications you're getting.

mvcisback commented 10 years ago

+1 on the spatial response applications. I think the motivating use cases are always important to keep in mind when writing the paper.

I'll try and make a very rough draft by Tuesday highlighting things I'd like to see.

ghost commented 10 years ago

Super! (Yeah, I was and am obsessed with 3D audio!)

mvcisback commented 10 years ago

Hi, as promised I checked in a rough rough draft of a proposal: https://github.com/mvcisback/SSLVC/blob/master/proposals/proposal_marcell.pdf?raw=true

ffaghri1 commented 10 years ago

Thanks Marcell, it is a great start.

One comment I have: maybe it's good to start thinking about which milestones are straightforward and reachable this semester, and which extensions could be considered novel contributions.

Maybe we can discuss it here or in another thread.

mvcisback commented 10 years ago

Ah, I tried to make the milestones in increasing order of difficulty. I agree about thinking about what's novel.

ghost commented 10 years ago

Looks good; it has all the information we discussed so far. I'll probably reorder and revise it and post it back by Saturday at the latest. I like the idea of working on NMF and PLCA in parallel with the project, but I think the core of the project should use the simple methods we discussed last week, and then we can say "we will also discuss the state of the art of doing the same thing" in the final report.

P.S. I think for time warping you assume there is audio synced with the video as a reference, and then you replace it with a better recording by stretching and compressing the waveform with respect to that reference, so it won't be useful for conference calling. There is actually a nice PLCA-based way of doing that, I think called Hashing(?), for syncing sensor information and fusion.


ghost commented 10 years ago

Found the paper!

http://web.engr.illinois.edu/~paris/pubs/bryan-icassp2012.pdf


mvcisback commented 10 years ago

Submitted