hangzhaomit / Sound-of-Pixels

Codebase for ECCV18 "The Sound of Pixels"
http://sound-of-pixels.csail.mit.edu
MIT License

Poor visualizations, getting zero SDR, SIR, etc. on evaluation #7

Closed: deepakee13 closed this issue 5 years ago

deepakee13 commented 5 years ago

I was trying to evaluate on 16 videos using the downloaded trained model, but I cannot see any results in the visualization. Video1 and video2 have only 3 frames each with no audio, and the predicted audio is also silent.

I'm getting the following output after evaluation:

Loading weights for net_frame
Loading weights for net_synthesizer
samples: 6300
samples: 16
1 Epoch = 196 iters
Evaluating at 0 epochs...
[Eval] iter 0, loss: 0.0115
[Eval Summary] Epoch: 0, Loss: 0.0115, SDR_mixture: 0.0000, SDR: 0.0000, SIR: 0.0000, SAR: 0.0000
Plotting html for visualization...
Evaluation Done!

I hope someone can help. Thanks.

deepakee13 commented 5 years ago

The problem is solved.

deepakee13 commented 5 years ago

I removed the 2^-31 normalization coefficient applied to the raw audio and transposed the audio array (I believe this was in base.py), and then it worked fine.

........................ Deepak Kumar

On Wed 12 Jun, 2019, 7:59 AM yifan254, notifications@github.com wrote:

Hi deepakee13, I ran into the same problem as you. Could you give me some ideas on how to solve it? Thanks.


JusperLee commented 4 years ago

Hello, could you explain what modifying that part of base.py changes in the computation?

deepakee13 commented 4 years ago

In my case, I was using the pretrained model and validating the dataset on it. As I said above, in the original code the function _load_audio_file() normalizes the raw audio by a factor of 2^-31, which in my case reduced it to zero. After removing that step I was able to get some results. I also recall that I had to transpose the raw audio when converting it to mono. Alternatively, you can remove the if branch that handles the .mp3 format and load the audio file with librosa or another library.
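A minimal sketch of the two changes described above, not the repository's actual code. It assumes the loader hands back a NumPy array that is either already floating point in [-1, 1] or integer PCM, and that a 2-D array may arrive as (channels, samples). The helper name to_mono is hypothetical.

```python
import numpy as np


def to_mono(audio_raw):
    """Hypothetical helper: return mono float32 audio in [-1, 1]."""
    audio = np.asarray(audio_raw)

    if np.issubdtype(audio.dtype, np.integer):
        # Integer PCM: divide by the full-scale value of the dtype
        # (2^31 for int32, 2^15 for int16).
        full_scale = float(np.iinfo(audio.dtype).max) + 1.0
        audio = audio.astype(np.float32) / full_scale
    else:
        # Already floating point: do NOT rescale. Multiplying by 2^-31
        # here is what would turn the signal into near-silence.
        audio = audio.astype(np.float32)

    if audio.ndim == 2:
        # Assumption: the shorter axis holds the channels. Transpose
        # (channels, samples) into (samples, channels) before averaging.
        if audio.shape[0] < audio.shape[1]:
            audio = audio.T
        audio = audio.mean(axis=1)

    return audio


# Float stereo input laid out as (channels, samples)
stereo = np.full((2, 4), 0.5, dtype=np.float32)
mono = to_mono(stereo)
print(mono.shape, float(mono.max()))

# The same input after the extra 2^-31 scaling is effectively silent
print(float(np.abs(stereo * 2.0 ** -31).max()))
```

If you go the librosa route instead, librosa.load(path, sr=None, mono=True) returns a mono floating-point array together with the file's native sample rate, so neither the rescaling nor the transpose is needed.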

JusperLee commented 4 years ago

I modified the code to make it run, but the results were very unsatisfactory, and the separation quality was also relatively poor.

deepakee13 commented 4 years ago

I saw your commit in the code; I did the same. However, I used the pretrained model, and the results were good enough when validating on MUSIC dataset videos (possibly because the model was already trained on those videos). I was also getting very poor results on inference, though.

JusperLee commented 4 years ago

I did not use a pre-trained model. I trained from scratch and found that the results of the training were not very good.

parth1497 commented 4 years ago

@deepakee13 Can you share your code snippet after you corrected the function (the transpose part specifically)? Thanks!

parth1497 commented 4 years ago

@JusperLee If you can also provide any input, that would be great. Thanks in advance.

JusperLee commented 4 years ago

@parth1497 #9