Closed deepakee13 closed 5 years ago
Problem got solved
I removed the normalization coefficient of 2^-31 from the raw audio and took its transpose (this was in base.py, if I recall correctly), and then it worked fine.
Hi deepakee13, I got the same problem as you. Can you give me some ideas on how to solve it? Thanks!
Hello, I want to know what modifying that part of base.py actually changes.
See, in my case I was using the pretrained model and validating the dataset on it. As I said above, in the original code the function _load_audio_file() normalizes the raw audio by a factor of 2^-31, which in my case made the raw audio zero. By removing that part I was able to get some results, and I also recall that I had to take the transpose of the raw audio while converting it to mono. Alternatively, you can remove the if branch that handles the .mp3 format and use librosa or another library to load the audio file.
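As a rough illustration (not the repo's actual code; the shapes, values, and loader behavior here are my assumptions), the following NumPy-only sketch shows why the 2^-31 factor silences already-normalized float audio, and what the transpose-then-mono step looks like:

```python
import numpy as np

# Loaders such as librosa return float32 audio already normalized to
# [-1, 1]. Simulate a short stereo clip in (channels, samples) layout.
audio_raw = np.array([[0.5, 0.25, -0.5],
                      [0.4, 0.20, -0.4]], dtype=np.float32)

# The original _load_audio_file scaled by 2**-31, which is only correct
# for raw int32 PCM samples. Applied to float data it drives every
# sample to ~0, i.e. "silent" audio.
silenced = audio_raw * (2.0 ** -31)
print(np.abs(silenced).max())  # ~2.3e-10, effectively zero

# The fix described above: skip the scaling, transpose to
# (samples, channels), then average the channels to get mono.
mono = audio_raw.T.mean(axis=1)
print(mono)  # [0.45, 0.225, -0.45]
```

This also explains why only some setups hit the bug: a loader that returns int32 PCM needs the 2^-31 scale, while one that returns floats must not apply it.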
I modified the code to make it work, but the results were very unsatisfactory; the separation quality is also relatively poor.
I saw your commit in the code and did the same. But I used the pretrained model, and the results were good enough when validating on MUSIC dataset videos (perhaps because the model was already trained on those videos). I was also getting very poor results on inference, though.
I did not use a pre-trained model. I trained from scratch and found that the results of the training were not very good.
@deepakee13 can you share your code snippet after you made the correction to the function (the transpose part specifically)? Thanks!
@JusperLee If you can also provide any input then it will be great. Thanks in advance.
@parth1497 #9
I was trying to evaluate 16 videos using the downloaded trained model, but I am unable to see the results in the visualization. Video1 and video2 each have only 3 frames with no audio, and the predicted audio is also silent.
I'm getting the following output after evaluation:
```
Loading weights for net_frame
Loading weights for net_synthesizer
samples: 6300
samples: 16
1 Epoch = 196 iters
Evaluating at 0 epochs...
[Eval] iter 0, loss: 0.0115
[Eval Summary] Epoch: 0, Loss: 0.0115, SDR_mixture: 0.0000, SDR: 0.0000, SIR: 0.0000, SAR: 0.0000
Plotting html for visualization...
Evaluation Done!
```
I hope I can get some help. Thanks!