Open wumaster opened 9 months ago
Can you explain a bit more here?
Hi, very happy to get your reply. I am very interested in your work and would like to use your PLC and FEC methods, so I have recently been studying the opus-ng code. I have tested three PLC methods: SILK PLC, LPCNet PLC, and FARGAN PLC. The FARGAN PLC seems to generate more artifacts than the SILK and LPCNet PLC. I also evaluated them with subjective tests and PESQ scores: the LPCNet PLC gave better quality than the others, and the FARGAN PLC sometimes did worse than the SILK PLC (more audible artifacts). My guess is that FARGAN has worse audio quality than LPCNet as a vocoder.
Actually, fargan as a vocoder gives better quality than LPCNet. Can you provide the two commits you're comparing, what command line you're using, along with the input and output files so that we can reproduce what you're getting?
Thank you a lot for the reply. Please wait a moment and let me prepare these materials.
The lpcnet test branch: https://github.com/xiph/opus/tree/neural_plc. I have modified opus_demo.c to make it support lost file input. Base commit is 4e46ccd68642ce3f29885eb9b1a64fac5e392291. The command line is: ./opus_demo voip 16000 1 64000 -use_lost_file -complexity 5 arctic_a0023_16k.pcm out_plc.pcm arctic_a0023_16k_is_lost.txt
The fargan test branch: https://github.com/xiph/opus/tree/opus-ng. Build option is ./configure --enable-deep-plc. Base commit is 591c8bad70d8aa414729d1a243a6d930f64d6316. The command line is: ./opus_demo voip 16000 1 64000 -lossfile arctic_a0023_16k_is_lost.txt -dec_complexity 10 -complexity 5 arctic_a0023_16k.pcm out_plc.pcm. I use encoder complexity 5 to make sure that only the SILK encoder/decoder is used.
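For anyone reproducing this without the attached loss file, here is a minimal sketch of how a comparable one could be generated. It assumes the loss file holds one integer flag per packet (1 = lost, 0 = received), one per line, and that each packet covers 20 ms; both are assumptions about the setup, not something confirmed in this thread. The function names are hypothetical.

```python
import random


def num_frames_for_pcm(num_bytes, sample_rate=16000, frame_ms=20, bytes_per_sample=2):
    """Number of frames needed to cover a mono 16-bit PCM file of num_bytes bytes."""
    samples = num_bytes // bytes_per_sample
    frame_size = sample_rate * frame_ms // 1000
    return -(-samples // frame_size)  # ceiling division


def make_loss_flags(num_frames, loss_rate, seed=0):
    """Return one flag per frame: 1 marks a lost frame, 0 a received one.

    Losses are drawn independently per frame (no burst model), which is a
    simplification compared with real network traces.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < loss_rate else 0 for _ in range(num_frames)]


def write_loss_file(path, flags):
    """Write one flag per line (assumed loss-file layout)."""
    with open(path, "w") as f:
        for flag in flags:
            f.write(f"{flag}\n")
```

Using the same seed for both branches keeps the loss pattern identical, so any difference in the output comes from the concealment and not from where the losses fall.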
The results show that the FARGAN PLC generates more audio signal during losses than the SILK or LPCNet PLC, but it can also produce more artifacts than the other PLC methods.
test_and_res_pcm.zip
At 2.0 s, 3.7 s, 5.5 s, and so on, the FARGAN PLC generates pitched signal where the original signal is not pitched. I think this could be solved by using the SILK PLC instead of the FARGAN PLC when the lost signal is of type TYPE_UNVOICED or TYPE_NO_VOICE_ACTIVITY.
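The fallback proposed above can be sketched as follows. This only illustrates the decision logic and is not code from the Opus tree: `choose_plc` and its string return values are hypothetical, and the constants are assumed to mirror the SILK signal-type values in silk/define.h.

```python
# Assumed to match the SILK signal-type constants (silk/define.h).
TYPE_NO_VOICE_ACTIVITY = 0
TYPE_UNVOICED = 1
TYPE_VOICED = 2


def choose_plc(last_signal_type, deep_plc_enabled=True):
    """Hypothetical selector: pick the concealment method for a lost frame.

    last_signal_type is the signal type of the last correctly decoded frame.
    Unvoiced or inactive frames fall back to the SILK PLC so that the neural
    PLC is not asked to conceal content where it might invent pitch.
    """
    if not deep_plc_enabled:
        return "silk"
    if last_signal_type in (TYPE_NO_VOICE_ACTIVITY, TYPE_UNVOICED):
        return "silk"
    return "fargan"
```

Whether the last decoded signal type is a reliable predictor for the lost frame is itself an assumption; a voiced onset right after an unvoiced frame would be concealed by the SILK PLC under this rule.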
At 3.0 s, the FARGAN PLC generates some artifacts. The other methods produce artifacts there too, but FARGAN's are easier to hear, which sometimes gives it worse subjective test scores.
We tested many files, and the same problems appear to occur in other files as well.
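To check whether timestamps like these line up with loss bursts, a small helper can turn the loss file into time intervals. This is a sketch under the assumption that the file contains whitespace-separated 0/1 flags, one per 20 ms packet; the function names are hypothetical.

```python
def read_loss_flags(path):
    """Read whitespace-separated integer flags; nonzero means the packet was lost."""
    with open(path) as f:
        return [int(tok) != 0 for tok in f.read().split()]


def loss_intervals(flags, frame_ms=20):
    """Return (start_s, end_s) for each run of consecutive lost frames."""
    intervals = []
    start = None
    for i, lost in enumerate(flags):
        if lost and start is None:
            start = i  # a new burst begins at frame i
        elif not lost and start is not None:
            intervals.append((start * frame_ms / 1000.0, i * frame_ms / 1000.0))
            start = None
    if start is not None:  # the file ends in the middle of a burst
        intervals.append((start * frame_ms / 1000.0, len(flags) * frame_ms / 1000.0))
    return intervals
```

For example, `loss_intervals(read_loss_flags("arctic_a0023_16k_is_lost.txt"))` would list the concealed regions in seconds, which can then be compared against the positions where artifacts are heard.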
I also tested the 2022 PLC Challenge test database using the clean signals and loss files. The results show that the LPCNet PLC gets a higher PLCMOS score.
There's hundreds of changes between the two points you're comparing (not just switching from LPCNet to FARGAN). Are you able to narrow it down further?
Sorry, I have only been studying your code for a short time, and for now I can't work out the detailed differences between the two PLC algorithms. I just tested the two, and the results showed that the FARGAN PLC sometimes gets worse results in both PLCMOS and our subjective tests. If I may ask, could you share your research team's test results comparing the two PLC algorithms? Here are the clean speech, loss file, and PLC outputs for one PLC Challenge test item (54.wav); in the subjective tests, the FARGAN PLC gets worse results. The command lines are the same as above. plc-challenge-54.zip I am currently trying to figure out what causes the differences.
I was just saying that if you have some time it may be useful to look at intermediate versions between the two you tested. There have been many more changes between the two, including a different pitch estimator, a smaller feature predictor, etc. In terms of objective results, we don't use PLCMOS as we've seen it to be unreliable in the past. I'll still see if I can find anything.
OK, thanks a lot. I need to take more time to look into the differences between the two. In my tests, the FARGAN PLC sometimes generates more artifacts (more harmonic noise) than the SILK or LPCNet PLC. I think decoder information such as the signal type could help FARGAN generate fewer artifacts.
If you want to see just the effect of FARGAN, you could test commit d1c5b32ad, which is just before FARGAN got added.
Thanks a lot!
I did some investigation and found some commits where I think there is a regression. I only did subjective listening on the arctic_a0023_16k.pcm example. On the opus-ng branch, the original LPCNet PLC is at https://github.com/xiph/opus/commit/4414db0.
First potential regression is seen at https://github.com/xiph/opus/commit/2d98ced. I notice that some of the PLC includes a bit more pitched content mixed in. I think it actually sounds fine but it is a change. I didn't run PESQ or PLCMOS on this.
Next potential regression is https://github.com/xiph/opus/commit/f0ec990. Here there are some strange choices of pitch, and again the pitched (voiced) segments are louder.
All of these predate the changeover to FARGAN. There is an additional possible regression that happens somewhere between https://github.com/xiph/opus/commit/f0ec990 and https://github.com/xiph/opus/commit/591c8ba, but I haven't tracked that down yet.
There were changes to the PLC predictor and pitch models prior to the switch to FARGAN, so we're going to be looking at these as well as other possible root causes.
thanks!
Still looking into this, but can you give the exp_plc_fix1 branch (commit c1b80a7) a try and let me know?
OK, I'm a little busy these days, but I'll test it soon.
Well, you can now compare to the latest commit on opus-ng, which has the changes from exp_plc_fix1 and more
I just tested the new commit. The pitch-like content has decreased, but the problem is still there. test_and_res_pcm-1_22.zip It seems that the network misjudges the signal type, whereas the LPCNet and SILK PLC get the signal type right.
May I ask whether there are any papers that give an introduction to FARGAN?
There's no paper on FARGAN -- yet.
So one of the things in the new PLC that are known to be a bit worse is that for complexity reasons, the context is no longer updated when there's no loss, only the most recent history. You could still try increasing the size of that history buffer to make it more similar to the old behaviour. It's easy to do by editing the dnn/lpcnet_private.h file and changing this line:
You can change the "+5" into "+100" and see what happens.
Increased to +10 seems to fix other cases where I've seen problems. See if there's any issue now.
Hi, I have a question about the details of the FARGAN inference code. It seems that the output waveform is not centered on the input features, which differs from the description in the LPCNet paper. Are the input features centered on the frame during training, and if so, does this mismatch affect inference performance?
Hi, I have tested the neural PLC with different NN models. The opus-ng deep PLC seems to give worse PLC audio quality than the Opus LPCNet PLC. How can I improve the PLC quality?