YenJu Lu - Githubissues

smlcitias commented 2 years ago

April 15th

Interspeech 2022 ESPnet-SE++
- Created pull requests in ESPnet
- https://github.com/espnet/espnet/pull/4264
- https://github.com/espnet/espnet/pull/4268
- https://github.com/espnet/espnet/pull/4269
- CHiME4 results: overfitting in iNeuBe

  | System | SIMU | REAL |
  | --- | --- | --- |
  | No processing | 19.7 | 18.0 |
  | DCCRN | 16.3 | 15.8 |
  | Beamformer | 10.8 | 13.7 |
  | iNeuBe (DNN1) | 9.0 | 35.8 |
- Joint training for iNeuBe and ASR/ST/SLU models
Preparing ICASSP videos
- Conditional Diffusion Probabilistic Model for Speech Enhancement
- iNeuBe: Towards Low-distortion Multi-channel Speech Enhancement
Universal speech separation and enhancement (SSE) model
- Build a 2-stage system
- stage1: separate noise from the speech
- stage2: separate speech from the mixture

[ ] Change conditioners from spectrum to SSL features.
Pull Requests in ESPnet
- [ ] SLURP-S: preparing se-slu recipe based on the joint model PR.

neillu23 commented 2 years ago

April 29th

Interspeech 2022 ESPnet-SE++:
- Upload SpatializedLibriTrans and SpatializedSLURP under 140.109.21.234:/volume7/homes/neillu
- https://github.com/espnet/espnet/pull/4268
- Added SLURP_S mixture SLU recipe
- To upload models for evaluating the results from SE
BPC-guide-SE
- Todo:
- Add spectrograms for the dereverberation experiments.
- Revise the text introducing E2E-ASR and ESPnet.
Universal speech separation and enhancement (SSE) model
- One system for enhancement and separation.
- stage1: separate noise from the speech
- stage2: separate speech from the mixture
- Started with WHAM dataset
- SE model (DCCRN/ConvTasNet/CDiffuSE):
  - mix_both (s1+s2+n) => mix_clean (s1+s2)
  - mix_single (s1+n) => s1
- OR-PIT SS model (TasNet):
  - mix_both / mix_both_enhanced => (s1, s2) or (s2, s1) (OR-PIT is the same as uPIT for two speakers)
  - mix_three => (s1, s2+s3) or …
- Implementing OR-PIT from the uPIT code.
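
The one-and-rest target assignment listed above can be sketched in a few lines. This is a minimal illustration, not the ESPnet implementation: `orpit_loss`, `mse`, and `sum_signals` are hypothetical helper names, mean squared error stands in for whatever signal-level loss the real model uses, and the two terms are summed without weighting, which is a simplification.

```python
from typing import List, Sequence, Tuple

Signal = Sequence[float]


def mse(estimate: Signal, target: Signal) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def sum_signals(signals: List[Signal], length: int) -> List[float]:
    """Sample-wise sum of the given signals (all zeros if the list is empty)."""
    total = [0.0] * length
    for sig in signals:
        for i, value in enumerate(sig):
            total[i] += value
    return total


def orpit_loss(
    est_one: Signal, est_rest: Signal, sources: List[Signal]
) -> Tuple[float, int]:
    """Return (loss, index of the source assigned to est_one).

    Each source in turn is tried as the "one" target, with the sum of the
    remaining sources as the "rest" target; the cheapest assignment wins.
    """
    best_loss, best_idx = float("inf"), -1
    for idx, target_one in enumerate(sources):
        others = [s for j, s in enumerate(sources) if j != idx]
        target_rest = sum_signals(others, len(target_one))
        loss = mse(est_one, target_one) + mse(est_rest, target_rest)
        if loss < best_loss:
            best_loss, best_idx = loss, idx
    return best_loss, best_idx
```

With two sources the candidate assignments reduce to (s1, s2) and (s2, s1), which matches the note above that OR-PIT and uPIT coincide for two speakers.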

[ ] Change conditioners in CDiffuSE from spectrum to SSL features (HuBERT, WavLM).
Pull Requests in ESPnet
- [ ] SLURP-S: preparing se-slu recipe based on the joint model PR.

neillu23 commented 2 years ago

May 6th

Universal SSE model

First dataset: WHAM, with noisy two-speaker mixtures and noisy single-speaker utterances.
SE model: denoises both two-speaker mixtures and single-speaker utterances.

DCCRN model:

| Testing data | Training data | STOI | SAR | SDR |  |
| --- | --- | --- | --- | --- | --- |
| single speaker | single speaker | 0.92 | 11.98 | 11.98 | 11.42 |
| single speaker | single and 2 speakers | 0.92 | 12.01 | 12.01 | 11.49 |
| 2 speaker mixture | single speaker | 0.84 | 7.30 | 7.30 | 6.51 |
| 2 speaker mixture | single and 2 speakers | 0.91 | 10.76 | 10.76 | 10.35 |

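SS model:
- ConvTasNet model from wsj0-2mix

  | Approach | STOI | SAR | SDR | SIR | SI_SNR |
  | --- | --- | --- | --- | --- | --- |
  | SE + SS | 0.82 | 8.37 | 7.57 | 19.53 | 6.93 |
  | SS | 0.69 | 0.60 | -0.94 | 10.30 | -1.39 |
- ConvTasNet model from WHAM dataset

  | Approach | STOI | SAR | SDR | SIR | SI_SNR |
  | --- | --- | --- | --- | --- | --- |
  | SE + SS | 0.84 | 9.35 | 8.76 | 21.62 | 8.03 |
  | SS | 0.87 | 9.89 | 9.45 | 23.25 | 8.82 |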

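
The SI_SNR column in the tables above refers to scale-invariant signal-to-noise ratio. A minimal pure-Python sketch of the usual definition follows. It is an assumption that the reported numbers use this zero-mean variant, and `si_snr` is an illustrative name, not the evaluation script used for these results.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between an estimate and a reference signal."""
    n = len(target)
    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target to get the "target" component.
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t) + eps
    scale = dot / target_energy
    s_target = [scale * b for b in t]

    # Everything not explained by the scaled target counts as noise.
    e_noise = [a - b for a, b in zip(e, s_target)]
    signal_power = sum(x * x for x in s_target)
    noise_power = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(signal_power / noise_power + eps)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why this metric is often preferred over plain SNR for separation outputs.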

neillu23 commented 2 years ago

May 20th

Data with different geometry parameters (LibriSpeech)
- New multi-channel simulation data
- Dynamic Mixing
Speech Enhancement:
- Apply DCCRN to predict clean speech for each channel.
Speech Separation:
- Train the model with input mixtures of 1 to 4 channels.
- Multi-/Single-channel speech processing:
  - Option1: TAC supports multi-/single-channel
    - TAC-DPRNN predicts the clean speech from different numbers of mixture channels.
  - Option2: Beamforming (e.g., MVDR)
    - A NN estimates the mask, and the approach is invariant to the configuration
- Plan to build the OR-PIT recursive procedure in ESPnet-SE
  - enh/orpit_espnet_model.py
  - tasks/enh_orpit.py
  - bin/enh_orpit.py, bin/enh_orpit_inference.py
- Target speech extraction
  - Option1: Speaker ID information
  - Option2: Iteratively applied separation and speaker feature extractor (e.g., WavLM).

neillu23 commented 2 years ago

June 10th

Reviewed 3 papers
Modified the TASLP paper
- Described ESPnet as a toolkit.
- Added the impaired speech experiments.
Uploaded models and dataset to Hugging Face
- espnet/Yen-Ju_Lu_spatilaizedslurp_asr_train_asr_conformer_transformer_valid.acc.best
- espnet/Libri-Trans-Spatialized_SLURP-Spatialized_dataset
Implementing Recursive Selective Hearing Network in ESPnet
- Add detector for predicting the stop signal after the separation.
- Add recursive flow in ESPnet-SE
Training models with the L3DAS22 recipe.
- FaSNet
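
The recursive flow with a stop detector noted above can be sketched as a simple loop. This is only an illustration under stated assumptions: `recursive_separation`, `separate_one`, and `should_stop` are hypothetical names, not ESPnet-SE APIs, and the toy separator and detector exist only so the loop can be run end to end.

```python
from typing import Callable, List, Tuple

Signal = List[float]


def recursive_separation(
    mixture: Signal,
    separate_one: Callable[[Signal], Tuple[Signal, Signal]],
    should_stop: Callable[[Signal], bool],
    max_speakers: int = 4,
) -> List[Signal]:
    """Peel off one speaker per step until the detector signals a stop."""
    speakers: List[Signal] = []
    residual = list(mixture)
    for _ in range(max_speakers):
        speech, residual = separate_one(residual)
        speakers.append(speech)
        if should_stop(residual):  # detector runs after each separation
            break
    return speakers


# Toy stand-ins (hypothetical): each "speaker" occupies one sample position.
def toy_separate_one(signal: Signal) -> Tuple[Signal, Signal]:
    peak = max(range(len(signal)), key=lambda i: abs(signal[i]))
    speech = [v if i == peak else 0.0 for i, v in enumerate(signal)]
    residual = [0.0 if i == peak else v for i, v in enumerate(signal)]
    return speech, residual


def toy_should_stop(residual: Signal) -> bool:
    # Stop once the residual carries (almost) no energy.
    return sum(v * v for v in residual) < 1e-6


estimates = recursive_separation(
    [0.5, 0.0, 0.9, 0.3], toy_separate_one, toy_should_stop
)
```

The `max_speakers` cap is a design choice for the sketch: it keeps the loop bounded even if the detector never fires.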
