Training list only contains 270k+ lines

joonson / syncnet_trainer

Disentangled Speech Embeddings using Cross-Modal Self-Supervision

MIT License


Open jlian2 opened 4 years ago

jlian2 commented 4 years ago

Most of the video-audio pairs are not synchronized, so they are ignored. But you mentioned that there should be 1000k+ lines?