joonson / syncnet_trainer

Disentangled Speech Embeddings using Cross-Modal Self-Supervision
MIT License

Training list only contains 270k+ lines #6

Open jlian2 opened 4 years ago

jlian2 commented 4 years ago

Most of the video-audio pairs are not synchronized, so they are ignored, and the resulting training list contains only 270k+ lines. But you mentioned there should be 1000k+ lines?