Open shuttle1987 opened 5 years ago
Is this issue mostly about having English data you understand? I'd say testing with the same Na data that has been tested on Persephone seems like the way to go, since we know what results to expect.
(I'll add that I had assumed the Web-API had been tested with all the Na data, but unless I'm mistaken, it looks like that assumption was wrong. This should be high priority.)
The main reason for wanting English data is that it is hard to do UX testing for a language you don't understand. The Na data set is good for making sure that functionality works and matches the expected behavior (the issue about uploads is a good example of the Na data exposing a bug).
There are a few options, but perhaps go with LibriSpeech: http://www.openslr.org/12/
If you have issues there I have some other utterance-aligned English speech data I can send you.
It would be good to have some WAV audio recordings with associated transcriptions so we can test the frontend using real data.
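For anyone assembling such a test set, here is a rough sketch of pairing audio files with their transcriptions. It assumes the LibriSpeech layout, where each chapter directory holds a `*.trans.txt` file with one `utterance-id TRANSCRIPT` line per utterance, next to one audio file per utterance ID. The function names are hypothetical and not part of Persephone or the Web-API. LibriSpeech ships FLAC rather than WAV, so the files would still need converting with an external tool before upload.

```python
from pathlib import Path


def load_transcripts(trans_file):
    """Parse a LibriSpeech-style *.trans.txt file into {utterance_id: text}."""
    transcripts = {}
    for line in Path(trans_file).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        # The first whitespace-separated token is the utterance ID,
        # the rest of the line is the transcription.
        utt_id, _, text = line.partition(" ")
        transcripts[utt_id] = text
    return transcripts


def pair_audio_with_transcripts(chapter_dir, audio_ext=".flac"):
    """Return (audio_path, transcription) pairs for one chapter directory.

    Utterances without a matching audio file are skipped.
    """
    chapter_dir = Path(chapter_dir)
    pairs = []
    for trans_file in sorted(chapter_dir.glob("*.trans.txt")):
        for utt_id, text in sorted(load_transcripts(trans_file).items()):
            audio_path = chapter_dir / (utt_id + audio_ext)
            if audio_path.exists():
                pairs.append((audio_path, text))
    return pairs
```

Passing `audio_ext=".wav"` would pick up converted files instead, assuming they keep the utterance ID as the file name.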