Get English language data for testing

persephone-tools / persephone-frontend

A web frontend for the Persephone ASR tool

GNU Affero General Public License v3.0


Get English language data for testing #4

Open shuttle1987 opened 5 years ago

shuttle1987 commented 5 years ago

It would be good to have some WAV audio recordings with associated transcriptions so we can test the frontend using real data.

oadams commented 5 years ago

Is this issue mostly about having English data you understand? I'd say testing with the same Na data that has been tested on Persephone seems like the way to go, since we know what results to expect.

(I'll add that I had assumed the Web-API was tested with all the Na data, but unless I'm mistaken, it looks like that was not the case. This should be high priority.)

shuttle1987 commented 5 years ago

The main reason for wanting English data is that it is hard to do UX testing for a language you don't understand. The Na data set is good for making sure that functionality works and matches the expected behavior (the issue about uploads is a good example of the Na data exposing a bug).

oadams commented 5 years ago

There are a few options, but perhaps go with LibriSpeech: http://www.openslr.org/12/

If you have issues there, I have some other utterance-aligned English speech data I can send you.
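
As a rough sketch of how LibriSpeech could be turned into audio/transcription pairs for testing: each chapter directory in an extracted LibriSpeech subset contains a `<speaker>-<chapter>.trans.txt` file with one `<utterance-id> <TEXT>` line per utterance, alongside one `.flac` file per utterance. The function names below (`parse_transcript_line`, `collect_utterances`) are hypothetical helpers, not part of Persephone or the frontend, and the root path is whatever directory the archive was extracted to.

```python
from pathlib import Path


def parse_transcript_line(line):
    """Split one '<utterance-id> <TEXT>' line into (utterance_id, text)."""
    utterance_id, _, text = line.strip().partition(" ")
    return utterance_id, text


def collect_utterances(root):
    """Pair each utterance listed in *.trans.txt files under root with its audio.

    Returns a list of (audio_path, transcription) tuples, skipping entries
    whose audio file is missing. Assumes the LibriSpeech layout described
    above (hypothetical helper, not a Persephone API).
    """
    pairs = []
    for transcript_file in sorted(Path(root).rglob("*.trans.txt")):
        for line in transcript_file.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue  # ignore blank lines
            utterance_id, text = parse_transcript_line(line)
            # Audio sits next to the transcript file, named after the utterance ID.
            audio_path = transcript_file.parent / f"{utterance_id}.flac"
            if audio_path.exists():
                pairs.append((audio_path, text))
    return pairs
```

Note that LibriSpeech ships FLAC rather than WAV, so the audio would still need converting to WAV with an external tool before upload if the frontend only accepts WAV.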