British English TTS model

This seems to me slightly more of an HF code question than a SpeechT5 issue, but I think I can help, so I'm dropping a few pointers below 🙂

Not sure if you noticed, but the link you provided refers to the speaker embeddings / xvectors, and the code uses one particular example from the Matthijs/cmu-arctic-xvectors dataset. That embedding is what gives the output speech its general voice quality.

In the example code, it's using embeddings_dataset[7306], but if you switch to another index you'll get other speakers in the dataset. There is a Scottish speaker (i.e. British) in there; I don't recall the ID range you need offhand. Note that the IDs are not per speaker: I think they're per xvector / per recording, and there are several from each speaker, so 7305 is the same speaker as 7306, although the quality / style can vary a little. Exploring the dataset on HF (as per the link above) will help you find suitable IDs a bit quicker, as each record has details that identify the speaker and so the accent. For the Scottish speaker, look for "awb" in the filename.
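To save scrolling through the dataset viewer, here's a rough sketch of how you might look up the "awb" IDs in code. It assumes each record has a "filename" field next to the "xvector" field that the example code already uses, and that the split is "validation" as in the model card example; the helper names find_speaker_indices and load_awb_embedding are just mine for illustration, so adjust them if the dataset looks different on your end.

```python
def find_speaker_indices(filenames, speaker_tag):
    """Return the indices of all records whose filename contains speaker_tag."""
    return [i for i, name in enumerate(filenames) if speaker_tag in name]


def load_awb_embedding():
    """Load one xvector for the Scottish ("awb") speaker as a (1, 512) tensor.

    Assumes the dataset exposes "filename" and "xvector" columns.
    """
    # Imported here so the helper above can be used without these packages.
    import torch
    from datasets import load_dataset

    embeddings_dataset = load_dataset(
        "Matthijs/cmu-arctic-xvectors", split="validation"
    )
    awb_ids = find_speaker_indices(embeddings_dataset["filename"], "awb")
    if not awb_ids:
        raise ValueError('No records with "awb" in the filename were found.')

    # Every ID in awb_ids should be the same speaker; quality / style may
    # vary a little between recordings, so try a few and pick one you like.
    xvector = embeddings_dataset[awb_ids[0]]["xvector"]
    return torch.tensor(xvector).unsqueeze(0)
```

You'd then pass the result in place of the speaker_embeddings built from embeddings_dataset[7306] in the example code.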

If you want other accents not in the embeddings dataset, you can search for other xvectors online and use those, as per the comment in the code:

# You can replace this embedding with your own as well.

YMMV, but a bit of Googling should work for this, and there's most likely software for extracting xvectors from audio samples.
