This repository extracts asset data, namely voice clips and their corresponding transcriptions, from the game Disco Elysium (specifically the "Final Cut" version) and reformats the extracted data into a format that ESPnet understands and can use to train a vocoder.
My goal is, at the very least, to 1) have a WaveNet vocoder with the characteristics of the narrator in the "Final Cut" version of the game, and 2) package and publish the vocoder as a mobile app, as the open-source ones I have found so far are not great.
To these ends, I intend to have three repositories:
I put the code for preparing the data into Mix tasks. You can check them under mix/tasks to see the details.
I love the game and the voice of its narrator, and perhaps out of vanity I think I could do better than current open-source text-to-speech solutions available on mobile platforms.
This project is written with the Final Cut version of the game in mind, specifically version 2832f901, released on 2021-04-19. I cannot ensure that the code works correctly with earlier or later versions; in fact, I have tried using this repository with a later version and things no longer work. For now, to use this repository you will need a program such as DepotDownloader to download version 2832f901 of the game.
Please also note that you will need around 65GB of free disk space to store the extracted audio clips.
So far I have only completed two mix tasks doing the following:
I still need to implement two other mix tasks doing the following:
csv file following the LSJ format for training.
csv file needed for training.
If you still want to check out the finished mix tasks, please follow the instructions for setting up the repository and running those tasks in the sections below.
Should you wish to try out the code in this repo, please follow the instructions in the sections below:
You should have these installed:
I cannot guarantee that the code works with earlier versions of the applications listed above.
Please also make sure that you have around 65GB of free disk space for the audio clips.
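As a rough sketch (not part of this repository), a small POSIX shell function could check the free space before you start extracting. The function name has_free_space is hypothetical, and the 65GB figure is simply the estimate given above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the filesystem holding directory "$1"
# has at least "$2" GB available.
has_free_space() {
  dir="$1"
  required_gb="$2"
  # df -Pk prints POSIX-format output in 1024-byte blocks;
  # the fourth column of the second line is the available space.
  available_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  required_kb=$((required_gb * 1024 * 1024))
  [ "$available_kb" -ge "$required_kb" ]
}

# Check the current directory against the 65GB estimate.
if has_free_space . 65; then
  echo "Enough free space for the audio clips."
else
  echo "Less than 65GB free; extraction may run out of space."
fi
```

Pass the folder you plan to extract the clips into instead of `.` if it lives on a different disk.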
Create a database.exs file under the config folder of the repository. The content of the file should look like this:
import Config

config :data_prepration, Elysium.Repo,
  database: "elysium",
  username: "<Your Database Username Here>",
  password: "<Your Database Password Here>",
  hostname: "localhost",
  # Change this to false to mute Ecto debug logs. Keep it otherwise.
  log: :info
Then run mix deps.get to install the dependencies of the project. Note that the file database.exs is necessary for setting up the database as well.
Make sure that you have created a user within PostgreSQL using the credentials in the file database.exs. Then run these commands to set up the database:
mix ecto.create
mix ecto.migrate
mix tasks to prepare the extracted data for training
You will need to use Asset Studio to extract data from the asset files. Please purchase a copy of the game. I can give you a copy of the extracted data and the generated database as well if you cannot buy the game for some reason.
<game root>/disco_Data/StreamingAssets/aa/StandaloneWindows64/dialoguebundle_assets_all_<some hash>.bundle.
You should see the folder MonoBehaviour within the location you chose in step #4.
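Because the dialogue bundle's file name ends in a hash, a shell glob can help locate it. The following is a minimal sketch, not part of this repository: the function name find_dialogue_bundle is hypothetical, and it assumes a POSIX shell is available (for example Git Bash or WSL on Windows).

```shell
#!/bin/sh
# Hypothetical helper: prints the path of the first dialogue bundle found
# under the game root given as "$1"; fails if no bundle is found.
find_dialogue_bundle() {
  game_root="$1"
  for candidate in "$game_root"/disco_Data/StreamingAssets/aa/StandaloneWindows64/dialoguebundle_assets_all_*.bundle; do
    # If the glob matches nothing, the pattern stays literal and -f fails.
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Usage (replace the argument with your own game root):
#   find_dialogue_bundle "/path/to/game/root"
```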
<game root>/disco_Data/StreamingAssets/aa/StandaloneWindows64/.
AudioClip is checked.
AudioClip within the folder you chose that contains all of the audio clips.
Run this command:
mix prepare_bundle <path to the dialogue bundle json file>
For example:
mix prepare_bundle '/extracted_assets/MonoBehaviour/Disco Elysium.json'
After running this task, you should see that the database configured in the file database.exs is populated with conversation, dialogue entry, actor, and item data.
Run this command:
mix label_audio_clips <path to the folder containing the audio clips>
For example:
mix label_audio_clips '/extracted_assets/AudioClip'
After running this task, you should see that the configured database is populated with audio clip metadata in the table audio_clips.
If you are interested in contributing or reporting bugs, please check the issue list. Constructive feedback is appreciated.