okulovsky / kaia

GNU General Public License v3.0

Narration-related ideas #6

Open okulovsky opened 4 days ago

okulovsky commented 4 days ago

Generally, fine-tuning a language model is not yet a low-hanging fruit. Better and cheaper techniques are needed.

The Creative Articulator (CA) project can synthesize summary-to-original-text datasets, so a network can be trained to expand a short plan of a text into the full text. It also contains a basic container that runs the training, similar to the one in CoquiTTS.
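To make the plan-to-text idea concrete, here is a minimal sketch of how such pairs could be serialized into a generic prompt/completion JSON Lines file for fine-tuning. This is not the CA project's actual format: `PlanTextPair`, `PROMPT_TEMPLATE`, `to_record` and `to_jsonl` are hypothetical names, and the instruction wording is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PlanTextPair:
    plan: str  # short plan/summary of the text (hypothetical field name)
    text: str  # full original text the model should learn to produce


# Hypothetical instruction wording; not taken from the CA project.
PROMPT_TEMPLATE = "Expand the following plan into a full text:\n{plan}\n"


def to_record(pair: PlanTextPair) -> dict:
    """Turn one plan/text pair into a prompt/completion record."""
    return {"prompt": PROMPT_TEMPLATE.format(plan=pair.plan), "completion": pair.text}


def to_jsonl(pairs: Iterable[PlanTextPair]) -> str:
    """Serialize pairs as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(to_record(p), ensure_ascii=False) + "\n" for p in pairs)


if __name__ == "__main__":
    pairs = [
        PlanTextPair(
            plan="Hero wakes up; finds a letter.",
            text="The hero woke at dawn and found a letter on the table.",
        )
    ]
    print(to_jsonl(pairs), end="")
```

JSON Lines is used here only because many fine-tuning tools accept one record per line; the actual field names would have to match whatever the training container expects.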

It might be interesting to train the network on anime/movie dialogues to better capture the genre. whisper-x supports speaker diarization.
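A diarized transcript still has to be turned into dialogue text before it can be used for training. The sketch below assumes the whisper-x output has already been converted into segments with start/end times, a speaker label and text (the exact whisper-x output structure is not reproduced here); `Segment`, `Turn`, `merge_into_turns`, `to_dialogue` and the `max_gap` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    start: float             # seconds
    end: float               # seconds
    speaker: Optional[str]   # diarization label, e.g. "SPEAKER_00"; None if unassigned
    text: str


@dataclass
class Turn:
    speaker: Optional[str]
    start: float
    end: float
    text: str


def merge_into_turns(segments: List[Segment], max_gap: float = 1.0) -> List[Turn]:
    """Join consecutive segments of the same speaker into one dialogue turn.

    A new turn starts when the speaker changes or the pause between
    segments exceeds max_gap seconds (an assumed heuristic).
    """
    turns: List[Turn] = []
    for seg in sorted(segments, key=lambda s: s.start):
        text = seg.text.strip()
        if not text:
            continue  # skip empty segments
        last = turns[-1] if turns else None
        if last is not None and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            last.text = f"{last.text} {text}"
            last.end = max(last.end, seg.end)
        else:
            turns.append(Turn(seg.speaker, seg.start, seg.end, text))
    return turns


def to_dialogue(turns: List[Turn]) -> str:
    """Render turns as 'SPEAKER: text' lines."""
    return "\n".join(f"{t.speaker or 'UNKNOWN'}: {t.text}" for t in turns)


if __name__ == "__main__":
    segs = [
        Segment(0.0, 1.0, "A", "Hello"),
        Segment(1.2, 2.0, "A", "there."),
        Segment(2.5, 3.0, "B", "Hi!"),
    ]
    print(to_dialogue(merge_into_turns(segs)))
```

Merging by speaker and pause length is only one possible heuristic; speaker labels from diarization are anonymous, so mapping them to character names would be a separate step.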

The CA project also contains pilot research on predicting speech modality from dialogue. Perhaps gestures and intonations can be predicted as well, so that in free conversation the image would react naturally to the course of the conversation. Along with the diarization, emotions and gestures might also be extracted from the video.

okulovsky commented 4 days ago

https://github.com/OpenAccess-AI-Collective/axolotl supposedly already has a ready-made container for LLM training.