Transcription POC - RFC

ChakshuGautam commented 2 weeks ago

TODO

[ ] Capture and Transcribe
- [ ] View that allows you to transcribe audio similar to this
- [ ] Store the Audio (wav) => Minio Bucket (@PiyushRaj927 to share the bucket credentials and make them available to the service)
- [ ] Store the Transcripts (txt) => Minio Bucket
- [ ] Storing the fixed Transcripts (txt) => Minio Bucket
- [ ] Sessions and SessionsID to be tracked in a SQLite DB
[ ] Training Pipeline
- [ ] Upload the wav file to Huggingface Dataset
- [ ] Call the autotune API to get a new Dataset
- [ ] Add Whisper training to Autotune
- [ ] Call Training API from autotune
[ ] @PiyushRaj927
- [ ] Provision Infra
- [ ] Deploy Autotune
- [ ] Deploy Transcription POC

Technology

Base repo to be used - https://github.com/alesaccoia/VoiceStreamAI

Specifications

File Name - <sessionID>.<length>.<original/modified>.<wav/txt>
Session - sessionID, startTime, endTime
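
A minimal sketch of how the file naming and session tracking above could look in Python with SQLite. The helper names (`make_file_name`, `init_db`), the table name `sessions`, and the column types are assumptions for illustration. The RFC does not say what unit `<length>` uses, so it is passed through as a string.

```python
import sqlite3

def make_file_name(session_id: str, length: str, kind: str, ext: str) -> str:
    """Build <sessionID>.<length>.<original/modified>.<wav/txt>."""
    if kind not in ("original", "modified"):
        raise ValueError("kind must be 'original' or 'modified'")
    if ext not in ("wav", "txt"):
        raise ValueError("ext must be 'wav' or 'txt'")
    return f"{session_id}.{length}.{kind}.{ext}"

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) sessions table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sessions (
            sessionID TEXT PRIMARY KEY,
            startTime TEXT NOT NULL,  -- assumed ISO 8601 timestamp
            endTime   TEXT            -- NULL while the session is still open
        )
        """
    )
    return conn
```

For example, `make_file_name("abc123", "30", "original", "wav")` gives `abc123.30.original.wav`, and the fixed transcript for the same session would use `modified` and `txt`.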

ChakshuGautam commented 2 weeks ago

https://github.com/suyashgautam to guide Sarvesh.

xorsuyash commented 2 weeks ago

cc @ChakshuGautam cc @GautamR-Samagra

Format for calling the autotune API to force-align audio into segments of a specific length

First, create a workflow in autotune and note its workflow ID:

   curl --location 'https://autotune.dev.bhasai.samagra.io/v2/workflow/create/' \
        --header 'User-Id: 0297f861-cc97-4b27-b464-ed826dbda7eb' \
        --header 'role: user' \
        --header 'Content-Type: application/json' \
        --data '{
            "config":{
                "config_name": "QnA",
                "system_prompt": "You are a helpful data generation assistant working as a teacher. You are an expert in this field. Don'\''t Hallucinate.",
                "user_prompt_template": "{{workflow.user_prompt}}",
                "temperature":1,
                "schema_example": {
                    "question": "4 + 5",
                    "answer":"9" 
                }
            },
            "workflow": {
                "workflow_name": "Data Analysis Workflow",
                "total_examples": 100,
                "split": [
                    80,
                    10,
                    10
                ],
                "user_prompt":"Generate questions to test addition and subtraction for grade 1 students. Your task is to generate 5 addition questions and 5 subtraction questions with single-digit numbers.",
                "llm_model": "gpt-3.5-turbo-0125",
                "tags": [
                    "data analysis",
                    "machine learning"
                ]
            }
        }'

To call the force-alignment endpoint, the dataset to be aligned must be on Hugging Face, and its layout must be:
```
         - audio_1.wav 
         - audio_2.wav
         - .
         - .
         - transcription.txt 
```

Each line of transcription.txt must contain the audio name followed by a space and then its transcript.
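
A small sketch of reading and writing that format in Python. It assumes one entry per line and that audio names contain no spaces, so everything after the first space is the transcript. Whether the name includes the `.wav` extension is not stated here, so the helpers (`parse_transcription` and `format_transcription`, both hypothetical names) treat it as an opaque string.

```python
from typing import Dict

def parse_transcription(text: str) -> Dict[str, str]:
    """Map each audio name to its transcript, skipping blank lines."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only: name, then the rest is the transcript.
        name, _, transcript = line.partition(" ")
        entries[name] = transcript.strip()
    return entries

def format_transcription(entries: Dict[str, str]) -> str:
    """Serialize entries back to 'name transcript' lines."""
    return "".join(f"{name} {transcript}\n" for name, transcript in entries.items())
```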

After creating the workflow, we can force-align using

   curl -X POST \
        -H "Content-Type: application/json" \
        -d '{"dataset":"xorsuyash/asr_datasetp2","workflow_id":"b23fe059-e941-4045-ad6c-bf9330e88455","save_path":"SamagraDataGov/asr_dataset_test_p9","transcript_available":"true","time_duration":5.0}' \
        https://autotune.dev.bhasai.samagra.io/v1/workflow/force-align

Here:

dataset: path of the Hugging Face dataset that we want to force-align.
workflow_id: the workflow ID that we created earlier.
save_path: name of the Hugging Face repo where we want to save the aligned audio.
transcript_available: whether transcripts are available. Right now only datasets with transcripts are supported; support for datasets without transcripts will be added later.
time_duration: length of the audio segments into which we want to align our dataset.
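
The same call can be sketched in Python using only the standard library. The function name `build_force_align_request` is hypothetical. The payload fields and URL are copied from the curl command above, including `transcript_available` being sent as the string `"true"`. The response format is not documented in this thread, so the sketch only builds the request and leaves sending and parsing to the caller.

```python
import json
import urllib.request

FORCE_ALIGN_URL = "https://autotune.dev.bhasai.samagra.io/v1/workflow/force-align"

def build_force_align_request(
    dataset: str,
    workflow_id: str,
    save_path: str,
    time_duration: float,
    transcript_available: bool = True,
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the force-align endpoint."""
    payload = {
        "dataset": dataset,
        "workflow_id": workflow_id,
        "save_path": save_path,
        # The curl example sends this flag as a string, so mirror that here.
        "transcript_available": "true" if transcript_available else "false",
        "time_duration": float(time_duration),
    }
    return urllib.request.Request(
        FORCE_ALIGN_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; error handling and authentication, if any is required, are left out of this sketch.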

pucardotorg / dristi_experiments

Transcription POC - RFC #7
