kubeagi / arcadia

A diverse, simple, and secure one-stop LLMOps platform
http://www.kubeagi.com/
Apache License 2.0

Add a CRD to convert audio/pdf to text #766

Closed nkwangleiGIT closed 6 months ago

nkwangleiGIT commented 6 months ago

Allow users to define a CR that lists the input audio/PDF files (pdf, wav, mp3, etc.). The controller then transcribes those files to a text file, and we can use the text for QA or summarization.

Handle audio file:

apiVersion: converter.kubeagi.k8s.com.cn/v1alpha1
kind: Audio2Text
metadata:
  name: audio2text-converter1
  namespace: kubeagi-test
spec:
  files:
  - <audio file from MinIO>
status:
  condition: Processing
  result: <text from audio>

Handle pdf file:

apiVersion: converter.kubeagi.k8s.com.cn/v1alpha1
kind: Pdf2Text
metadata:
  name: pdf2text-converter1
  namespace: kubeagi-test
spec:
  files:
  - <pdf file from MinIO>
status:
  condition: Processing
  result: <text from pdf>

The converter can either invoke APIs or start a long-running job to handle the data. Other kinds of data processing can follow a similar approach, and these data-processing CRs can then be used to orchestrate an LLM agent.
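To make the "start a job" option above more concrete, here is a minimal sketch of how a controller could turn a converter CR into a Kubernetes `batch/v1` Job manifest. It is only an illustration under stated assumptions: the container image, the `convert` argument list, the `converter-kind` label, the `-job` name suffix, and the sample file path are all hypothetical and do not come from this issue.

```python
from typing import Any, Dict

# Hypothetical image name; the issue does not say which image runs the converter.
CONVERTER_IMAGE = "example.invalid/converter-cli:latest"


def build_converter_job(cr: Dict[str, Any]) -> Dict[str, Any]:
    """Build a batch/v1 Job manifest that processes the files listed in a converter CR."""
    meta = cr["metadata"]
    files = cr["spec"]["files"]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            # Assumed naming scheme: CR name plus a "-job" suffix.
            "name": f"{meta['name']}-job",
            "namespace": meta["namespace"],
            "labels": {"converter-kind": cr["kind"].lower()},
        },
        "spec": {
            "backoffLimit": 2,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "converter",
                            "image": CONVERTER_IMAGE,
                            # Assumed CLI shape: "convert <file> [<file> ...]".
                            "args": ["convert", *files],
                        }
                    ],
                }
            },
        },
    }


# Example CR mirroring the Pdf2Text manifest above; the file path is made up.
example_cr = {
    "apiVersion": "converter.kubeagi.k8s.com.cn/v1alpha1",
    "kind": "Pdf2Text",
    "metadata": {"name": "pdf2text-converter1", "namespace": "kubeagi-test"},
    "spec": {"files": ["docs/sample.pdf"]},
}
job = build_converter_job(example_cr)
```

A controller would submit the resulting manifest to the API server and copy the job outcome into the CR's `status` fields; that part is omitted here.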

bjwswang commented 6 months ago

LGTM.

LGTM! Similar to kubeagi evaluations, we can use k8s Jobs as the converter and use core-library to develop the converter CLI.

bjwswang commented 6 months ago

We can use this CLI (developed in Python) as the job runner:

cli convert xxx

https://github.com/kubeagi/core-library/issues/10
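For illustration, a `convert` subcommand like the one mentioned above could be laid out roughly as follows. This is a hedged sketch, not the core-library implementation: the program name `cli`, the function names, and the extension-to-converter mapping are assumptions, and the converters themselves are left as placeholders because the thread does not say which transcription or PDF extraction backend would be used.

```python
import argparse
from pathlib import Path
from typing import Callable, Dict, List, Optional


def convert_audio(path: Path) -> str:
    # Placeholder: a real runner would call a speech-to-text backend here.
    raise NotImplementedError(f"audio transcription not implemented: {path}")


def convert_pdf(path: Path) -> str:
    # Placeholder: a real runner would call a PDF text extractor here.
    raise NotImplementedError(f"PDF extraction not implemented: {path}")


# Hypothetical mapping from file extension to converter function.
CONVERTERS: Dict[str, Callable[[Path], str]] = {
    ".wav": convert_audio,
    ".mp3": convert_audio,
    ".pdf": convert_pdf,
}


def pick_converter(path: Path) -> Callable[[Path], str]:
    """Select a converter by file extension, rejecting unknown types."""
    try:
        return CONVERTERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported file type: {path.suffix!r}") from None


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cli")
    sub = parser.add_subparsers(dest="command", required=True)
    convert = sub.add_parser("convert", help="convert audio/PDF files to text")
    convert.add_argument("files", nargs="+", type=Path)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Entry point: convert each file and print the resulting text."""
    args = build_parser().parse_args(argv)
    for path in args.files:
        print(pick_converter(path)(path))
    return 0


# Parse a sample command line and show which converter each file would use.
args = build_parser().parse_args(["convert", "talk.mp3", "paper.pdf"])
selected = [pick_converter(p).__name__ for p in args.files]
```

Keeping the extension dispatch separate from argument parsing means a Job container and a local shell session can share the same entry point, and new file types only need a new entry in the mapping.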

nkwangleiGIT commented 6 months ago

For now, add a CRD to handle the document loader (https://github.com/kubeagi/arcadia/pull/771). Later, we can change the logic (use a different document loader implementation) to use a Job (CLI) as needed.

nkwangleiGIT commented 6 months ago

Marking this as done, since we can support it using documentloader in chat mode.