sambanova / ai-starter-kit

Other
81 stars 29 forks source link

Codym/llava data prep - Ingestion pipeline to download and prepare DocVQA #287

Closed snova-codym closed 2 months ago

snova-codym commented 2 months ago

Original huggingface dataset is here:

https://huggingface.co/datasets/HuggingFaceM4/DocumentVQA

This pipeline downloads and prepares the dataset for upload to SambaStudio.

snova-codym commented 2 months ago

I accidentally added a modified notebook from the multimodal knowledge retrieval kit. Please take the original to resolve the conflict.

snova-codym commented 2 months ago

I resolved the conflict. Was a minor change.