EichiUehara closed this issue 2 days ago
We can use any text dataset that can be converted to a question-and-answer format.
Define a script to download the datasets for training.
https://huggingface.co/datasets/lavita/ChatDoctor-iCliniq https://huggingface.co/datasets/lavita/ChatDoctor-HealthCareMagic-100k
https://www.cdc.gov/floods/about/index.html https://www.who.int/
https://www.healthline.com/ https://www.npr.org/sections/health/ https://www.ama-assn.org/
https://github.com/Zjh-819/LLMDataHub
dataset/rag
https://github.com/redhat-intel-ai-hackathon-raft-rag/monorepo/commit/d9bc83c3715cc947bd5f2338fc6ea6c94580d531 https://github.com/redhat-intel-ai-hackathon-raft-rag/monorepo/commit/616452ab09544925b164f7ff91c80c60b4b71312
Hey, can you explain more clearly? Is it just the Python query to download the dataset?
Your task is to come up with how to do the task so that it meets the requirement.
requirement
We can use any text dataset that can be converted to a question-and-answer format.
task
Define a script to download the datasets for training.
medical question and answer dataset
https://huggingface.co/datasets/lavita/ChatDoctor-iCliniq https://huggingface.co/datasets/lavita/ChatDoctor-HealthCareMagic-100k
medical announcement
https://www.cdc.gov/floods/about/index.html https://www.who.int/
medical news
https://www.healthline.com/ https://www.npr.org/sections/health/ https://www.ama-assn.org/
more general LLM dataset for fine tuning
https://github.com/Zjh-819/LLMDataHub
path
dataset/rag
related issue
2
commits
https://github.com/redhat-intel-ai-hackathon-raft-rag/monorepo/commit/d9bc83c3715cc947bd5f2338fc6ea6c94580d531 https://github.com/redhat-intel-ai-hackathon-raft-rag/monorepo/commit/616452ab09544925b164f7ff91c80c60b4b71312
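As a starting point for the question above, here is a minimal sketch of what such a download script might look like. It is not the code from the linked commits. It assumes the Hugging Face `datasets` package is installed, that both ChatDoctor datasets expose a `train` split, and that one JSONL file per dataset under `dataset/rag` is an acceptable layout. The helper names (`output_path`, `to_qa`, `write_jsonl`, `download`, `main`) are hypothetical, and `to_qa` takes the column names as arguments because the actual column names should be checked on each dataset page.

```python
import json
from pathlib import Path

# Dataset IDs taken from the Hugging Face links in this issue.
HF_DATASETS = [
    "lavita/ChatDoctor-iCliniq",
    "lavita/ChatDoctor-HealthCareMagic-100k",
]

# Output directory from the "path" field of this issue.
OUTPUT_DIR = Path("dataset/rag")


def output_path(dataset_id: str, out_dir: Path = OUTPUT_DIR) -> Path:
    """Map 'owner/name' to '<out_dir>/name.jsonl' (layout is an assumption)."""
    return out_dir / f"{dataset_id.split('/')[-1]}.jsonl"


def to_qa(record: dict, question_key: str, answer_key: str) -> dict:
    """Normalize one row to a question/answer pair.

    The caller supplies the column names, since they differ per dataset.
    """
    return {
        "question": str(record[question_key]).strip(),
        "answer": str(record[answer_key]).strip(),
    }


def write_jsonl(rows, path: Path) -> int:
    """Write an iterable of dicts as JSON Lines and return the row count."""
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count


def download(dataset_id: str, out_dir: Path = OUTPUT_DIR, split: str = "train") -> Path:
    """Fetch one dataset from the Hugging Face Hub and store its raw rows."""
    # Imported lazily so the helpers above work without the package installed.
    from datasets import load_dataset  # pip install datasets

    ds = load_dataset(dataset_id, split=split)  # assumes this split exists
    path = output_path(dataset_id, out_dir)
    write_jsonl(ds, path)
    return path


def main() -> None:
    """Download every dataset listed in HF_DATASETS."""
    for dataset_id in HF_DATASETS:
        print(f"saved {download(dataset_id)}")
```

Calling `main()` would write `dataset/rag/ChatDoctor-iCliniq.jsonl` and `dataset/rag/ChatDoctor-HealthCareMagic-100k.jsonl`. The CDC, WHO, and news sites are ordinary web pages rather than packaged datasets, so they would need a separate scraping step that this sketch does not cover.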