snova-darshang closed this issue 3 months ago
Thank you so much for your feedback! We will add more documentation and explanations around recommended input data.
This Issue is stale because it has been open for 6 months with no activity. Remove stale label or comment or this issue will be closed in 30 days.
Documentation has been updated in PR https://github.com/sambanova/generative_data_prep/pull/90
Suggested methods for cleaning up raw data/text, or pointers to common practices, are out of scope for this repository.
https://github.com/sambanova/generative_data_prep#input-format
The input format instructions could be improved:
A. Can we provide examples of the expected format for common ML use cases, such as 1. pretraining, 2. fine-tuning, and 3. inference?
B. Are there suggested methods for cleaning up raw data/text, or pointers to common practices, that could be highlighted?
C. Can we explain the restrictions on the contents of "prompt" and "completion", such as maximum or minimum input length and what they should or should not contain?