[MiniLLM] Why does dolly have only 12435 training samples?

microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs

https://aka.ms/GeneralAI

MIT License


[MiniLLM] Why does dolly have only 12435 training samples? #168

Closed. yumath closed this issue 7 months ago

yumath commented 7 months ago

But Section 3.1 of your paper says:

Training: We construct the training data from databricks-dolly-15k consisting of 15K human-written instruction-response pairs. We randomly split 14K samples as the training set D and left 500 samples for validation and testing, respectively.
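For reference, the split described in the quoted passage could be sketched roughly as below. This is only an illustration of a 14K/500/500 random split, not the MiniLLM preprocessing script: the helper name `split_dataset`, the seed, and the use of integers as stand-ins for the instruction-response pairs are all assumptions, and the sketch does not model whatever produces the 12435 figure asked about in this issue.

```python
import random


def split_dataset(samples, n_valid=500, n_test=500, seed=42):
    """Randomly split samples into train/valid/test lists.

    Hypothetical helper for illustration; not part of the LMOps repository.
    """
    if n_valid + n_test > len(samples):
        raise ValueError("not enough samples for the requested split sizes")
    shuffled = list(samples)
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(shuffled)
    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]
    return train, valid, test


# Integers stand in for roughly 15K instruction-response pairs (assumed size).
train, valid, test = split_dataset(range(15000))
print(len(train), len(valid), len(test))  # 14000 500 500
```

With the nominal sizes from the paper, this yields 14000 training samples, so a count of 12435 implies some step beyond a plain random split.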

t1101675 commented 7 months ago

See #167

yumath commented 7 months ago

Thanks very much.