instructlab / community

InstructLab community-wide collaboration space, including contributing, security, code of conduct, etc.
Apache License 2.0

Capturing of data sources is known incomplete #170

Open lhawthorn opened 5 months ago

lhawthorn commented 5 months ago

See related issue 255 in the taxonomy repo.

Note that the related issue is closed because we now have guidelines on acceptable data licenses for submissions to InstructLab, as well as rigorous requirements for attribution of data that are included in our pull request templates.

We are still aware that not all data sources used thus far in the creation of InstructLab artifacts have been captured and documented, though we have done our level best to do so in our Data Sources documentation.

Other than documenting that this issue is known, I'm not sure what we can do about it for now.

lhawthorn commented 3 months ago

More information on the data used to train Granite models is available in this paper.

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had activity within 90 days. It will be automatically closed if no further activity occurs within 30 days.