Open lhawthorn opened 5 months ago
More information on the data used to train Granite models in this paper
See related issue 255 in the taxonomy repo.
Note that the issue is closed because we now have guidelines on acceptable data licenses for submissions to InstructLab, as well as rigorous data attribution requirements, which are included in our pull request templates.
We are still aware that not all data sources used thus far in the creation of InstructLab artifacts have been captured and documented, though we have done our level best to do so in our Data Sources documentation.
Other than documenting that this issue is known, I'm not sure what we can do about it for now.