Develop guidance on sharing training data, including:
Build on the existing SPD-41a FAQ on this topic
Training data are in scope of SPD-41a, especially when needed to validate a scientific finding (e.g., the training data needed to reproduce findings from AI/ML models).
In general, you should provide all training data. However, there are cases (list examples) where sharing the complete training data would not be appropriate. In those cases, what can be shared instead? Work with Manil on this.
Recommendations on how and where to share training data (repository selection), and considerations based on the size of the training dataset
Commercial data used for training and the implications for sharing
Examples of how training data are being shared openly (ESDS ACCESS projects). Work with Cerese and Manil on this.