Long-term roadmap

Feedback gathered from user reviews of the current OpenML website. I'm posting it here to keep track; these items can be prioritized and split off into new issues at the appropriate time.

[ ] Many users find it unclear how to use OpenML. Make a better 'getting started' page and/or video tutorials.
[ ] OpenML feels unstructured. It lacks a high-level overview of datasets, etc.
[ ] Difficult to filter on image/vision datasets.
[ ] Datasets are all public. There is no way yet to share datasets with a limited group of people.
[ ] Local installation is hard.
[ ] Limited instructions on how to deploy in organizations. How to upgrade?
[ ] AutoML 'bots' do not run automatically on new datasets. Many tasks have no results.
[ ] Donations should be possible (in a non-intrusive way).
[ ] Lack of integration with other dataset repositories (e.g. Zenodo, Kaggle, data.world, UCI, open government, OpenNeuro, BrainLife.io,...)
[ ] Trustworthiness of results: how do we know if results are not overfitted/biased?
[ ] Data/algorithm management: how to correct issues?
[ ] Handling of large datasets. Need efficient downloading and loading of large datasets.
[ ] Handling of evolving datasets, e.g. where new data is added every day.
[ ] Automated tools for checking data quality and fixing issues.
[ ] Collaboration: Need chat/communication/forum to support user discussions/help. Only some users find their way to GitHub issues, and those discussions are hard to find (e.g. https://github.com/openml/OpenML/issues/455)
[ ] Collaboration: Streamline interactive model building. Allow people to see and understand each other's models, download and improve them, discuss them online, etc. Maybe have one user/group-assigned 'leading model' per task (not necessarily the one with the highest accuracy).
[ ] Unclear documentation on the security of uploaded data. How to collaborate on sensitive datasets (e.g. medical)?

openml / openml.org

Long-term roadmap #130