🚀 Feature
Custom Dataset Integration in Unified Semi-supervised Learning Benchmark
Motivation
The current iteration of USB (Unified Semi-supervised Learning Benchmark) is a valuable resource for researchers and practitioners in the field, providing benchmark datasets that make it straightforward to compare and evaluate different semi-supervised learning models. However, the ability to incorporate custom datasets would significantly enhance its utility. Many users work with proprietary or niche datasets tailored to specific problems or industries, and the strict focus on pre-defined benchmarks can be limiting: it does not fully represent the diverse challenges encountered in real-world scenarios. By enabling the use of custom datasets, USB could become not just a benchmarking tool but also a versatile platform for experimenting with and developing semi-supervised learning models across various domains.
Pitch
I propose extending the USB framework to let users integrate their own datasets alongside the existing benchmarks. This feature should provide a standardized way to supply data, define labeled/unlabeled training, validation, and test splits, and ensure compatibility with the semi-supervised learning algorithms already implemented in USB.
To achieve this, we might need (a rough sketch follows the list):
- A set of guidelines for dataset formatting and required metadata.
- An API or interface for uploading and validating user-provided datasets.
- Modifications to the core USB codebase to handle dynamic dataset integration without disrupting the existing benchmarking capabilities.
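To make the pitch concrete, here is a minimal sketch of how a custom dataset can already be wired up through the semilearn package that USB ships, and which a native integration API would wrap more cleanly. This is a sketch based on my reading of the current codebase, not a definitive implementation: the helpers used (get_config, get_net_builder, get_algorithm, get_data_loader, split_ssl_data, BasicDataset, Trainer) are names semilearn exposes today, but exact signatures and required config fields may differ from what is shown, and the random arrays, transforms, and config values are illustrative placeholders.

```python
import numpy as np
from torchvision import transforms
from semilearn import (BasicDataset, Trainer, get_algorithm, get_config,
                       get_data_loader, get_net_builder, split_ssl_data)

# Minimal placeholder config; a real run would set many more fields
# (see the config files shipped with USB). All values here are illustrative.
config = get_config({
    'algorithm': 'fixmatch', 'net': 'vit_tiny_patch2_32', 'use_pretrain': False,
    'epoch': 1, 'num_train_iter': 100, 'optim': 'AdamW', 'lr': 5e-4,
    'batch_size': 8, 'eval_batch_size': 8, 'uratio': 2, 'ulb_loss_ratio': 1.0,
    'num_labels': 40, 'num_classes': 10, 'img_size': 32,
    'gpu': 0, 'world_size': 1, 'distributed': False,
})

# Any (N, H, W, C) image array with integer labels stands in for the
# user's custom data here.
data = np.random.randint(0, 256, size=(500, 32, 32, 3), dtype=np.uint8)
target = np.random.randint(0, config.num_classes, size=(500,))

# Split the training data into labeled and unlabeled subsets.
lb_data, lb_target, ulb_data, ulb_target = split_ssl_data(
    config, data, target,
    num_labels=config.num_labels, num_classes=config.num_classes)

# A real FixMatch run would use a genuinely strong augmentation for
# strong_transform; the weak transform is reused here only for brevity.
transform = transforms.Compose([transforms.ToTensor()])
lb_dataset = BasicDataset(config.algorithm, lb_data, lb_target,
                          config.num_classes, transform, is_ulb=False)
ulb_dataset = BasicDataset(config.algorithm, ulb_data, ulb_target,
                           config.num_classes, transform, is_ulb=True,
                           strong_transform=transform)

train_lb_loader = get_data_loader(config, lb_dataset, config.batch_size)
train_ulb_loader = get_data_loader(config, ulb_dataset,
                                   int(config.batch_size * config.uratio))
# Reusing the labeled set as a stand-in evaluation set for brevity.
eval_loader = get_data_loader(config, lb_dataset, config.eval_batch_size)

algorithm = get_algorithm(config, get_net_builder(config.net, from_name=False),
                          tb_log=None, logger=None)
Trainer(config, algorithm).fit(train_lb_loader, train_ulb_loader, eval_loader)
```

Native support as proposed above would essentially hide this boilerplate behind a registration and validation step, so that a user only supplies raw arrays or file paths plus metadata such as the number of classes and the desired label split.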
Alternatives
An alternative solution might involve creating separate branches or forks of USB specifically for custom dataset experimentation. While this could provide a workaround, it would not be as seamless or user-friendly as native support for custom datasets within the main USB platform.
Additional context
Incorporating this feature could increase adoption of USB by making it relevant to a wider range of users. It could also foster a community in which sharing and collaborating on diverse semi-supervised learning problems is encouraged, enhancing the collective knowledge base and potentially leading to advances in the field.