Describe the problem
Currently, adding new query or task types to the code requires manual updates in multiple places. For example, the following code snippet illustrates that the query_type field must be updated whenever a new type is introduced:
query_type: Optional[Literal["Fact Verification", "Text to SQL", "Table Question Answering", "Needle in Haystack", "Other"]] = Field(
    default=None, description="Type of query in this dataset."
)
Maintaining this approach can become cumbersome and error-prone, especially as more task types are added. It risks inconsistency and increases the chance of bugs due to missed updates.
Expected behavior
The code should be structured in a way that allows for automatic updates or centralized management of query types, reducing the need for manual changes.
Proposed solutions
Enum Class for Centralized Management with Literal Strings: Use the existing QueryType enum class located in target_benchmark/dataset_loaders/DatasetLoaderEnums.py to represent task types and use it to dynamically generate Literal strings for type hints. This allows for centralized management of query types.
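from enum import Enum
from typing import Literal, Optional

from pydantic import BaseModel, Field

class QueryType(str, Enum):
    FACT_VERIFICATION = "Fact Verification"
    TEXT_TO_SQL = "Text to SQL"
    TABLE_QA = "Table Question Answering"
    NEEDLE_IN_HAYSTACK = "Needle in Haystack"
    OTHER = "Other"

# Generate Literal type from Enum values. Subscripting with a tuple is
# equivalent to Literal["Fact Verification", ...]; the Literal[*...] form
# is only valid syntax on Python 3.11+.
QueryTypeLiteral = Literal[tuple(query.value for query in QueryType)]

class ExampleModel(BaseModel):
    query_type: Optional[QueryTypeLiteral] = Field(
        default=None, description="Type of query in this dataset."
    )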
This approach ensures that any new task type only requires updating the Enum class, which can be imported and used to automatically update type hints, reducing redundancy and potential for missed updates.
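One caveat: static type checkers such as mypy do not evaluate a Literal that is built at runtime, so the benefit of this approach is runtime validation rather than static checking. The following is a minimal, stdlib-only sketch showing that the derived Literal stays in sync with the enum. It subscripts Literal with a tuple so it also runs on Python versions before 3.11, and `is_valid_query_type` is a hypothetical helper introduced only for illustration.

```python
from enum import Enum
from typing import Literal, get_args


class QueryType(str, Enum):
    FACT_VERIFICATION = "Fact Verification"
    TEXT_TO_SQL = "Text to SQL"
    TABLE_QA = "Table Question Answering"
    NEEDLE_IN_HAYSTACK = "Needle in Haystack"
    OTHER = "Other"


# Derived once from the enum; adding a member above extends this automatically.
QueryTypeLiteral = Literal[tuple(query.value for query in QueryType)]


def is_valid_query_type(value: str) -> bool:
    """Hypothetical helper: check a raw string against the derived Literal."""
    return value in get_args(QueryTypeLiteral)
```

With this in place, a new task type is added by appending one member to `QueryType`; `get_args(QueryTypeLiteral)` then includes its value without any other change.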
Additionally, more Literals are defined across files, for example for splits and for output formats. All of these Literals might benefit from being centrally constructed.
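One possible way to construct these centrally is a single helper that turns any enum into a Literal. This is a sketch under assumptions: `literal_from_enum`, the `Split` and `OutputFormat` enums, and their member values are all hypothetical placeholders, not names taken from the codebase.

```python
from enum import Enum
from typing import Any, Literal, Type, get_args


def literal_from_enum(enum_cls: Type[Enum]) -> Any:
    """Hypothetical helper: build a Literal[...] from an enum's values."""
    return Literal[tuple(member.value for member in enum_cls)]


class Split(str, Enum):  # hypothetical enum and values
    TRAIN = "train"
    TEST = "test"
    VALIDATION = "validation"


class OutputFormat(str, Enum):  # hypothetical enum and values
    JSON = "json"
    DATAFRAME = "dataframe"


# Each Literal is defined in one place and imported wherever it is needed.
SplitLiteral = literal_from_enum(Split)
OutputFormatLiteral = literal_from_enum(OutputFormat)
```

Keeping the helper next to the enums would let every file import the derived Literal instead of repeating the string values.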