neulab / explainaboard_web

refactor benchmark backend code #540

Closed qjiang002 closed 1 year ago

qjiang002 commented 1 year ago

This PR aims to fix issue #533.

Previously, benchmark_db_utils needed to load SystemInfo, which depends on the SDK version. However, since the current benchmark only needs System properties, such as the overall results used for the benchmark plots and tables, we can refactor the code to avoid computing and storing additional SystemInfo in a cache that expires whenever the SDK is upgraded.

qjiang002 commented 1 year ago

Hi @neubig , I got this error when loading this benchmark:

[0]   File "/Users/jiangqi/Desktop/Capstone/explainaboard_web/backend/src/gen/explainaboard_web/impl/db_utils/benchmark_db_utils.py", line 187, in <setcomp>
[0]     (x.dataset.dataset_name, x.dataset.sub_dataset, x.dataset.split)
[0] AttributeError: 'NoneType' object has no attribute 'dataset_name'

This happens because the benchmark tries to find all NER systems with 'system_query': {'task_name': 'named-entity-recognition'}, but some of those systems have an undefined/custom dataset, so their dataset is None.

One way to deal with undefined datasets is to ignore them in the benchmark. I don't think we can merge systems with undefined datasets into one group, because they may be for different tasks and have different metrics. WDYT?
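
For illustration, here is a minimal sketch of the "ignore undefined datasets" option. It is not the actual benchmark_db_utils code: the DatasetMetadata and System classes, the collect_dataset_keys helper, and the sample system and dataset names are hypothetical stand-ins. Only the three attributes (dataset_name, sub_dataset, split) come from the traceback above. The idea is simply to add an `is not None` guard to the set comprehension so that systems without a dataset are skipped instead of raising AttributeError.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class DatasetMetadata:
    # Hypothetical stand-in: only the fields that appear in the traceback.
    dataset_name: str
    sub_dataset: Optional[str]
    split: Optional[str]


@dataclass(frozen=True)
class System:
    # Hypothetical stand-in for a stored system record.
    system_name: str
    dataset: Optional[DatasetMetadata]  # None for undefined/custom datasets


DatasetKey = Tuple[str, Optional[str], Optional[str]]


def collect_dataset_keys(systems: Iterable[System]) -> Set[DatasetKey]:
    """Collect unique (dataset_name, sub_dataset, split) tuples.

    Systems whose dataset is None are ignored rather than merged,
    since they may belong to different tasks with different metrics.
    """
    return {
        (x.dataset.dataset_name, x.dataset.sub_dataset, x.dataset.split)
        for x in systems
        if x.dataset is not None
    }


# Sample data (made up for this sketch).
systems = [
    System("sys_a", DatasetMetadata("conll2003", None, "test")),
    System("sys_b", None),  # undefined/custom dataset, skipped
    System("sys_c", DatasetMetadata("conll2003", None, "test")),
]
keys = collect_dataset_keys(systems)
```

With this guard, sys_b is dropped silently. If silently dropping systems turns out to be surprising for benchmark owners, logging the skipped system names would be a small extension.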

qjiang002 commented 1 year ago

This is a separate issue, unrelated to this refactor PR. I'll merge this PR and track the problem in a new issue.