compare_json drops results with negative scores.
Negative scores are quite common under criterion RT.
If a file's score is negative in every config in which all files have been scored, compare_json --single_config fails.
A tool is needed to ensure that every file in the test set has a positive score in at least one config. This can be done manually by picking a likely config and running force_run_config, but it should be automatable.
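A minimal sketch of the check such a tool could start from. The data layout here is an assumption, not compare_json's actual format: scores are taken to be a mapping from config name to a per-file score mapping, and "positive" is taken to mean strictly greater than zero. The function name is hypothetical.

```python
from typing import Iterable, List, Mapping


def files_without_positive_score(
    scores: Mapping[str, Mapping[str, float]],
    test_files: Iterable[str],
) -> List[str]:
    """Return test files that no fully scored config scores positively.

    `scores` maps config name -> {file name -> score}. This layout is
    assumed for illustration only.
    """
    wanted = set(test_files)

    # Only configs that have scored every file in the test set count.
    complete = [
        per_file for per_file in scores.values()
        if wanted.issubset(per_file)
    ]

    # A file is covered if some complete config gives it a score > 0
    # (assumption: a score of exactly zero does not count as positive).
    covered = {
        name
        for per_file in complete
        for name in wanted
        if per_file[name] > 0
    }
    return sorted(wanted - covered)
```

Any file this returns is a candidate for the manual step described above: pick a likely config and run force_run_config for it.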