This PR introduces changes to the way run- and qrel-files are loaded from disk/memory (#37). The goal is to (1) have forced typing and column name validation with default arguments; and (2) be able to circumvent both automatic checks if needed with custom arguments.
Flexible column naming
Tl;dr: read methods now all have a header argument that allows to specify custom column names. load methods have a mapping argument to optionally map external columns to the internal representation.
replace qrel_header argument in TrecQrel.read_qrel with immutable default
add run_header argument in TrecRun.read_run to mirror the TrecQrel.read_qrel signature
add column_mapping argument to TrecRun.load_run_from_dataframe to optionally map columns to internal representation correctly
Forced Typing & Validation
Tl;dr: Columns with default names have forced types. This can be circumvented by using the header/mapping argument during data loading, as custom column names are not subject to typechecking.
add forced string typing to query column in TrecQrel
add forced string typing to query, q0, and docid column in TrecRun
add validate argument to TrecRun.load_run_from_dataframe to trigger column name validation (default True)
column validation in TrecRun.load_run_from_dataframe now raises a ValueError instead of printing, and produces the full list of missing columns
Unittests
Tl;dr: unittest adaptation
adapt testtrecrun.py unittests to forced string type
adapt testtreceval.py unittests to forced string type
remove print statement from testtreceval.test_get_precision
This PR introduces changes to the way run- and qrel-files are loaded from disk/memory (#37). The goal is to (1) have forced typing and column name validation with default arguments; and (2) be able to circumvent both automatic checks if needed with custom arguments.
Flexible column naming
Tl;dr:
read
methods now all have aheader
argument that allows to specify custom column names.load
methods have amapping
argument to optionally map external columns to the internal representation.qrel_header
argument inTrecQrel.read_qrel
with immutable defaultrun_header
argument inTrecRun.read_run
to mirror theTrecQrel.read_qrel
signaturecolumn_mapping
argument toTrecRun.load_run_from_dataframe
to optionally map columns to internal representation correctlyForced Typing & Validation
Tl;dr: Columns with default names have forced types. This can be circumvented by using the
header
/mapping
argument during data loading, as custom column names are not subject to typechecking.query
column inTrecQrel
query
,q0
, anddocid
column inTrecRun
validate
argument toTrecRun.load_run_from_dataframe
to trigger column name validation (default True)TrecRun.load_run_from_dataframe
now raises a ValueError instead of printing, and produces the full list of missing columnsUnittests
Tl;dr: unittest adaptation
testtrecrun.py
unittests to forced string typetesttreceval.py
unittests to forced string typetesttreceval.test_get_precision