tira-io / tira

The source code for the TIRA Shared Task Platform
https://www.tira.io
MIT License

Move feature transformer to pyterrier_util #639

Closed: gijshendriksen closed this 4 months ago

gijshendriksen commented 4 months ago

This PR reduces code duplication between tira.pt.doc_features() and tira.pt.query_features(), and makes it easier to add new query-only, doc-only or query-document features.

The new TiraApplyFeatureTransformer accepts a mapping parameter that associates input rows with a feature or feature vector. For instance, given a mapping with mapping['q0']['doc0'] = np.array([.1, .2]), we can instantiate the transformer as TiraApplyFeatureTransformer(mapping, ('qid', 'docno')). For each row, it then creates the features by looking up mapping[row['qid']][row['docno']].
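
The per-row lookup described above can be illustrated with a small standalone sketch. This is not the code from the PR: the helper name `lookup_features` is hypothetical, rows are modelled as plain dicts rather than DataFrame rows, and plain lists stand in for NumPy arrays so the snippet runs without dependencies.

```python
from typing import Any, Mapping, Sequence


def lookup_features(
    mapping: Mapping[str, Any],
    row: Mapping[str, Any],
    id_columns: Sequence[str],
) -> Any:
    """Walk the nested mapping one level per ID column (hypothetical helper)."""
    value: Any = mapping
    for column in id_columns:
        # Descend one level using the row's value in this ID column.
        value = value[row[column]]
    return value


row = {"qid": "q0", "docno": "doc0"}

# Query-document features: two ID columns, so the mapping is nested two levels deep.
pair_mapping = {"q0": {"doc0": [0.1, 0.2]}}
pair_features = lookup_features(pair_mapping, row, ("qid", "docno"))

# Query-only features: one ID column, so the mapping is flat.
query_mapping = {"q0": [0.5]}
query_features = lookup_features(query_mapping, row, ("qid",))
```

Under these assumptions, the same lookup covers query-only, doc-only and query-document features; only the tuple of ID columns and the nesting depth of the mapping change.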

@mam10eks if that's alright with you, I will expand the doc_features and query_features unit tests later, after the LongEval deadline. Do I also need to update or add documentation somewhere?

TheMrSheldon commented 4 months ago

Thank you very much for the PR! Could you add some short documentation and type hinting to TiraApplyFeatureTransformer? I don't find it immediately obvious what it does or why it is equivalent to the code it replaces.

gijshendriksen commented 4 months ago

Hi @TheMrSheldon, thank you for the quick feedback! I have added some documentation to the TiraApplyFeatureTransformer class, which hopefully explains what it's supposed to achieve. If this is still unclear, or the _get_features_from_row method requires additional docstrings, please let me know!

gijshendriksen commented 4 months ago

Thank you for the quick review!