Currently, doc and usr field representations are computed separately, and the interaction features are calculated from the individual doc/usr fields. E.g., with 1 query, n usr fields, and m doc fields, the number of cosine similarity features is (1+n)*m.
This PR adds support for projecting the usr field embeddings into a single usr embedding and the doc field embeddings into a single doc embedding. For the previous example, the resulting number of cosine similarity features is (1+1)*1=2. This could reduce the resources needed for pre-computation and similarity score calculation during online serving, since we only need to store 1 usr embedding and 1 doc embedding instead of (n+m) embeddings.
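The feature-count arithmetic above can be checked with a small sketch. The helper `num_cosine_features` and its parameter names are hypothetical, introduced only for illustration; they are not taken from the DeText code base.

```python
def num_cosine_features(num_usr_fields, num_doc_fields,
                        project_usr=False, project_doc=False):
    """Count cosine similarity features between (query + usr fields) and doc fields.

    Hypothetical helper mirroring the (1 + n) * m formula in the description.
    Assumes at least one usr field and one doc field.
    """
    # With projection enabled, all fields on that side collapse into 1 embedding.
    n = 1 if project_usr else num_usr_fields
    m = 1 if project_doc else num_doc_fields
    # The leading 1 accounts for the query, which is compared against every doc embedding.
    return (1 + n) * m


print(num_cosine_features(3, 4))                                      # (1 + 3) * 4 = 16
print(num_cosine_features(3, 4, project_usr=True, project_doc=True))  # (1 + 1) * 1 = 2
```

The two projections are counted independently here, so enabling only one of them still shrinks the feature count on that side.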
Fixes # (issue)
N/A
Type of change
[x] New feature (non-breaking change which adds functionality)
List all changes
Added two flags that control whether to apply the usr and doc projections. Both default to false.
In rep_model, before the similarity score computation, added the option to project the usr/doc fields with a dense layer. The output size of the layer equals the individual field representation dimension (ftr_size).
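The projection step described above can be sketched in plain Python as follows. This is a minimal illustration, not the code in this PR: it assumes the field embeddings are concatenated before the dense layer, that the layer is linear with no activation, and that weights are random. All function and variable names are hypothetical.

```python
import math
import random


def dense_project(field_embeddings, weights, bias):
    """Concatenate per-field embeddings and apply a linear dense layer.

    weights holds one row per output unit, each of length n_fields * ftr_size.
    """
    concat = [x for emb in field_embeddings for x in emb]
    return [sum(c * w for c, w in zip(concat, row)) + b
            for row, b in zip(weights, bias)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def random_vector(size, rng):
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


def random_dense(in_dim, out_dim, rng):
    """Random weights and zero bias for a layer with out_dim output units."""
    return [random_vector(in_dim, rng) for _ in range(out_dim)], [0.0] * out_dim


rng = random.Random(0)
ftr_size, n, m = 4, 3, 2  # field representation dim, usr fields, doc fields

query = random_vector(ftr_size, rng)
usr_fields = [random_vector(ftr_size, rng) for _ in range(n)]
doc_fields = [random_vector(ftr_size, rng) for _ in range(m)]

# Project n usr fields and m doc fields down to one ftr_size embedding each.
usr_w, usr_b = random_dense(n * ftr_size, ftr_size, rng)
doc_w, doc_b = random_dense(m * ftr_size, ftr_size, rng)
usr_emb = dense_project(usr_fields, usr_w, usr_b)
doc_emb = dense_project(doc_fields, doc_w, doc_b)

# (1 + 1) * 1 = 2 features: query vs. doc and projected usr vs. projected doc.
features = [cosine(query, doc_emb), cosine(usr_emb, doc_emb)]
print(len(usr_emb), len(doc_emb), len(features))
```

Because each projected embedding has size ftr_size, it stays directly comparable with the query representation, and only one vector per side needs to be stored for serving.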
Testing
added unit test for the projection support
pytest
flake8
tested a full run with detext/resources/run_detext.sh
Checklist
[x] My code follows the style guidelines of this project
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[x] I have made corresponding changes to the documentation
[x] My changes generate no new warnings
[x] I have added tests that prove my fix is effective or that my feature works
[x] New and existing unit tests pass locally with my changes
[x] Any dependent changes have been merged and published in downstream modules