Description

Doc and usr representations are computed separately, and the interaction features are calculated from the individual doc/usr fields. E.g., with 1 query, n usr fields, and m doc fields, the number of cosine similarity features is (1+n)*m.

This PR adds support for projecting the usr field embeddings into a single usr embedding and the doc field embeddings into a single doc embedding. For the previous example, the resulting cosine similarity features have size (1+1)*1=2. This could reduce the resources needed for pre-computation and similarity score calculation during online serving: we only need to store 1 usr embedding and 1 doc embedding instead of (n+m) embeddings.
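
The feature counts above can be checked with a few lines of Python. This is only an illustrative sketch: the function and its `project_usr` / `project_doc` parameters are hypothetical names, not the flags added by this PR, and the case where only one side is projected is an assumption extrapolated from the two formulas in the description.

```python
def num_cosine_features(num_usr_fields, num_doc_fields,
                        project_usr=False, project_doc=False):
    """Count cosine similarity features for a single query.

    Hypothetical helper for illustration; not part of detext.
    """
    # The query always contributes one embedding on the query/usr side.
    usr_side = 1 + (1 if project_usr else num_usr_fields)
    # Doc fields collapse to a single embedding when projected.
    doc_side = 1 if project_doc else num_doc_fields
    return usr_side * doc_side


n, m = 3, 2  # example: 3 usr fields, 2 doc fields
without_projection = num_cosine_features(n, m)           # (1+3)*2
with_projection = num_cosine_features(n, m, True, True)  # (1+1)*1
print(without_projection, with_projection)  # prints "8 2"
```

With n=3 and m=2, serving would store 2 embeddings (1 usr, 1 doc) instead of n+m=5.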

Fixes # (issue) N/A

Type of change

[x] New feature (non-breaking change which adds functionality)

List all changes

Added two flags that control whether to apply the usr/doc projections. Both default to false.
In rep_model, before the similarity score computation, added the option to project the usr/doc fields with a dense layer. The layer size equals the dimension of an individual field representation (ftr_size).
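
As a rough illustration of the projection step, the NumPy sketch below concatenates the per-field embeddings and maps them to a single vector of size ftr_size with one dense layer. It is not the code in rep_model: the concatenation of fields, the absence of an activation function, and the names `project_fields`, `weights`, and `bias` are all assumptions made for this example.

```python
import numpy as np


def project_fields(field_embs, weights, bias):
    """Project per-field embeddings into one embedding per example.

    field_embs: [batch, num_fields, ftr_size]
    weights:    [num_fields * ftr_size, ftr_size]  (dense layer kernel)
    bias:       [ftr_size]
    returns:    [batch, ftr_size]
    """
    batch = field_embs.shape[0]
    # Assumption: fields are concatenated before the dense layer.
    concat = field_embs.reshape(batch, -1)
    # Linear dense layer; the real layer may use an activation.
    return concat @ weights + bias


rng = np.random.default_rng(0)
batch, num_usr_fields, ftr_size = 4, 3, 8
usr_embs = rng.normal(size=(batch, num_usr_fields, ftr_size))
weights = rng.normal(size=(num_usr_fields * ftr_size, ftr_size))
bias = np.zeros(ftr_size)

usr_projected = project_fields(usr_embs, weights, bias)
print(usr_projected.shape)  # prints "(4, 8)"
```

The output keeps the same dimension as a single field representation, so the downstream cosine similarity code can treat the projected embedding like any other field.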

Testing

Added a unit test for the projection support
Ran pytest
Ran flake8
Tested a full run with detext/resources/run_detext.sh

Checklist

[x] My code follows the style guidelines of this project
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[x] I have made corresponding changes to the documentation
[x] My changes generate no new warnings
[x] I have added tests that prove my fix is effective or that my feature works
[x] New and existing unit tests pass locally with my changes
[x] Any dependent changes have been merged and published in downstream modules

linkedin / detext

Doc/usr projection #11
