harmonydata / app

Harmony front end
https://harmonydata.ac.uk/app/
MIT License

Evaluate and add TensorFlow.js sentence embedding model #15

Open woodthom2 opened 7 months ago

woodthom2 commented 7 months ago

Description

The user should be able to choose the TensorFlow.js sentence embedding model (Universal Sentence Encoder) for matching. Before we add it, we should evaluate it (https://github.com/harmonydata/matching).

This has the major advantage that the embedding model runs client-side, in the browser, so matching does not consume server resources.

Linked to: https://github.com/harmonydata/app/issues/14

Rationale

We have had requests to improve the model's matching and to add options for more LLMs. TFJS should be easy to add because it runs client-side. But first we need to evaluate it!
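The issue does not say which metric the evaluation would use. As a rough sketch only, assuming each question pair has a similarity score and a gold label (1 = should match, 0 = should not), ROC AUC is one threshold-free way to compare models. The helper name `rocAuc` and the sample numbers are hypothetical and are not taken from the Harmony matching repository.

```javascript
// Hypothetical evaluation helper (not part of Harmony or TFJS):
// ROC AUC computed as the fraction of positive/negative pairs in which
// the positive pair receives the higher similarity score.
function rocAuc(scores, labels) {
  const positives = scores.filter((_, i) => labels[i] === 1);
  const negatives = scores.filter((_, i) => labels[i] === 0);
  if (positives.length === 0 || negatives.length === 0) {
    throw new Error('Need at least one positive and one negative pair');
  }
  let wins = 0;
  for (const p of positives) {
    for (const n of negatives) {
      if (p > n) wins += 1;          // positive ranked above negative
      else if (p === n) wins += 0.5; // ties count as half a win
    }
  }
  return wins / (positives.length * negatives.length);
}

// Made-up scores and labels, for illustration only.
const auc = rocAuc([0.9, 0.7, 0.6, 0.4], [1, 0, 1, 0]);
console.log(auc); // 0.75
```

A value of 1 means every matching pair outscores every non-matching pair; 0.5 is what random scores would give on average.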

Code snippet

Here is an example HTML file using the Universal Sentence Encoder from https://github.com/tensorflow/tfjs-models:


<html>
<head>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder"></script>
</head>

<body>

<script>
// Load the model.
use.loadQnA().then(model => {
  // Embed a dictionary of queries and responses. As used in this example,
  // the input to the embed method has the following format:
  // {
  //   queries: string[];
  //   responses: string[];
  //   contexts?: string[];
  // }
  // queries is an array of question strings.
  // responses is an array of answer strings.
  // contexts is optional; if given, it provides one context string per
  // response. This example does not use it.

  const input = {
    queries: ['I feel depressed'],
    responses: [
      'I feel sad',
      'I feel happy',
    ]
  };
  // embed() returns the query and response embeddings as tensors.
  const embeddings = model.embed(input);
  /*
    * The output of the embed method is an object with two keys:
    * {
    *   queryEmbedding: tf.Tensor;
    *   responseEmbedding: tf.Tensor;
    * }
    * queryEmbedding is a tensor containing embeddings for all queries.
    * responseEmbedding is a tensor containing embeddings for all answers.
    * You can call `arraySync()` to retrieve the values of the tensor.
    * In this example, row 0 of queryEmbedding is the embedding for the query
    * 'I feel depressed'
    * and row 0 of responseEmbedding is the embedding for the answer
    * 'I feel sad'.
    */
  // Dot product of every query with every response (the second matrix is
  // transposed), flattened to one score per query/response pair.
  const scores = tf.matMul(embeddings['queryEmbedding'],
      embeddings['responseEmbedding'], false, true).dataSync();
  console.log(scores);
});

</script>

</body>
</html>
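The `scores` value logged by the snippet comes back from `dataSync()` as a flat typed array, not a nested one. Below is a minimal sketch of how it could be mapped back to a best match per query, assuming row-major order (one row per query, one column per response), which is how TensorFlow.js flattens a `[queries, responses]` tensor. The helper name `bestMatches` and the sample numbers are hypothetical; real scores come from the model.

```javascript
// Hypothetical helper (not part of Harmony or TFJS): map a flat,
// row-major score array back to the best-scoring response per query.
function bestMatches(flatScores, queries, responses) {
  const numResponses = responses.length;
  return queries.map((query, i) => {
    // Scores for query i occupy one contiguous row of length numResponses.
    const row = Array.from(
      flatScores.slice(i * numResponses, (i + 1) * numResponses));
    const bestIndex = row.indexOf(Math.max(...row));
    return { query, match: responses[bestIndex], score: row[bestIndex] };
  });
}

// Made-up scores for illustration only; they are not model output.
const exampleScores = new Float32Array([12.5, 7.25]);
const result = bestMatches(
  exampleScores, ['I feel depressed'], ['I feel sad', 'I feel happy']);
console.log(result);
```

In the page above, the same helper could be called as `bestMatches(scores, input.queries, input.responses)` inside the `then` callback.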