linagora / james-project

Mirror of Apache James Project
Apache License 2.0

[SEARCH] Implement JMAP SearchSnipset #5226

Open chibenwa opened 1 month ago

chibenwa commented 1 month ago

Why?

Provide highlighting for search results.

How?

Arsnael commented 1 month ago

In the end, though, it's just adding a highlight param to the OpenSearch query behind JMAP Email/query, correct?

I'm not sure I understand why we would need a brand-new JMAP method for that. Can't we just extend Email/query with an additional request argument, highlight? It would only be allowed in the presence of an extra custom JMAP capability, of course, like urn:apache:james:params:jmap:mail:highlights. We do a similar thing (extending an IETF-defined JMAP method) with urn:apache:james:params:jmap:mail:shares for shared mailboxes, for example (Mailbox/get, Email/get, Email/set, Email/query, ...).

That would be easier for both the back end and the front end in terms of development and adoption.

Something like:

{
  "using": [
    "urn:ietf:params:jmap:core",
    "urn:ietf:params:jmap:mail",
    "urn:apache:james:params:jmap:mail:shares",
    "urn:apache:james:params:jmap:mail:highlights"
  ],
  "methodCalls": [
    [
      "Email/query",
      {
        "accountId": "1234",
        "filter": {
          "operator": "AND",
          "conditions": [
            {
              "inMailbox": "5678",
              "text": "Twake"
            }
          ]
        },
        "sort": [
          {
            "isAscending": false,
            "property": "receivedAt"
          }
        ],
        "limit": 20,
        "highlight": true
      },
      "c0"
    ]
  ]
}

WDYT?

vttranlina commented 1 month ago

I tend to agree with Rene: we don't need a new dedicated JMAP method for it. The "highlight" makes sense when using the "JMAP/Query" (search) + "JMAP/get" methods. Two properties would be affected: preview and subject (of the Email/get response).

Update: with this approach, will we return the highlighted text directly in the preview and subject properties, or in separate properties, e.g. preview_highlight, subject_highlight?

chibenwa commented 1 month ago

In the end, though, it's just adding a highlight param to the OpenSearch query behind JMAP Email/query, correct?

Agreed, OpenSearch allows doing this within one query.

But JMAP doesn't (!)

With JMAP you need to do an Email/query first, which only returns ids, then a second call to SearchSnipset/get.

I agree an extension could work around this, though :

Arsnael commented 1 month ago

Alright

Arsnael commented 1 month ago

Btw I think the term snipset is not really a word? Did you mean snippet? What about SearchHighlight/get more simply?

For the get method, I guess you need the list of ids as a parameter, plus the parameters of the Email/query method for the search query (the ones you already used in Email/query), correct? What would be highlighted is the term in the text condition field. Maybe something like:

{
  "using": [
    "urn:ietf:params:jmap:core",
    "urn:ietf:params:jmap:mail",
    "urn:apache:james:params:jmap:mail:shares",
    "urn:apache:james:params:jmap:mail:highlights"
  ],
  "methodCalls": [
    [
      "SearchHighlight/get",
      {
        "accountId": "1234",
        "ids": ["1", "2"],
        "filter": {
          "operator": "AND",
          "conditions": [
            {
              "inMailbox": "5678",
              "text": "James"
            }
          ]
        },
        "sort": [
          {
            "isAscending": false,
            "property": "receivedAt"
          }
        ],
        "limit": 20
      },
      "c0"
    ]
  ]
}

=>

{
    "sessionState": "2c9f1b12-b35a-43e6-9af2-0106fb53a943",
    "methodResponses": [
        [
            "SearchHighlight/get",
            {
                "accountId": "1234",
                "state": "bc5892f0-44dc-11ef-a70d-cbbc207636f6",
                "list": [
                    {
                        "id": "1",
                        "blobId": "123",
                        "keywords": {
                            "$seen": true
                        },
                        "mailboxIds": {
                            "5678": true
                        },
                        "size": 3286,
                        "receivedAt": "2024-06-11T08:02:21Z",
                        "to": [
                            {
                                "email": "bob@james.com"
                            }
                        ],
                        "from": [
                            {
                                "name": "James",
                                "email": "noreply@james.com"
                            }
                        ],
                        "subject": "Hello bob",
                        "sentAt": "2024-06-11T08:02:15Z",
                        "hasAttachment": false,
                        "preview": "Hey welcome to <em>James</em>. Hope you enjoy using <em>James</em>"
                    },
                    {
                        "id": "2",
                        "blobId": "789",
                        "keywords": {
                            "$seen": true
                        },
                        "mailboxIds": {
                            "5678": true
                        },
                        "size": 14666,
                        "receivedAt": "2024-07-12T09:06:31Z",
                        "to": [
                            {
                                "email": "bob@james.com"
                            },
                            {
                                "name": "Alice",
                                "email": "alice@james.com"
                            }
                        ],
                        "from": [
                            {
                                "name": "Cedric",
                                "email": "cedric@james.com"
                            }
                        ],
                        "subject": "You need to try this!!!",
                        "sentAt": "2024-07-12T09:06:28Z",
                        "hasAttachment": false,
                        "preview": "## Test I send you an email from <em>James</em>, please use it, ..."
                    }
                ],
                "notFound": []
            },
            "c0"
        ]
    ]
}

In that case we cannot rely on the precomputed previews either.

chibenwa commented 1 month ago

https://jmap.io/spec-mail.html#search-snippets

Let's stick to the spec?

Arsnael commented 1 month ago

Fair enough, I missed that, even easier

vttranlina commented 1 month ago

Q: Will the SearchSnippet/get method reuse the result from Email/get (1) or query OpenSearch directly (2)?

// I am considering (1) because we can manually detect highlighted text from the EmailGet body.

Arsnael commented 1 month ago

OpenSearch directly.

You don't use Email/get in this scenario, btw.

You would do => Email/query, then SearchSnippet/get (with the ids you get from the Email/query response).

It's an extra trip to OpenSearch, but this time you don't search across all documents, just the documents whose ids you passed.

The RFC seems to be written this way as well.
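As a minimal sketch of the two-call flow described above, the snippet below builds a single JMAP request that chains Email/query and SearchSnippet/get through a result reference, following the argument names in RFC 8621 (accountId, filter, emailIds). The helper name build_search_snippet_request and the call ids c0/c1 are illustrative assumptions, and whether James accepts a back-reference for this method is also an assumption; the ids could equally be passed explicitly in a second request.

```python
import json


def build_search_snippet_request(account_id, filter_, limit=20):
    """Build one JMAP request chaining Email/query and SearchSnippet/get.

    Hypothetical helper for illustration; it is not part of James.
    """
    return {
        "using": ["urn:ietf:params:jmap:core", "urn:ietf:params:jmap:mail"],
        "methodCalls": [
            # First call: the search itself, which only returns email ids.
            ["Email/query",
             {"accountId": account_id, "filter": filter_, "limit": limit},
             "c0"],
            # Second call: same filter, ids taken from the first call's
            # response through a result reference (RFC 8620, section 3.7).
            ["SearchSnippet/get",
             {
                 "accountId": account_id,
                 "filter": filter_,
                 "#emailIds": {
                     "resultOf": "c0",
                     "name": "Email/query",
                     "path": "/ids",
                 },
             },
             "c1"],
        ],
    }


request = build_search_snippet_request(
    "1234", {"inMailbox": "5678", "text": "James"})
print(json.dumps(request, indent=2))
```

Per RFC 8621, each entry of the response list carries emailId, subject and preview, with matches wrapped in <mark></mark> tags, rather than the full Email object.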

Arsnael commented 1 week ago

Task list:

chibenwa commented 1 week ago

OK, this task list focuses on the presentation layer.

How about the internal engine for search snippets for the various search implems? Opensearch, lucene and memory?

Arsnael commented 1 week ago

How about the internal engine for search snippets for the various search implems? Opensearch, lucene and memory?

Can argue that Lucene and memory are missing, but for opensearch that's the first task of the list?

chibenwa commented 1 week ago

Can argue that Lucene and memory are missing

Lucene: https://lucene.apache.org/core/8_0_0/highlighter/org/apache/lucene/search/highlight/Highlighter.html

If we seriously plan to use Lucene as a JMAP search backend, we had better put a little effort into the topic IMO (at least eventually).

Regarding memory, I am torn:

Arsnael commented 1 week ago

I didn't say we should not do the memory and Lucene implems, just that they were missing from the task list (unlike OpenSearch), and yes, we should add them (sorry if that was not clear).

EDIT: task list updated

quantranhong1999 commented 1 week ago

Highlights are to be made on the body.

Should we support subject too? The JMAP RFC does mention subject and preview.

vttranlina commented 1 week ago

Highlights are to be made on the body.

Should we support subject too? The JMAP RFC does mention subject and preview.

Yep, I think we don't even need to highlight the body; subject + preview is enough.

// AH, the term "body" may be a typo, as it does not correspond to the exact email body keyword in the Email property.

vttranlina commented 6 days ago

Task list:

  • Integration tests

Is it necessary?

I think the SearchSnippet/get method task already includes it (SearchSnippetGetMethodContract in server/protocols/jmap-rfc-8621-integration-tests/jmap-rfc-8621-integration-tests-common/src).

vttranlina commented 6 days ago

Locate part of their text (exact match) and show 100 chars before and 100 chars after. Ideally find a library for that.

OpenSearch already supports it through the "fragment" options (fragment_size, fragment_offset, number_of_fragments...).
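To make the fragment options concrete, here is a rough sketch of an OpenSearch request body that restricts the search to a known set of documents and asks for highlighted fragments. The field names subject and textBody, the helper name build_highlight_query, and the assumption that the ids passed in match the index's document ids are all hypothetical; the real James mapping may differ.

```python
def build_highlight_query(document_ids, text,
                          fragment_size=200, number_of_fragments=1):
    """Build an OpenSearch body that highlights `text` in the given documents.

    Hypothetical helper; field names are assumptions, not the James mapping.
    """
    return {
        "query": {
            "bool": {
                # Only look at the documents returned by the earlier search.
                "filter": [{"ids": {"values": list(document_ids)}}],
                "must": [{
                    "multi_match": {
                        "query": text,
                        "fields": ["subject", "textBody"],
                    }
                }],
            }
        },
        "highlight": {
            # RFC 8621 expects matches wrapped in <mark></mark>.
            "pre_tags": ["<mark>"],
            "post_tags": ["</mark>"],
            "fragment_size": fragment_size,
            "number_of_fragments": number_of_fragments,
            "fields": {"subject": {}, "textBody": {}},
        },
        # Only the highlight section of each hit is needed here.
        "_source": False,
    }


body = build_highlight_query(["1", "2"], "James")
```

A fragment_size of 200 roughly approximates "100 chars before and 100 chars after", although OpenSearch picks fragment boundaries itself, so the match is not guaranteed to be centered.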

// I'm looking for a library for the memory implementation.
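For the memory case, the "exact match plus 100 chars on each side" idea quoted above can be sketched without any library. This is only an illustration under stated assumptions: make_snippet is a hypothetical name, matching is case-insensitive on the first occurrence only, and the escaping plus <mark> wrapping follows what RFC 8621 asks of snippet values.

```python
import html
import re


def make_snippet(text, term, context=100):
    """Return the text around the first match of `term`, match in <mark>.

    Hypothetical helper: returns None when there is nothing to highlight.
    """
    if not term:
        return None
    # re.escape makes this a literal (exact) match, not a regex search.
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - context)
    end = min(len(text), match.end() + context)
    # Escape &, < and > in each piece before adding the markup.
    before = html.escape(text[start:match.start()], quote=False)
    hit = html.escape(match.group(0), quote=False)
    after = html.escape(text[match.end():end], quote=False)
    return f"{before}<mark>{hit}</mark>{after}"


print(make_snippet("Hey welcome to James. Hope you enjoy using it",
                   "james", context=11))
```

This cuts on character counts rather than word boundaries and ignores tokenization, which is where a real highlighter would do better.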

quantranhong1999 commented 6 days ago

// I'm looking for a library for the memory implementation.

FYI, Lucene has an option to search in memory (without persisting to the file system). But I am not sure if turning memory to use Lucene is a good idea.

chibenwa commented 6 days ago

What do you think about using lucene-highlighter for it?

The preview for SearchSnippet is not the preview in the sense of Email/get but rather a preview of the relevant part of the mail.

But I am not sure if turning memory to use Lucene is a good idea.

Why not?

We could keep Scanning... around but switch memory-app to a memory-based Lucene without protest: it's meant for testing after all!