Implement freetext search in cht-datasource

jkuester commented 2 weeks ago

Spinning this off into its own ticket (from https://github.com/medic/cht-core/issues/9544) since it can stand alone.

What feature do you want to improve? As a part of updating our freetext search implementation, we can take the opportunity to implement the search logic in cht-datasource. This will provide a more clean interface for cht-code code to perform searches of contacts and reports as well as resulting in the creation of new REST endpoints for the same searching.

Describe the improvement you'd like

shared-libs/cht-datasource

qualifier.ts


/**
* Combines multiple qualifiers into a single object.
 * @returns the combined qualifier
 * @throws Error if any of the qualifiers contain intersecting property names
*/
const and = <A, B, C, D>(qualA: A,  qualB: B,  qualC?: C,  qualD?: D): =>  A & B & (C | object) & (D | object);

/**
 * A qualifier that identifies entities based on a freetext search string.
 */
type FreetextQualifier = Readonly<{ freetext: string }>;

/**
 * Builds a qualifier for finding entities by the given freetext string.
 * @param freetext the text to search with
 * @returns the qualifier
 * @throws Error if the search string is not provided or is empty
 */
const byFreetext(freetext: string) => FreetextQualifier;

/**
* Returns `true` if the given qualifier is a {@link FreetextQualifier} otherwise `false`.
* @param qualifier the qualifier to check
* @returns `true` if the given type is a {@link FreetextQualifier}, otherwise `false`.
*/
const isFreetextQualifier = (qualifier: unknown) => qualifier is FreetextQualifier;

contact.ts

  // Move contents of libs/contact.ts into this file
namespace v1 {
  // Contact interface should go in the v1 namespace with the following documentation:
  /**
   * Immutable data about a contact.
   */

  /**
   * Immutable data about a contact, including the full records of the parent lineage.
   */
  interface ContactWithLineage extends Contact {
    readonly parent?: ContactWithLineage | NormalizedParent,
  }

  // New REST api: /api/v1/contact
  /**
   * Returns a function for retrieving a contact from the given data context.
   * @param context the current data context
   * @returns a function for retrieving a contact
   * @throws Error if a data context is not provided
   */
  /**
   * Returns a contact for the given qualifier.
   * @param qualifier identifier for the contact to retrieve
   * @returns the contact or `null` if no contact is found for the qualifier
   * @throws Error if the qualifier is invalid
   */
    const get = (context: DataContext) => (qualifier: UuidQualifier) => Promise<Nullable<Contact>>

  /**
   * Returns a function for retrieving a contact from the given data context with the contact's parent lineage.
   * @param context the current data context
   * @returns a function for retrieving a contact with the contact's parent lineage
   * @throws Error if a data context is not provided
   */
  /**
   * Returns a contact for the given qualifier with the contact's parent lineage.
   * @param qualifier identifier for the contact to retrieve
   * @returns the contact or `null` if no contact is found for the qualifier
   * @throws Error if the qualifier is invalid
   */
    const getWithLineage = (context: DataContext) => (qualifier: UuidQualifier) => Promise<Nullable<ContactWithLineage>>

    // New REST api: /api/v1/contact/id
  /**
   * Returns a function for retrieving a paged array of contact identifiers from the given data context.
   * @param context the current data context
   * @returns a function for retrieving a paged array of contact identifiers
   * @throws Error if a data context is not provided
   * @see {@link getIdsAll} which provides the same data, but without having to manually account for paging
   */
  /**
   * Returns an array of contact identifiers for the provided page specifications.
   * @param qualifier the limiter defining which identifiers to return
   * @param cursor the token identifying which page to retrieve. A `null` value indicates the first page should be
   * returned. Subsequent pages can be retrieved by providing the cursor returned with the previous page.
   * @param limit the maximum number of identifiers to return. Default is 10000.
   * @returns a page of contact identifiers for the provided specification
   * @throws Error if no qualifier is provided or if the qualifier is invalid
   * @throws Error if the provided `limit` value is `<=0`
   * @throws Error if the provided cursor is not a valid page token or `null`
   */
    const getIdsPage = (context: DataContext) => (qualifier: ContactTypeQualifier | FreetextQualifier, cursor: Nullable<string>, limit: number) => Promise<Page<string>>

  /**
   * Returns a function for getting a generator that fetches contact identifiers from the given data context.
   * @param context the current data context
   * @returns a function for getting a generator that fetches contact identifiers
   * @throws Error if a data context is not provided
   */
  /**
   * Returns a generator for fetching all contact identifiers that match the given qualifier
   * @param qualifier the limiter defining which identifiers to return
   * @returns a generator for fetching all contact identifiers that match the given qualifier
   * @throws Error if no qualifier is provided or if the qualifier is invalid
   */
  const getIdsAll = (context: DataContext) => (qualifier: ContactTypeQualifier | FreetextQualifier) => AsyncGenerator<string, null>
}

report.ts

namespace v1 {
  /**
   * A report document.
   */
  interface Report extends Doc {
    readonly form: string;
    readonly reported_date: Date;
    readonly fields: DataObject;
  }

  // New REST api: /api/v1/report
  /**
   * Returns a function for retrieving a report from the given data context.
   * @param context the current data context
   * @returns a function for retrieving a report
   * @throws Error if a data context is not provided
   */
  /**
   * Returns a report for the given qualifier.
   * @param qualifier identifier for the report to retrieve
   * @returns the report or `null` if no report is found for the qualifier
   * @throws Error if the qualifier is invalid
   */
    const get = (context: DataContext) => (qualifier: UuidQualifier) => Promise<Nullable<Report>>

    // New REST api: /api/v1/report/id
  /**
   * Returns a function for retrieving a paged array of report identifiers from the given data context.
   * @param context the current data context
   * @returns a function for retrieving a paged array of report identifiers
   * @throws Error if a data context is not provided
   * @see {@link getIdsAll} which provides the same data, but without having to manually account for paging
   */
  /**
   * Returns an array of report identifiers for the provided page specifications.
   * @param qualifier the limiter defining which identifiers to return
   * @param cursor the token identifying which page to retrieve. A `null` value indicates the first page should be
   * returned. Subsequent pages can be retrieved by providing the cursor returned with the previous page.
   * @param limit the maximum number of identifiers to return. Default is 10000.
   * @returns a page of report identifiers for the provided specification
   * @throws Error if no qualifier is provided or if the qualifier is invalid
   * @throws Error if the provided `limit` value is `<=0`
   * @throws Error if the provided cursor is not a valid page token or `null`
   */
    const getIdsPage = (context: DataContext) => (qualifier: FreetextQualifier, cursor: Nullable<string>, limit: number) => Promise<Page<string>>

  /**
   * Returns a function for getting a generator that fetches report identifiers from the given data context.
   * @param context the current data context
   * @returns a function for getting a generator that fetches report identifiers
   * @throws Error if a data context is not provided
   */
  /**
   * Returns a generator for fetching all report identifiers that match the given qualifier
   * @param qualifier the limiter defining which identifiers to return
   * @returns a generator for fetching all report identifiers that match the given qualifier
   * @throws Error if no qualifier is provided or if the qualifier is invalid
   */
  const getIdsAll = (context: DataContext) => (qualifier: FreetextQualifier) => AsyncGenerator<string, null>
}

person.ts/place.ts

Update the Person/PlaceWithLineage interfaces to use ContactWithLineage as the parent (instead of PlaceWithLineage). Pending https://github.com/medic/cht-core/issues/9584 it is probably not safe to assume that a contact parent will always be a place.

Need to do more research regarding the Place.contact value...

shared-libs/search

Update search logic to call through to the cht-datasource functions when doing freetext searches (or the basic contact_by_type search). For the multi-view searches, we will need to just stream in all pages.

Design details

As mentioned here (https://github.com/medic/cht-core/issues/9544#issuecomment-2430462690) the local adapter will need to account for Nouveau logic (though depending on the progress of that implementation, this PR might just port the same online/offline logic from shared-libs/search that was created by https://github.com/medic/cht-core/issues/9544).

In general we should strive to align to the patterns/code used for implementing the person/place-by-type flows in cht-datasource. All the paging logic should basically be the same.

Describe alternatives you've considered

Additional context

sugat009 commented 3 days ago

const getIdsPage = (context: DataContext) => (qualifier: ContactTypeQualifier | FreetextQualifier, cursor: Nullable, limit: number) => Promise<Page>

@jkuester For the Contact's getIdsPage API, for both cases when the qualifier is either ContactTypeQualifier or FreetextQualifier, is the goal to just search using Lucene's q search parameter?

jkuester commented 2 days ago

For now, I think you can just follow the existing logic in shared-libs/search which calls through to either the medic-client/contacts_by_freetext or the medic-client/contacts_by_type_freetext view queries. (Unless I missed something and @m5r has the Lucene integration ready to go...)

Unfortunately, the shared-libs/search logic flow is a bit disjointed and tricky to follow. Let me know if it would be helpful to go through it together. But, ultimately, the contact search logic should end up hitting one of these indexes

My goal here is to have these cht-datasource changes be decoupled from the other search efforts (at least for the time being). That way we can ship your work here without waiting on Lucene or offline views. If I am correct, it should not really be any extra work to wait and integrate the Lucene/offline stuff when it is actually ready to go....

sugat009 commented 2 days ago

I've pushed the skeleton code for this. Can you review if that's the pattern we're looking for?

jkuester commented 2 days ago

@sugat009 Could you be a bit more specific about the pattern you would like me to review? 😅 In general, the code in your PR looks to be headed in a great direction!

sugat009 commented 2 days ago

I wanted you to review the latest commit for implementation of /api/v1/contact/id API. The specific thing to be reviewed is the qualifier value, which can be ContactTypeQualifier | FreetextQualifier as per the description of this ticket. Those interfaces seem different from each other in terms of use cases and I was confused on whether to do a search based on both depending on what type of value is passed or do different stuff depending on type. Right now I've implemented the pagination-based search that we did on other entities like person and place which is just by passing the key field with the value from the qualifier as the value. This is somewhat based on the current shared-libs/search implementation where the key is set as the passed freetext search keyword. However, I did not implement the other part of implementation for now where the passed in freetext search keyword is being used as startkey and endkey.

jkuester commented 1 day ago

The specific thing to be reviewed is the qualifier value, which can be ContactTypeQualifier | FreetextQualifier as per the description of this ticket. Those interfaces seem different from each other in terms of use cases and I was confused on whether to do a search based on both depending on what type of value is passed or do different stuff depending on type.

Basically, if the provided qualifier is both a ContactTypeQualifier and a FreetextQualifier we should hit the medic-client/contacts_by_type_freetext index. If it is just a FreetextQualifier we should hit the medic-client/contacts_by_freetext index. If it is just a ContactTypeQualifier we should just hit the medic-client/contacts_by_type index.

medic / cht-core