vivo-community / scholars-discovery

BSD 3-Clause "New" or "Revised" License

Highlight and edismax/dismax query parameter support #213

Closed ghost closed 4 years ago

ghost commented 4 years ago

Requires a hotfix release of spring-data-solr hosted in TAMU's Maven repository.

https://jira.spring.io/browse/DATASOLR-572 https://github.com/spring-projects/spring-data-solr/pull/115

Resolves #189
Resolves #193

This is a breaking change to the GraphQL API: the `query` argument, previously a string, is now an object with multiple properties.

Supported properties:

For example:

query {
  people(
    query: {
      q: "math test"
      df: "overview"
      fl: "type,name,overview"
    }
    facets: [
      { field: "type" }
    ]
    filters: []
    boosts: []
    highlight: {
      fields: ["overview"]
    }
    paging: {
      pageSize: 5
      pageNumber: 0
      sort: { orders: [{ direction: DESC, property: "score" }] }
    }    
  ) {
    content {
      id
      type
      name
      overview
    }
    page {
      totalElements
      totalPages
      number
      size
    }
    facets {
      field
      entries {
        content {
          value
          count
        }
      }
    }
    highlights {
      id
      snippets
    }
  }
}
ghost commented 4 years ago

With one minor change, vivo-react still appears to run on top of this without any problems.

I was wondering about the new `*_sort` fields such as `name_sort`. It looks like you might be using those in your UI, but if I send `name_sort` in the GraphQL sort argument, as in `sort: { orders: [{ property: "name_sort", direction: ASC }] }`, it just says "field can't be found". That's fine, since we're not sending it, but I didn't investigate what the advantage of a `*_sort` field might be.

To use the `name_sort` or `title_sort` fields you will have to re-index. Name and title are often needed for searching as well as for sorting. Searching requires the field to be tokenized, whereas sorting works best on a field that is not tokenized. Therefore, the `title` and `name` fields are now tokenized, and their values are also copied into a `*_sort` dynamic field that is used for sorting.
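For illustration, a people query that searches the default field and sorts the results alphabetically on the `name_sort` copy field might look like the sketch below. It assumes the Solr image has been rebuilt and the data re-indexed. It reuses only arguments from the example at the top of this PR, and the search term is arbitrary.

```graphql
# Sketch: sort on the untokenized name_sort copy field (requires a re-index)
query {
  people(
    query: {
      q: "math test"
    }
    facets: []
    filters: []
    boosts: []
    highlight: {
      fields: ["overview"]
    }
    paging: {
      pageSize: 5
      pageNumber: 0
      # Sort on name_sort; the tokenized name field cannot be sorted on
      sort: { orders: [{ direction: ASC, property: "name_sort" }] }
    }
  ) {
    content {
      id
      type
      name
    }
  }
}
```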

nymbyl commented 4 years ago

Thanks. I thought I did a re-index, but maybe I fooled myself; I'll look into it more. So the general idea is that we should send `name_sort`, `title_sort`, etc. as the name of the field to sort on in the GraphQL API, and that this should work better than plain `name` or `title`?

ghost commented 4 years ago

If you are running Solr in a Docker container, you will have to rebuild the image, since the Solr schema changed. You will have to use `name_sort` or `title_sort` for sorting, because the `name` and `title` fields are now tokenized and sorting doesn't work on tokenized fields. However, you can now use `name` or `title` in the `df` or `qf` parameters and get the expected search results.
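The following sketch combines both points. It searches the tokenized `name` field through `df`, then sorts by relevance with `name_sort` as an alphabetical tiebreaker. It reuses the arguments of the "Tom" example in this thread. One point is an assumption, based on how Spring Data sorts are usually built and not confirmed here: that the `orders` list is applied in sequence, primary order first.

```graphql
# Sketch: match on the tokenized name field, sort by score, then by name_sort
query {
  people(
    query: {
      q: "Tom"
      df: "name"
    }
    facets: []
    filters: []
    boosts: []
    highlight: {
      fields: ["name"]
    }
    paging: {
      pageSize: 5
      pageNumber: 0
      # Assumed: orders are applied in list order, so name_sort only breaks score ties
      sort: {
        orders: [
          { direction: DESC, property: "score" }
          { direction: ASC, property: "name_sort" }
        ]
      }
    }
  ) {
    content {
      id
      name
    }
  }
}
```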

I noticed another bug with the `fl` fields; I will commit a fix now.

nymbyl commented 4 years ago

I get it: I did not rebuild the image. Makes sense now, thanks!

nymbyl commented 4 years ago

I did notice one thing. It does not affect our site, and I'm not sure whether it's expected or even a problem:

We have someone in our data named "Abraham, Shayna L.". A query object like `query: { q: "Abraham" }` returns results, but `query: { q: "Abraham", df: "name" }` does not. On the other hand, this one does (with quotes added around the `q` value): `query: { q: "Abraham", df: "name"}`.

From a quick read of the `df` documentation I couldn't tell whether that is expected. Just thought I'd check whether it makes sense.

ghost commented 4 years ago

Have you re-indexed yet? The `name` field has the format `lastName, firstName`, and it is copied to `_text_` and `*_sort`. The `lastName` and `firstName` fields are also copied to `_text_`, which is the default search field when `df` is not specified. After re-indexing you should get results for `query: { q: "Abraham", df: "name" }`, because `name` will be tokenized and "Abraham" will match one of its tokens. Without tokenization, the `name` field only matches the entire `lastName, firstName` value.
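To check the difference described above after re-indexing, one option is a single request that uses two standard GraphQL aliases. The alias names `byText` and `byName` are hypothetical. The first query omits `df`, so it falls back to `_text_`. The second is restricted to `name`. If the re-index picked up the new schema, the `byName` count should no longer be zero. This is an expectation drawn from the explanation above, not a captured result.

```graphql
# Sketch: compare the default _text_ field with df: "name" in one request
query {
  # No df: Solr searches the default _text_ field
  byText: people(
    query: { q: "Abraham" }
    facets: []
    filters: []
    boosts: []
    highlight: { fields: ["name"] }
    paging: {
      pageSize: 5
      pageNumber: 0
      sort: { orders: [{ direction: DESC, property: "score" }] }
    }
  ) {
    page { totalElements }
    content { id name }
  }
  # df: "name" searches only the name field, which matches single tokens once tokenized
  byName: people(
    query: { q: "Abraham", df: "name" }
    facets: []
    filters: []
    boosts: []
    highlight: { fields: ["name"] }
    paging: {
      pageSize: 5
      pageNumber: 0
      sort: { orders: [{ direction: DESC, property: "score" }] }
    }
  ) {
    page { totalElements }
    content { id name }
  }
}
```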

nymbyl commented 4 years ago

Well, I thought I rebuilt the image and re-indexed, but I'm thinking we may not want to use `df` anyway, or at least we will look into that later, not now.

ghost commented 4 years ago

Here is an example that works for us:

query {
  people(
    query: {
      q: "Tom"
      df: "name"
      fl: "type,name,overview"
    }
    facets: [
      { field: "type" }
    ]
    filters: []
    boosts: []
    highlight: {
      fields: ["name"]
    }
    paging: {
      pageSize: 5
      pageNumber: 0
      sort: { orders: [{ direction: DESC, property: "score" }] }
    }    
  ) {
    content {
      id
      type
      name
      overview
    }
    page {
      totalElements
      totalPages
      number
      size
    }
    facets {
      field
      entries {
        content {
          value
          count
        }
      }
    }
    highlights {
      id
      snippets
    }
  }
}
Response:

{
  "data": {
    "people": {
      "content": [
        {
          "id": "n47bd0294",
          "type": [
            "FacultyMember"
          ],
          "name": "Vestal, Tom",
          "overview": "Research and teaching is guided by diffusion theory with foci on educational delivery strategies regarding agricultural biotechnology, food safety, food security, emergency management and livestock infectious disease response and recovery."
        }
      ],
      "page": {
        "totalElements": 1,
        "totalPages": 1,
        "number": 0,
        "size": 5
      },
      "facets": [
        {
          "field": "type",
          "entries": {
            "content": [
              {
                "value": "FacultyMember",
                "count": 1
              }
            ]
          }
        }
      ],
      "highlights": [
        {
          "id": "n47bd0294",
          "snippets": {
            "name": [
              "Vestal, <em>Tom</em>"
            ]
          }
        }
      ]
    }
  }
}