research-software-directory / RSD-as-a-service

This repo contains the new RSD-as-a-service implementation
https://research.software

Improve global search function #1111

Closed ewan-escience closed 5 months ago

ewan-escience commented 5 months ago

Improve global search function

Changes proposed in this pull request:

How to test:

Discussion: Should the function enforce a minimum length for its input?

Closes #1108

PR Checklist:

cmeessen commented 5 months ago

I can confirm that it's working as described :+1:

Additionally, I checked how it performs on the generated data when searching for "duh". In my case I get the following results:

image

It puts the project match before the organisation match, although the organisation name starts with "duh" whereas the project only has the term in second position.

> Discussion: Should the function enforce a minimum length for its input?

I think searching from the first letter may not make much sense. But two letters could already yield some meaningful results.

> Only search on slug and title (not keywords, research domains or short descriptions anymore)

Searching in short descriptions was a request in the past, if I remember correctly. Would it be possible to include this again?

ewan-escience commented 5 months ago

> It puts the project match before the organisation match, although the organisation name starts with "duh" whereas the project only has the term in second position.

Note that Project: and Organisation: are part of their names, so it currently works as intended, since we don't look at the index of the query. We could do this, e.g. use the index as a tie breaker when sorting. Would you like that?
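To make the tie-breaker idea concrete, here is a minimal TypeScript sketch. It is not the actual `global_search` implementation (which lives in SQL); the `SearchHit` shape, the `rank` field and the example names are hypothetical and only illustrate sorting by rank first and by match index second.

```typescript
// Hypothetical result shape; the real global_search output may differ.
interface SearchHit {
  name: string
  rank: number // lower is better, assumed to be computed elsewhere
}

// Position of the query in the name, case insensitive.
// Names that do not contain the query sort last.
function matchIndex(name: string, query: string): number {
  const i = name.toLowerCase().indexOf(query.toLowerCase())
  return i === -1 ? Number.MAX_SAFE_INTEGER : i
}

// Sort by rank first and use the match index only as a tie breaker.
function sortHits(hits: SearchHit[], query: string): SearchHit[] {
  return [...hits].sort((a, b) => {
    if (a.rank !== b.rank) return a.rank - b.rank
    return matchIndex(a.name, query) - matchIndex(b.name, query)
  })
}

// Both hits have the same rank, so the earlier match wins.
const sortedHits = sortHits(
  [
    {name: 'Big duh study', rank: 1},
    {name: 'Duh institute', rank: 1}
  ],
  'duh'
)
```

With a prefix such as `Project: ` or `Organisation: ` in the name, the index is counted from the start of the full name, so the prefix length would influence the order.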

> I think searching from the first letter may not make much sense. But two letters could already yield some meaningful results.

Agreed. However, SQL doesn't have an IF statement, so we could either write a complicated CASE expression, revert to PL/pgSQL, enforce the minimum length only in the frontend, or leave it as it is.

> Searching in short descriptions was a request in the past, if I remember correctly. Would it be possible to include this again?

Personally, I feel that this doesn't belong in a global search and that one can use the dedicated software search page for this. However, if more people think this is useful, I could add it back. I still have to think about how that should interact with the scores, though.

dmijatovic commented 5 months ago

I am in favor of including subtitle/short description in the search.

jmaassen commented 5 months ago

Works well, but I also wonder if it wouldn't be better to also search in the short description, keywords, etc.

To me, the global search is a quick way to search through all three collections (software, projects, organisations). Often, I don't remember the exact name of a piece of software or a project, but I know it had something to do with a specific topic. A good example is "fusion" as a global search, which gives me things like duqtools, imas2xarray and the ITER Persistent Actors Framework project.

sonarcloud[bot] commented 5 months ago

Quality Gate passed for 'rsd-database'

Issues
0 New issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code


sonarcloud[bot] commented 5 months ago

Quality Gate passed for 'rsd-frontend'

Issues
0 New issues

Measures
0 Security Hotspots
75.0% Coverage on New Code
0.0% Duplication on New Code


ewan-escience commented 5 months ago

I updated the global_search function to include keywords for software and projects, short descriptions for software, and subtitles for projects.

When searching in the frontend, the results are returned in the following order, keeping in mind that all matches are case insensitive:

  1. software, projects and organisations with an exact match on title, name or slug
  2. software and projects where there is an exact match on one of their keywords
  3. software, projects and organisations where the query is a prefix of their title, name or slug
  4. software, projects and organisations where the query appears anywhere in their title, name or slug; these are then ordered by the index at which the match appears
  5. software, projects and organisations where the query appears anywhere in the short description or subtitle, or as part of any keyword
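The ordering above can be sketched as a single ranking function. This is only an illustration in TypeScript, not the SQL that is in this pull request; the `Candidate` shape and the `rankOf` name are assumptions made for the example, and the index-based ordering within tier 4 is left out.

```typescript
// Hypothetical, flattened view of a software, project or organisation row.
interface Candidate {
  title: string // title or name
  slug: string
  keywords: string[] // empty for organisations
  description: string // short description or subtitle, may be empty
}

// Returns the tier (1 = best, 5 = worst) or null when nothing matches.
// All comparisons are case insensitive.
function rankOf(c: Candidate, query: string): number | null {
  const q = query.toLowerCase()
  const names = [c.title.toLowerCase(), c.slug.toLowerCase()]
  const keywords = c.keywords.map(k => k.toLowerCase())

  if (names.some(n => n === q)) return 1 // exact match on title, name or slug
  if (keywords.some(k => k === q)) return 2 // exact match on a keyword
  if (names.some(n => n.startsWith(q))) return 3 // query is a prefix
  if (names.some(n => n.includes(q))) return 4 // query appears anywhere
  if (c.description.toLowerCase().includes(q) || keywords.some(k => k.includes(q))) {
    return 5 // match in description, subtitle or part of a keyword
  }
  return null
}
```

Because the checks run from the best tier to the worst, an item that matches in several ways always gets its best tier.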

Please test this with your own test cases and inspect the SQL code. Note that we can't satisfy all use cases and that we can refine it later.

Also test this with a larger data set. I expect it's fine for now (I tested this on production data), but as we grow, it might be too slow, especially when searching the keywords. We might need dedicated search engines then.

I also adapted the frontend so that it only searches when at least two characters have been entered.
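A minimal sketch of such a frontend guard, for illustration only: the names `MIN_QUERY_LENGTH` and `shouldSearch` are hypothetical, and trimming whitespace before counting is an assumption, not necessarily what the actual frontend code does.

```typescript
// Assumed threshold, taken from the discussion above (two characters).
const MIN_QUERY_LENGTH = 2

// Decide whether the input is long enough to trigger a global search.
// Trimming surrounding whitespace is an assumption of this sketch.
function shouldSearch(input: string): boolean {
  return input.trim().length >= MIN_QUERY_LENGTH
}
```

Keeping the check in the frontend avoids a conditional in the SQL function, at the cost of the database function still accepting shorter queries when called directly.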