pepkit / pepdbagent

Database for storing sample metadata
BSD 2-Clause "Simplified" License
2 stars 1 forks source link

Searching is incredibly slow #78

Closed nleroy917 closed 1 year ago

nleroy917 commented 1 year ago

This is directly related to https://github.com/pepkit/pephub/issues/154.

It can take up to a minute for search results to propagate back from the database when searching through the namespace projects for GEO. I've traced it to this code:

search_result = agent.annotation.get(
            query=q,
            namespace=namespace,
            limit=limit,
            offset=offset,
            admin=namespace_access,
        )

Does the database need indexed? I'm not sure why the searching would be so slow.

nleroy917 commented 1 year ago

Turns out its this code that is slowing it down:

result = self.con.run_sql_fetchall(
            count_sql,
            *search_sql_values,
            False,
            admin_tuple,
            *namespace,
        )
khoroshevskyi commented 1 year ago

This will be solved in the next iteration.

nsheff commented 1 year ago

Can you clarify what exactly the issue was and how you're solving it?

khoroshevskyi commented 1 year ago

issue is: we are trying to search projects using ilike fucntion in porject_dict(description). It is super slow, as we are searching through all project_dict objects. solution: disable description search for namespaces with project number>1K. (In longer term: decouple description from the project dict)

nleroy917 commented 1 year ago

solution: disable description search for namespaces with project number>1K

This seems fine for now. GEO is the only namespace with >1K projects and hopefully you're using vector search for those anyways. What's needed is a way to quickly get the exact PEP for a given accession (assuming it exists in pephub)

khoroshevskyi commented 1 year ago

After discussion, There was decision to disable search on description. Reasons: ilike function is super slow + description now is not a separate column + eventually we want to use qdrant search

khoroshevskyi commented 1 year ago

in v0.4.0 temporarily disabled the search by description due to long processing time and impact on the database.