Closed nleroy917 closed 1 year ago
Turns out its this code that is slowing it down:
result = self.con.run_sql_fetchall(
count_sql,
*search_sql_values,
False,
admin_tuple,
*namespace,
)
This will be solved in the next iteration.
Can you clarify what exactly the issue was and how you're solving it?
issue is: we are trying to search projects using ilike fucntion in porject_dict(description). It is super slow, as we are searching through all project_dict objects. solution: disable description search for namespaces with project number>1K. (In longer term: decouple description from the project dict)
solution: disable description search for namespaces with project number>1K
This seems fine for now. GEO is the only namespace with >1K projects and hopefully you're using vector search for those anyways. What's needed is a way to quickly get the exact PEP for a given accession (assuming it exists in pephub)
After discussion, There was decision to disable search on description. Reasons: ilike
function is super slow + description now is not a separate column + eventually we want to use qdrant search
in v0.4.0 temporarily disabled the search by description due to long processing time and impact on the database.
This is directly related to https://github.com/pepkit/pephub/issues/154.
It can take up to a minute for search results to propagate back from the database when searching through the namespace projects for GEO. I've traced it to this code:
Does the database need indexed? I'm not sure why the searching would be so slow.