manticoresoftware / manticore-github-issue-search

Demo: GitHub search with Manticore Search
https://github.manticoresearch.com

Change internal architecture of storing data #25

Closed donhardman closed 1 month ago

donhardman commented 1 month ago

We should migrate to a system where we store each organization in a separate table. This approach will allow us to maintain the logic for searching one, two, or multiple repositories within a single organization. It will also help us narrow down query suggestions to be more relevant to each specific organization.

To start, we should update the code first. Once that's done, we can migrate the data by duplicating it and removing everything that doesn't belong to the organization.

donhardman commented 1 month ago

Implementing: https://github.com/manticoresoftware/manticore-github-issue-search/pull/26

Looking for the best way to migrate the production database.

donhardman commented 1 month ago

Fixes TODO:

donhardman commented 1 month ago

The problem with registryurl can't be fixed in our current setup. The reason? It's in a different repository, but we're searching in another one. We still suggest it because we now store data in the organization's repo.

sanikolaev commented 1 month ago

It's in a different repository, but we're searching in another one. We still suggest it because we now store data in the organization's repo

The context is that we now have one table per organization, which makes CALL KEYWORDS/SUGGEST easier. But what if we change the schema to per-repo tables and combine them into per-org distributed tables? Then all that's left is to update select fuzzy=1 and call autocomplete so that they run call suggest against each table separately, since, unlike call keywords, call suggest can't work with a distributed table:

mysql> drop table if exists t; create table t(f text) min_infix_len='2'; insert into t values(1, 'abcdef'); drop table if exists t2; create table t2(f text) min_infix_len='2'; insert into t2 values(2, 'abcdefghi'); drop table if exists d; create table d type='distributed' local='t' local='t2'; call suggest('abdef', 'd');
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
create table t(f text) min_infix_len='2'
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t values(1, 'abcdef')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
drop table if exists t2
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table t2(f text) min_infix_len='2'
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t2 values(2, 'abcdefghi')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
drop table if exists d
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
create table d type='distributed' local='t' local='t2'
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
call suggest('abdef', 'd')
--------------

ERROR 1064 (42000): no such table d
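
Since call suggest fails on the distributed table above, the per-table fan-out could look roughly like the sketch below. It is only an illustration, not the code from the PR: `suggest_across_tables`, `run_query`, and the merge rule (smallest distance wins, docs are summed) are assumptions made for this example. `run_query` stands for any function that sends one SQL statement to Manticore and returns rows as dicts with the `suggest`, `distance`, and `docs` columns that call suggest returns.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def escape(value: str) -> str:
    # Simplistic quoting for illustration only; a real client should
    # use its driver's own escaping.
    return value.replace("\\", "\\\\").replace("'", "\\'")


def suggest_across_tables(
    word: str,
    tables: Iterable[str],
    run_query: Callable[[str], List[Row]],
    limit: int = 5,
) -> List[Row]:
    """Run CALL SUGGEST once per local table and merge the results."""
    merged: Dict[str, Row] = {}
    for table in tables:
        sql = f"CALL SUGGEST('{escape(word)}', '{escape(table)}')"
        for row in run_query(sql):
            term = str(row["suggest"])
            distance = int(row["distance"])
            docs = int(row["docs"])
            if term in merged:
                # Same suggestion found in several repo tables:
                # keep the smallest distance and add up the doc counts.
                merged[term]["distance"] = min(merged[term]["distance"], distance)
                merged[term]["docs"] += docs
            else:
                merged[term] = {"suggest": term, "distance": distance, "docs": docs}
    # Closest suggestions first; among equal distances, more frequent first.
    ranked = sorted(merged.values(), key=lambda r: (r["distance"], -r["docs"]))
    return ranked[:limit]
```

With the tables from the session above, this would be called as `suggest_across_tables('abdef', ['t', 't2'], run_query)`, issuing one call suggest per local table instead of one against `d`.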

donhardman commented 1 month ago

This task is done, merged into the main branch, and deployed.

As a result of splitting into table-per-repo, I created this task: https://github.com/manticoresoftware/manticore-github-issue-search/issues/27