How should I efficiently cluster multiple DBs?

Hi, I am in the process of building a searchable database of antibody and T cell receptor repertoires (here, a "repertoire" is a set of antibody or TCR sequences from a single blood sample from a single donor). Searches are performed using mmseqs, with each repertoire stored as an mmseqs DB. So far, the search function is working nicely. Next, I'd like to implement a clustering option. My idea is to allow a set of repertoire DBs to be selected and clustered together using linclust. My questions are:

1. Can either mergedbs or concatdbs be used to combine a set of DBs for clustering with linclust?
2. Is there a more efficient strategy than combining the individual DBs?
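For concreteness, here is a minimal sketch of the baseline I would compare any mergedbs/concatdbs approach against: rebuilding one combined DB from the original FASTA files and clustering that. It assumes each repertoire is also available as a FASTA file, which is not necessarily the case. The function names, file names, and the dry-run wrapper are hypothetical; only the createdb, linclust, and createtsv subcommands are real mmseqs modules, and no non-default options are set.

```python
import subprocess
from typing import List, Sequence


def build_cluster_commands(
    fasta_files: Sequence[str],
    combined_db: str,
    cluster_db: str,
    tmp_dir: str,
) -> List[List[str]]:
    """Build the mmseqs command lines for the FASTA-based baseline.

    Hypothetical helper: it only assembles argument lists and runs nothing.
    """
    if not fasta_files:
        raise ValueError("at least one FASTA file is required")
    return [
        # createdb accepts several FASTA inputs followed by the output DB.
        ["mmseqs", "createdb", *fasta_files, combined_db],
        # linclust takes the sequence DB, the output cluster DB, and a tmp dir.
        ["mmseqs", "linclust", combined_db, cluster_db, tmp_dir],
        # createtsv turns the cluster DB into a representative/member table.
        ["mmseqs", "createtsv", combined_db, combined_db, cluster_db,
         cluster_db + ".tsv"],
    ]


def run_commands(commands: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if mmseqs exits non-zero.
            subprocess.run(list(cmd), check=True)
```

Calling run_commands(build_cluster_commands(["donor1.fasta", "donor2.fasta"], "combined", "clu", "tmp")) prints the three commands without executing them. The obvious cost of this baseline is that the combined DB is rebuilt for every selection, which is what I am hoping to avoid by combining the existing DBs directly.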

Each DB is typically tens of thousands of sequences or more with typical length ~40 amino acids (i.e. just the three CDR regions concatenated; not full-length protein). Thanks in advance for your help!
-Daron

soedinglab / MMseqs2 #519