Hi, I am in the process of building a searchable database of antibody and T cell receptor repertoires (here, a "repertoire" is a set of antibody or TCR sequences from a single blood sample from a single donor). Searches are performed using mmseqs, with each repertoire stored as a mmseqs DB. So far, the search function is working nicely. Next, I'd like to implement a clustering option. My idea was to allow a set of repertoire DBs to be selected and clustered using linclust. My questions are:
Can either mergedbs or concatdbs be used to combine a set of DBs for clustering with linclust?
Is there a more efficient strategy than combining the individual DBs?
Each DB typically contains tens of thousands of sequences or more, each about 40 amino acids long (i.e., just the three CDRs concatenated, not the full-length protein). Thanks in advance for your help!
-Daron