Closed kad-ecoli closed 6 years ago
Seems the file size is updated.
Thanks for your report. I copied the wrong file to the Metaclust 2018_01 release. It should be fixed now. Information on the current release can be found in the latest version of the preprint: https://www.biorxiv.org/content/early/2018/01/05/104034.full.pdf+html.
The input set of Metaclust has not grown since the first release. The data should be seen as a proof of concept for Linclust. We cannot commit to such a data-intensive procedure at this point; it took weeks to download the full datasets used in this study.
We believe that a sequence database based on metagenomic sequences should rather be offered by institutions that have direct access to huge amounts of metagenomic data (e.g. EMBL, NCBI, JGI, Argonne National Lab, ...).
Metaclust, a database clustered with the Linclust protocol in MMseqs2, is becoming smaller with each release. Metaclust95 2017_01 is 97G, Metaclust 2017_05 is 60G, and Metaclust 2018_01 is only 28G. Shouldn't the number of Metaclust entries increase over time?