motu-tool / mOTUs

motus - a tool for marker gene-based OTU (mOTU) profiling
GNU General Public License v3.0

help to create a small database for testing #92

Closed jianhong closed 2 years ago

jianhong commented 2 years ago

Hi @AlessioMilanese,

Thank you for the great tool! I am trying to create a subset of the database that covers only the test files included in your current db. However, the current release contains a large number of files. Would it be possible for you to create a small database just for testing, or could you share the scripts to create one?

Jianhong.

AlessioMilanese commented 2 years ago

Hi Jianhong,

Happy to hear that you are interested in mOTUs!

> I am trying to create a subset of the database that covers only the test files included in your current db. However, the current release contains a large number of files. Would it be possible for you to create a small database just for testing, or could you share the scripts to create one?

I'm not sure what you are trying to do. In general, the database is an integral part of the tool: updates to the tool and to the database are made at the same time, so the current code would not work on, for example, database version 2.5.0.

If you tell me more about what you would like to test, I can see if I can help you.

jianhong commented 2 years ago

@AlessioMilanese, thank you for your quick reply. What I want is a small database for testing and validating the installation. If I understand correctly, the current test files are included in the database of over 2 GB that is fetched via downloadDB. It would be great if you could help generate a small test dataset, under 50 MB, to validate basic functions such as profile.

AlessioMilanese commented 2 years ago

I see, but to run mOTUs you need the database anyway. To profile all 33k species, you need the actual DNA sequences of the genes. Even if I created a small database, you could test that it works, but you could not run mOTUs. The large size of the database is not due to the test files; it is because the database contains genes from more than 250,000 genomes.

jianhong commented 2 years ago

Is it possible to create a database just for the test data, with a limited set of species and genes, for example only for COVID-19? Or do you mean that mOTUs checks the database size?

AlessioMilanese commented 2 years ago

Have a look at: https://github.com/AlessioMilanese/read_counter You can provide sequences and run the profiler.

jianhong commented 2 years ago

Will it also work for profile? I think there is a miscommunication. What I really want is to validate the installation with a small mOTUs database for the test samples, like db_mOTU/db_mOTU_test/test1_single.fastq in your test.py script.

AlessioMilanese commented 2 years ago

> I really want to validate the installation via a small mOTUs for the test samples like db_mOTU/db_mOTU_test/test1_single.fastq in your test.py script.

That's not possible; the mOTUs tool is composed of code and database together.

In the link I sent you, you can also do profiling.
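Since the code and database go together, an installation check would have to run against the full database rather than a reduced one. A minimal sketch of such a check, assuming `motus` is on the PATH, the database has already been fetched with downloadDB, and the test FASTQ path from this thread resolves from the current directory. The helper names `build_profile_command` and `smoke_test` are hypothetical and not part of mOTUs.

```python
import shutil
import subprocess
from pathlib import Path


def build_profile_command(fastq, motus_bin="motus"):
    """Return the argv that profiles one single-end FASTQ file."""
    # `-s` passes single-end reads; the profile is written to stdout.
    return [motus_bin, "profile", "-s", str(fastq)]


def smoke_test(fastq, motus_bin="motus"):
    """Run the profiler once.

    Returns True if it exits with status 0 and prints a non-empty profile,
    False if it runs but fails, and None if the executable or the input
    file is missing (so there is nothing to test).
    """
    if shutil.which(motus_bin) is None or not Path(fastq).is_file():
        return None
    result = subprocess.run(
        build_profile_command(fastq, motus_bin),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and bool(result.stdout.strip())


if __name__ == "__main__":
    # Path taken from the thread; adjust it to where the database is installed.
    print(smoke_test("db_mOTU/db_mOTU_test/test1_single.fastq"))
```

This only confirms that the installed tool and its database run end to end on one sample; it does not compare the resulting profile against expected values.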

jianhong commented 2 years ago

Got it. Thank you for your time.