transcript / samsa2

SAMSA pipeline, version 2.0. An open-source metatranscriptomics pipeline for analyzing microbiome data, built around DIAMOND and customizable reference databases.
GNU General Public License v3.0

database option in DIAMOND_analysis_counter.py #41

Closed Shicheng-Guo closed 4 years ago

Shicheng-Guo commented 4 years ago

Dear Sir/Madam,

I would like to know how to prepare the -D database for the DIAMOND_analysis_counter.py script.

DIAMOND_analysis_counter.py

-D, database, specifies a reference database to search against for results

Thanks.

Shicheng

Shicheng-Guo commented 4 years ago

Sorry for the disturbance. I found it in master_script.sh; it looks like an ordinary FASTA (.fa) file:

RefSeq_db="$SAMSA/setup_and_test/tiny_databases/RefSeq_bac_TINY_24MB.fa"

Thanks all the same.

>WP_057199767.1 MULTISPECIES: bifunctional aconitate hydratase 2/2-methylisocitrate dehydratase [Acidovorax]
MLKAYRDHVAERAALGIPPLPLTAKQTGELVELLKNPPQGEEATLVDLITHRVPAGVDDAAKVKASYLAAVAHGTEVCTL
IGREKATQLLGTMLGGYNLTPLIDLLDDAAVASVAAEALKKTLLMFDQFHDVKEKADKGNTFAKSVLQSWADAEWFTSRP
EVPQSIKLTVLKVTGETNTDDLSPAPDAWSRPDIPLHALAMLKNKRDGITPEEDGKRGPIKFLEDLRAKGNLVAYVGDVV
GTGSSRKSATNSVLWFTGEDIPFVPNKRFGGVCLGSKIAPIFYNTMEDAGALPIELDVSQMNMGDEIELLPYAGKALKDG
QVVAEFTVKSDVLFDEVRAGGRIPLIIGRGLTAKAREALGLAPSTLFRLPQVPAATGKGFTLAQKMVGRACGLPEGLGVV
PGTYCEPRMTSVGSQDTTGPMTRDELKDLACLGFSSDLVMQSFCHTAAYPKPVDVKMHHELPDFISTRGGISLKPGDGII
HSWLNRMLLPDTVGTGGDSHTRFPIGISFPAGSGLVAFAAATGVMPLDMPESVLVRFKGKMQPGVTLRDLVHAIPLYAIK
EGHLTVAKQGKKNIFSGRILEIEGLPDMKVEQAFELSDASAERSAAGCTVRLNKEPVIEYINSNIVLLKNMIAQGYADAR
TIGRRIKAMEAWLANPQLLAPDENADYAAIIEIDLAEITEPIVCCPNDPDDAKTLSDVAGAAIDEVFIGSCMTNIGHFRA
ASKLLEGKRDIPVKLWMAPPTKMDAHQLTEEGHYGVFGAAGARMEMPGCSLCMGNQAQVKEGATVMSTSTRNFPNRLGKN
TNVYLGSAELSAICSKLGRIPSKEEYLADMGVLNQNGDKIYKYLNFDQIAEFKDVADGVAA
>WP_057282361.1 bifunctional aconitate hydratase 2/2-methylisocitrate dehydratase [Achromobacter sp. Root83]
MLDTYRQHVAERAALGIPPLPLTAKQTAELIELLKNPPAGEEQNLVELLTYRVPAGVDDAAKVKASYLAAVALGKEACAL
ISRAKATELLGTMLGGYNIGPLVELLDDAEIGTIAADALKKTLLMFDAFHDVKEKADRGNANAKSVMQSWADAEWFTSRP
ELPESLTITVFKVPGETNTDDLSPAPDATTRPDIPMHALAMLKNKRDGAAFEPEEDGKRGPVKFIESLKDKGHLVAYVGD
VVGTGSSRKSATNSVLWFTGEDIPFVPNKRFGGVCLGNKIAPIFYNTMEDAGALPIELDVSKMEMGDVVELRPYDGKAIK
NGEVIAEFEVKSDVLFDEVRAGGRIPLIIGRGLTAKAREALGLPASTLFRLPKDPVDSGKGYTLAQKMVGRACGLPEGRG
IRPGTYCEPKMTSVGSQDTTGPMTRDELKDLACLGFSADLVMQSFCHTAAYPKPVDVKTHHTLPDFISTRGGISLRPGDG
VIHSWLNRMLLPDTVGTGGDSHTRFPIGISFPAGSGLVAFAAATGVMPLDMPESVLVRFKGKLQPGVTLRDLVNAIPLYA
IKQGMLTVAKQGKKNIFSGRILEIEGLPDLKVEQAFELSDASAERSAAGCSVHLNKEPIIEYINSNIVMLKWMIANGYED
ERTLGRRIKAMEAWLADPQLLAPDADAEYAAVIEIDLADVHEPIVACPNDPDDVKTLSEVAGAKIDEVFIGSCMTNIGHF
RAASKLLEGKRDIPVKLWVAPPTKMDATQLTEEGHYGVFGTAGARTEMPGCSLCMGNQAQVREGATVMSTSTRNFPNRLG
KNTNVYLGSAELAAICSKLGRIPTKDEYMADMGVINKSGDQIYQYLNFDKIPDYKDVADVIEV
>WP_057299804.1 bifunctional aconitate hydratase 2/2-methylisocitrate dehydratase [Pelomonas sp. Root1217]

transcript commented 4 years ago

Hi Shicheng,

Yes, you've got it figured out. You can download the full databases using this script: https://github.com/transcript/samsa2/blob/master/setup_and_test/full_database_download.bash .

If you run this script, the databases will be downloaded to the default location that the rest of master_script.sh expects.
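
For anyone checking their own reference file before passing it to -D: the excerpt above shows RefSeq-style FASTA headers, each with an accession, a description, and an organism name in square brackets. Below is a minimal Python sketch for pulling those fields out of header lines. It is not code from DIAMOND_analysis_counter.py; the names `FastaHeader`, `parse_header`, and `iter_headers` are hypothetical, and it assumes the organism is always the last bracketed field on the line.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class FastaHeader(NamedTuple):
    """Fields from one RefSeq-style FASTA header (hypothetical helper type)."""
    accession: str
    description: str
    organism: Optional[str]


# Assumption: the organism name is the final "[...]" group on the header line.
_ORGANISM = re.compile(r"^(.*\S)\s*\[([^\[\]]+)\]\s*$")


def parse_header(line: str) -> FastaHeader:
    """Split a '>' header line into accession, description, and organism."""
    body = line[1:].strip()                    # drop the leading '>'
    accession, _, rest = body.partition(" ")   # accession ends at first space
    match = _ORGANISM.match(rest)
    if match:
        return FastaHeader(accession, match.group(1), match.group(2))
    # No bracketed organism found: keep the whole remainder as the description.
    return FastaHeader(accession, rest, None)


def iter_headers(lines: Iterable[str]) -> Iterator[FastaHeader]:
    """Yield parsed headers, skipping sequence lines."""
    for line in lines:
        if line.startswith(">"):
            yield parse_header(line)


if __name__ == "__main__":
    sample = [
        ">WP_057282361.1 bifunctional aconitate hydratase 2/2-methylisocitrate "
        "dehydratase [Achromobacter sp. Root83]",
        "MLDTYRQHVAERAALGIPPLPLTAKQTAELIELLKNPPAGEEQNLVELLTYRVPAGVDDAAKVKASYLAAVALGKEACAL",
    ]
    for header in iter_headers(sample):
        print(header.accession, "|", header.organism)
```

To run it against a real file, open the .fa file and pass the file object to `iter_headers`, which reads it line by line. A header that comes back with `organism` set to None is a sign that the file does not follow the layout shown in the excerpt.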