sharpton / shotmap

A Shotgun Metagenome Annotation Pipeline
GNU General Public License v2.0

Database Setup #8

Open · adityabandla opened this issue 7 years ago

adityabandla commented 7 years ago

Hi

Sorry for the rather naive question, but could you point me to any tutorials on how to set up a protein family database (for example, one built from KEGG) to use with shotmap?

After quite a bit of reading, I am still a bit lost at this step.

Best, Aditya

sharpton commented 7 years ago

Hi Aditya,

Sorry for the delayed reply - it's been a busy few days of deadlines.

This is a good question. There are some instructions on this here:

https://github.com/sharpton/shotmap/blob/master/docs/build_shotmap_searchdb.pl.md

See in particular this note:

-r, --refdb=/PATH/TO/REFERENCE/FLATFILES (REQUIRED argument) NO DEFAULT VALUE

Location of the protein family reference data. Each family must have either an HMM (if running HMMER tools) or a set of member protein sequences in FASTA format (if running BLAST-like tools).

Each file in this directory should correspond to an individual family: the file's prefix is the family identifier (e.g., IPR020405) and its suffix is either .hmm (for HMMs) or .fa (for protein sequences). These files can be placed in any subdirectory structure within this top-level directory; shotmap will recurse through all subdirectories and add all appropriate .hmm or .fa files to the list of families that will be incorporated into the search database.
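The directory layout described above can be checked before building the database. The following is a minimal Python 3.9+ sketch, not part of shotmap, that mirrors the documented rules (family ID = file prefix, .hmm or .fa suffix, recursive search through subdirectories). The function name `collect_families` and the duplicate-ID check are hypothetical additions; how shotmap itself handles two files with the same family ID is not stated in the documentation quoted here.

```python
from pathlib import Path

# Suffixes named in the build_shotmap_searchdb.pl documentation.
HMM_SUFFIX = ".hmm"
FASTA_SUFFIX = ".fa"


def collect_families(refdb: str, suffix: str) -> dict[str, Path]:
    """Recursively map family IDs (file name minus suffix) to file paths.

    Hypothetical helper: only files ending in `suffix` are considered,
    so e.g. ".fasta" files are ignored when suffix is ".fa".
    """
    families: dict[str, Path] = {}
    for path in sorted(Path(refdb).rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        family_id = path.name[: -len(suffix)]  # e.g. "IPR020405.hmm" -> "IPR020405"
        # Assumption: a family ID appearing twice is treated as an error here.
        if family_id in families:
            raise ValueError(
                f"duplicate family ID {family_id!r}: {families[family_id]} and {path}"
            )
        families[family_id] = path
    return families
```

For example, `collect_families("/PATH/TO/REFERENCE/FLATFILES", ".fa")` lists the FASTA-based families that a BLAST-like search would draw on.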

In short, you want to create a directory of FASTA files, where each file is named for a distinct KO (KEGG Orthology identifier) and contains the sequences from KEGG that are members of that KO.
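One way to build such a directory is sketched below. It assumes you already have a protein FASTA file and a two-column, tab-separated file mapping each sequence ID to a KO. That mapping format is a hypothetical input: KEGG's own mapping files vary by release and access terms, and may prefix IDs (e.g. `ko:`), so they may need reformatting first. The function names are illustrative and not part of shotmap.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(path):
    """Yield (sequence_id, sequence) pairs; the ID is the first word of the header."""
    seq_id, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seq_id is not None:
                    yield seq_id, "".join(chunks)
                seq_id = line[1:].split()[0]  # assumes non-empty headers
                chunks = []
            elif line:
                chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def load_ko_map(path):
    """Read hypothetical 'sequence_id<TAB>KO' lines into {sequence_id: {KOs}}."""
    ko_map = defaultdict(set)
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2 and fields[0] and fields[1]:
                ko_map[fields[0]].add(fields[1])
    return ko_map


def write_ko_families(fasta_path, map_path, out_dir):
    """Write one <KO>.fa file per KO with its member sequences; return counts per KO."""
    ko_map = load_ko_map(map_path)
    members = defaultdict(list)
    for seq_id, seq in read_fasta(fasta_path):
        # Sequences with no KO assignment are skipped.
        for ko in sorted(ko_map.get(seq_id, ())):
            members[ko].append((seq_id, seq))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for ko, records in members.items():
        with open(out / f"{ko}.fa", "w") as handle:
            for seq_id, seq in records:
                handle.write(f">{seq_id}\n{seq}\n")  # one unwrapped sequence line
    return {ko: len(records) for ko, records in members.items()}
```

A sequence mapped to several KOs is written to each of their files. The resulting output directory is what you would pass to build_shotmap_searchdb.pl via -r/--refdb.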

Does this answer your question? If not, I'm happy to help with additional questions.

(In reply to Aditya Bandla's post of Sun, Feb 19, 2017 at 2:54 PM.)

-- Thomas J. Sharpton

Assistant Professor, Department of Microbiology and Department of Statistics, Oregon State University

(541) 737-8623 thomas.sharpton@gmail.com @tjsharpton lab.sharpton.org