ohlab / panDB


panDB on a non-PBS platform #2

Open jacodela opened 6 years ago

jacodela commented 6 years ago

I'm interested in running panDB with Kraken; however, I only have access to a file server or a compute cluster running SGE. Are you considering a more platform-agnostic (non-PBS) implementation of panDB?

Alternatively, could you provide access to a prebuilt database (such as the one you used in Zhou et al. 2018)?

Thanks.

twinsenzw commented 6 years ago

Thanks for your interest in panDB! We do have a copy of the database built for Kraken; if you use Dropbox, I can upload the database and share the containing folder with you (I am out of town at the moment, but I will do that once I get back). However, do note that since Kraken assigns reads to the lowest common ancestor (LCA) and panDB contains a large collection of species, many of the classifications will be nonspecific.
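To make the LCA point concrete, here is a minimal toy sketch in Python. It assumes a simplified model of Kraken in which each k-mer in the database is labeled with the LCA of every genome that contains it. The taxonomy, the sequences, k = 4, and the function names are all made up for illustration (real Kraken uses much longer k-mers and NCBI taxids), and `SMALL`/`LARGE` are hypothetical stand-ins, not the actual contents of reprDB or panDB. It shows how adding one more S. aureus genome that shares a region with S. epidermidis pushes the shared k-mers up to the genus Staphylococcus.

```python
from typing import Dict, List, Set, Tuple

# Toy taxonomy (child -> parent); real Kraken uses NCBI taxids instead of names.
PARENT: Dict[str, str] = {
    "S. aureus": "Staphylococcus",
    "S. epidermidis": "Staphylococcus",
    "Staphylococcus": "Bacteria",
    "Bacteria": "root",
}


def lineage(taxon: str) -> List[str]:
    """Path from a taxon up to the root, inclusive."""
    path = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(a: str, b: str) -> str:
    """Lowest common ancestor: first taxon on b's path that is also on a's."""
    ancestors_of_a = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors_of_a:
            return taxon
    return "root"


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_db(genomes: List[Tuple[str, str]], k: int) -> Dict[str, str]:
    """Label every k-mer with the LCA of all genomes that contain it."""
    db: Dict[str, str] = {}
    for taxon, seq in genomes:
        for km in kmers(seq, k):
            db[km] = lca(db[km], taxon) if km in db else taxon
    return db


K = 4
# Hypothetical "representative" set: one made-up genome per species.
SMALL = [("S. aureus", "ACGTACGGA"), ("S. epidermidis", "GATTACAGG")]
# A second made-up S. aureus genome that shares "GATTACA" with S. epidermidis.
LARGE = SMALL + [("S. aureus", "ACGTAGATTACA")]

if __name__ == "__main__":
    small_db, large_db = build_db(SMALL, K), build_db(LARGE, K)
    for km in ("ACGT", "TACA"):
        print(km, small_db[km], "->", large_db[km])
```

Running it prints `ACGT S. aureus -> S. aureus` and `TACA S. epidermidis -> Staphylococcus`. Because Kraken classifies a read from the labels of its k-mers, reads that fall mostly in such shared regions end up at genus rank rather than species rank.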

Wei

jacodela commented 6 years ago

Thanks for your quick response!

I'm interested in classifying reads from stool mWGS samples, with a particular emphasis on a group of low-abundance methanogens that have genomes available in PATRIC but no cultured isolates. In your microbiome paper you state that the proportion of classified reads is higher with panDB than with reprDB. Do you think it would be better to use reprDB rather than panDB?

I would appreciate it if you could send me both databases; the Dropbox option sounds good.

Thanks!

twinsenzw commented 6 years ago

panDB will classify more reads than reprDB, but not with more taxonomic specificity. For example, with S. aureus reads, panDB may classify 100 reads while reprDB classifies 90. However, panDB will assign only 50 of its 100 to S. aureus, with the rest assigned to the genus Staphylococcus, whereas reprDB will assign 80 of its 90 to S. aureus. This is because when we include more S. aureus genomes, we find more sequences that are similar to other Staphylococcus species, and Kraken cannot tell which species such a read actually came from. I would generally recommend using reprDB for Kraken and panDB for PathoScope pipelines.
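Working through the example counts in this comment (illustrative numbers, not measured results) shows the trade-off as the share of classified reads that reach species level:

$$
\begin{aligned}
\text{panDB:}\quad & \frac{50}{100} = 0.50\\
\text{reprDB:}\quad & \frac{80}{90} \approx 0.89
\end{aligned}
$$

So in this example reprDB classifies 10 fewer reads overall but assigns 80 − 50 = 30 more reads to S. aureus itself.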

I will upload both when I am back.

twinsenzw commented 6 years ago

I just checked: the Kraken indexes won't fit into my Dropbox space. We do have the flat databases (FASTA) hosted at ftp://ftp.jax.org/zhouw/referenceDB/ so have you considered building the Kraken indexes from those? Let me know if you run into any difficulties.
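If you build the indexes yourself, one likely snag is taxonomy assignment. Kraken needs the taxon of each library sequence, and for sequences not downloaded from NCBI, the Kraken manual describes adding the string `kraken:taxid|<taxid>` to the sequence ID. Below is a minimal Python sketch that assumes you already have a mapping from each FASTA record ID to an NCBI taxid. That mapping, the helper names, and the file and database names are hypothetical; I have not checked how the headers in the FTP FASTA files are laid out. The sketch also lists the standard Kraken 1 `kraken-build` steps as argument lists rather than running them.

```python
import io
from typing import Dict, Iterable, Iterator, List


def tag_headers(lines: Iterable[str], taxid_map: Dict[str, int]) -> Iterator[str]:
    """Yield FASTA lines, appending |kraken:taxid|<taxid> to each header's ID.

    taxid_map is a hypothetical lookup from the first word of each header
    (the sequence ID) to an NCBI taxid. A missing ID raises KeyError instead
    of silently leaving a record untagged.
    """
    for line in lines:
        if line.startswith(">"):
            # Split ">ID description" into the ID and the optional description.
            seq_id, _, desc = line[1:].rstrip("\n").partition(" ")
            header = f">{seq_id}|kraken:taxid|{taxid_map[seq_id]}"
            yield (f"{header} {desc}" if desc else header) + "\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


def kraken_build_commands(db: str, fasta: str, threads: int = 4) -> List[List[str]]:
    """Kraken 1 kraken-build steps for a custom database, as argv lists."""
    return [
        ["kraken-build", "--download-taxonomy", "--db", db],
        ["kraken-build", "--add-to-library", fasta, "--db", db],
        ["kraken-build", "--build", "--db", db, "--threads", str(threads)],
    ]


if __name__ == "__main__":
    # Toy record; 1280 is the NCBI taxid for Staphylococcus aureus.
    record = io.StringIO(">seq1 Staphylococcus aureus contig\nACGTACGT\n")
    print("".join(tag_headers(record, {"seq1": 1280})), end="")
    # Hypothetical database and tagged-library file names.
    for cmd in kraken_build_commands("reprDB_kraken", "reprDB_tagged.fna"):
        print(" ".join(cmd))
```

On a machine with Kraken installed, each argument list could be passed to `subprocess.run(cmd, check=True)`. Keeping them as lists rather than shell strings avoids quoting problems with file paths.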

jacodela commented 6 years ago

Thanks! I'll give it a try.