np-core / nanopath

Python package and command line interface - entry point for the repository :snake:

GTDB compatibility #6

Open esteinig opened 4 years ago

esteinig commented 4 years ago

@dn-ra was thinking about GTDB compatibility of the classification results.

esteinig commented 4 years ago

@dn-ra would you be able to set up the GTDB database for Kraken2? They do not include human, right?

esteinig commented 4 years ago

It'd probably be good to get a set of databases that we can run some test data through and compare. What do you think? I think Tania has looked a little into this.

dn-ra commented 4 years ago

It's already done actually. You're on Nectar right?

/home/daniel/TEST_singularity/gtdb_rs89

And singularity image is here:

/home/daniel/TEST_singularity/kraken2_mini_gtdb.sif

esteinig commented 4 years ago

Very nice thanks! I'll check it out now

dn-ra commented 4 years ago

From singularity run-help:

This container runs kraken2 on your read data against the entire GTDB taxonomy. GTDB has been converted to a compressed kraken2 database using scripts provided by Ryan Wick (https://github.com/rrwick/Metagenomics-Index-Correction). To use the packaged database, pass '--db /gtdb_mini_r89' as a parameter to kraken2.

Full command for kraken2 analysis of reads:

singularity exec -B [Directory where your data is stored]:/mnt kraken2_mini_gtdb.sif /kraken2/kraken2 --use-names --db /gtdb_mini_r89 /mnt/[Your read file name]
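
If this call were to be scripted from the Python package, the command above could be assembled as an argument list. This is only a minimal sketch, not part of nanopath: the function name `build_kraken2_command`, its parameters, and the example paths `/data/run1` and `reads.fastq` are hypothetical. The flags, image name, and database path are the ones quoted in the run-help text.

```python
import shlex


def build_kraken2_command(data_dir, read_file,
                          image="kraken2_mini_gtdb.sif",
                          db="/gtdb_mini_r89"):
    """Return the singularity command from the run-help text as an argument list.

    data_dir  -- host directory holding the reads (bound to /mnt in the container)
    read_file -- file name of the reads inside data_dir
    """
    return [
        "singularity", "exec",
        "-B", f"{data_dir}:/mnt",   # bind the host data directory to /mnt
        image,
        "/kraken2/kraken2",         # kraken2 binary inside the container
        "--use-names",
        "--db", db,                 # database packaged inside the image
        f"/mnt/{read_file}",
    ]


if __name__ == "__main__":
    # Hypothetical example paths, for illustration only.
    cmd = build_kraken2_command("/data/run1", "reads.fastq")
    print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` directly, which avoids shell quoting problems with unusual file names.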

esteinig commented 4 years ago

Excellent. I might have to see how to set up databases for the current framework, since they are outside the containers for now.

dn-ra commented 4 years ago

This one is inside the container, but it was made outside it and copied in during the build.

esteinig commented 4 years ago

I found the data thanks!

Not quite sure yet how to best handle the databases, but it looks like it might be easier to put them on GCS and pull them once during setup on the server.
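
The "pull once during setup" idea could look roughly like the following. This is a sketch under assumptions, not an existing nanopath function: `ensure_database` and the `fetch` callable are hypothetical names, and the actual transfer from remote storage is left to whatever `fetch` the caller supplies.

```python
from pathlib import Path


def ensure_database(db_dir, fetch):
    """Fetch a database into db_dir only if it is not already there.

    db_dir -- local directory where the database should live
    fetch  -- hypothetical callable taking the target Path and downloading into it
    Returns True if fetch was called, False if the database was already present.
    """
    db_dir = Path(db_dir)
    # A non-empty directory is treated as "already pulled"; skip the download.
    if db_dir.is_dir() and any(db_dir.iterdir()):
        return False
    db_dir.mkdir(parents=True, exist_ok=True)
    fetch(db_dir)
    return True
```

Checking for a non-empty directory is a deliberately crude test; a partially downloaded database would pass it, so a real setup step might want a completion marker file or a checksum instead.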

dn-ra commented 4 years ago

What's GCS?