mreppell / Karp

Accurate and fast taxonomic classification using pseudoaligning

16S species assignment #6

Open jsan4christ opened 5 years ago

jsan4christ commented 5 years ago

Hi @mreppell,

I trust that you are well,

I'm looking into the possibility of using Karp to get better species assignments for 16S data. I'm not sure I'm using it for the right purpose, and after looking at the examples it's not clear to me how to proceed. Assuming I have an OTU table and a species assignment reference database like SILVA or RDP, what would be the way to get from these to a valid taxa file?

Please advise,

mreppell commented 5 years ago

Thank you for reaching out. Karp uses the base quality scores in sequencing reads to help resolve multiply mapping reads. As such, it requires the raw sequencing data, in fastq format, as input. An OTU table does not have the information that Karp needs to assign taxonomy.
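
As a quick way to confirm that an input file really is fastq (and so carries the per-base quality scores described above), something like the following sketch could be used. The function name `looks_like_fastq` is hypothetical and not part of Karp, and it only inspects the first record of the file, so it is a sanity check rather than a full validator.

```shell
# Hypothetical helper, not part of Karp: checks that the first record of a
# file looks like FASTQ (four lines: @header, sequence, +, qualities).
# The body runs in a subshell so its variables do not leak to the caller.
looks_like_fastq() (
    file=$1
    # Take the first record (4 lines), decompressing if the name ends in .gz.
    case $file in
        *.gz) first4=$(gzip -dc "$file" | head -n 4) ;;
        *)    first4=$(head -n 4 "$file") ;;
    esac
    header=$(printf '%s\n' "$first4" | sed -n '1p')
    bases=$(printf '%s\n' "$first4" | sed -n '2p')
    plus=$(printf '%s\n' "$first4" | sed -n '3p')
    quals=$(printf '%s\n' "$first4" | sed -n '4p')
    # The header must start with '@' and the separator line with '+'.
    case $header in @*) ;; *) return 1 ;; esac
    case $plus in +*) ;; *) return 1 ;; esac
    # There must be exactly one quality character per base.
    [ -n "$bases" ] && [ "${#bases}" -eq "${#quals}" ]
)
```

For example, `looks_like_fastq yourdata.fastq.gz && echo "has quality scores"` prints the message only when the first record has a quality string matching the read length. An OTU table or a plain fasta file fails the check.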

I hope this helps, and I'm happy to answer any additional questions you have,

Mark Reppell

jsan4christ commented 5 years ago

Fair enough,

Could you please share the steps to accomplish this, assuming I have a raw fastq file and a database like SILVA or RDP?

What would be the series of steps to use Karp to get the assignments for the reads?



mreppell commented 5 years ago

If you have installed Karp and have:

Raw fastq file with reads: "yourdata.fastq.gz"
Reference database fasta file: "reference.fasta"
Reference database taxonomy file: "reference.tax" (the format of this file is described on the Karp main page)

Then first you make an index of your reference database:

./karp -c index -r reference.fasta -i reference.index

This will create "reference.index". Then, you classify your fastq file using:

./karp -c quantify -r reference.fasta -i reference.index -f yourdata.fastq.gz -o yourdata.results -t reference.tax

This will produce a file "yourdata.results" containing Karp's estimates of taxa abundance in your sample. If you are using paired-end reads, then the command becomes:

./karp -c quantify -r reference.fasta -i reference.index -f yourdata.R1.fastq.gz -q yourdata.R2.fastq.gz --paired -o yourdata.results -t reference.tax
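
The steps above can be collected into a small wrapper, sketched below. It uses only the Karp options shown in the commands above. The function names `run` and `karp_classify`, the `KARP` and `DRY_RUN` variables, and the choice to skip indexing when the index already exists are illustrative assumptions, not part of Karp. By default it only prints the commands it would run.

```shell
# Hypothetical wrapper around the two Karp steps shown above.
# KARP and DRY_RUN are illustrative shell variables, not Karp options.
KARP=${KARP:-./karp}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: karp_classify REF_FASTA REF_TAX OUT_FILE READS_R1 [READS_R2]
karp_classify() {
    ref=$1; tax=$2; out=$3; r1=$4; r2=${5:-}
    index="${ref%.fasta}.index"
    # Assumption: an existing index was built from the same reference,
    # so the indexing step is skipped when the index is already present.
    [ -e "$index" ] || run "$KARP" -c index -r "$ref" -i "$index"
    if [ -n "$r2" ]; then
        # Paired-end reads: forward file with -f, reverse file with -q.
        run "$KARP" -c quantify -r "$ref" -i "$index" -f "$r1" -q "$r2" --paired -o "$out" -t "$tax"
    else
        # Single-end reads.
        run "$KARP" -c quantify -r "$ref" -i "$index" -f "$r1" -o "$out" -t "$tax"
    fi
}
```

Calling `karp_classify reference.fasta reference.tax yourdata.results yourdata.fastq.gz` prints the index and quantify commands for single-end data. Adding a fifth argument with the reverse reads switches to the paired-end form, and setting `DRY_RUN=0` runs the commands instead of printing them.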

I hope this is helpful,

Mark