treangenlab / emu

MIT License

How to find which RefSeq sequence a read was matched to? #22

Open arpit20328 opened 2 weeks ago

arpit20328 commented 2 weeks ago

I found taxid 573 (Klebsiella pneumoniae) in my sample using Emu.

The FASTA file of the Emu database lists the following reference sequences for taxid 573:

68233:emu_db:573 ["68233:ncbi:572 ['NR_114837.2 Streptomyces luteogriseus strain ISP 5483 16S ribosomal RNA, partial sequence']"]
67305:emu_db:574 ["67305:ncbi:573 ['NR_114824.2 Streptomyces hawaiiensis strain ISP 5042 16S ribosomal RNA, partial sequence']"]
573:emu_db:5192 ["573:ncbi:5191 ['NR_119278.1 Klebsiella pneumoniae strain ATCC 13883 16S ribosomal RNA, partial sequence']"]
573:emu_db:5194 ["573:ncbi:5193 ['NR_119276.1 Klebsiella pneumoniae subsp. ozaenae strain ATCC 11296 16S ribosomal RNA, partial sequence']"]
573:emu_db:6548 ["573:ncbi:6547 ['NR_117686.1 Klebsiella pneumoniae strain DSM 30104 16S ribosomal RNA, partial sequence']"]
573:emu_db:6549 ["573:ncbi:6548 ['NR_117685.1 Klebsiella pneumoniae strain DSM 30104 16S ribosomal RNA, partial sequence']"]
573:emu_db:6550 ["573:ncbi:6549 ['NR_117684.1 Klebsiella pneumoniae strain DSM 30104 16S ribosomal RNA, partial sequence']"]
573:emu_db:6551 ["573:ncbi:6550 ['NR_117683.1 Klebsiella pneumoniae strain DSM 30104 16S ribosomal RNA, partial sequence']"]
573:emu_db:6552 ["573:ncbi:6551 ['NR_117682.1 Klebsiella pneumoniae strain DSM 30104 16S ribosomal RNA, partial sequence']"]
573:emu_db:9253 ["573:ncbi:9252 ['NR_114715.1 Klebsiella pneumoniae strain DSM 30104 16S ribosomal RNA, partial sequence']"]
573:emu_db:9439 ["573:ncbi:9438 ['NR_114507.1 Klebsiella pneumoniae subsp. rhinoscleromatis ATCC 13884 16S ribosomal RNA, partial sequence']"]
573:emu_db:9440 ["573:ncbi:9439 ['NR_114506.1 Klebsiella pneumoniae strain ATCC 13883 16S ribosomal RNA, partial sequence']"]
573:emu_db:10213 ["573:ncbi:10212 ['NR_113702.1 Klebsiella pneumoniae strain NBRC 14940 16S ribosomal RNA, partial sequence']"]
573:emu_db:10526 ["573:ncbi:10525 ['NR_113240.1 Klebsiella pneumoniae strain JCM 1662 16S ribosomal RNA, partial sequence']"]
573:emu_db:11618 ["573:ncbi:11617 ['NR_112009.1 Klebsiella pneumoniae strain JCM1662 16S ribosomal RNA, partial sequence']"]
573:emu_db:15433 ["573:ncbi:15429 ['NR_041750.1 Klebsiella pneumoniae subsp. ozaenae strain ATCC 11296 16S ribosomal RNA, partial sequence']"]
573:emu_db:17284 ["573:ncbi:17280 ['NR_036794.1 Klebsiella pneumoniae strain DSM 30104 16S ribosomal RNA, partial sequence']"]
573:emu_db:20442 ["573:ncbi:20438 ['NR_037084.1 Klebsiella pneumoniae subsp. rhinoscleromatis strain R-70 16S ribosomal RNA gene, partial sequence']"]
193:emu_db:21962 ["193:rrn:573 ['Azospirillum lipoferum 4B|GCF_000283655.1|NC_016585.1|Plasmid: AZO_p1|1757..3278 -']"]

How can I tell which of these reference sequences my reads were aligned to? Is there a way to find out? @MGNute @treangen @bkille @jodjo86 @beleafs

jodjo86 commented 2 weeks ago

I would use the "--keep-files" argument to keep the alignment file (.sam) generated by minimap2. By default, EMU keeps up to 50 alignments per read (the --N argument), so the same read can appear on several lines. In the SAM alignment file, the 1st column is the read ID and the 3rd is the reference sequence name (from the database). The 5th column is the mapping quality (MAPQ).

I'm just an EMU user but I hope this helps.

ref: https://samtools.github.io/hts-specs/SAMv1.pdf
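The column layout described above can be turned into a small lookup. The following is only a minimal Python sketch, not part of Emu: the function name `hits_for_taxid`, the example records, and the file path in the final comment are all made up for illustration. It assumes that the reference names in the SAM file are the first token of the database FASTA headers (for example `573:emu_db:5192`), so that the taxid is the part before the first colon.

```python
from collections import defaultdict


def hits_for_taxid(sam_lines, taxid):
    """Map each read ID to its (reference name, MAPQ, is_primary) hits for one taxid.

    Assumption: reference names look like '<taxid>:emu_db:<n>', as in the
    database headers quoted in this thread.
    """
    prefix = f"{taxid}:"
    hits = defaultdict(list)
    for line in sam_lines:
        # Skip blank lines and SAM header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory columns
            continue
        read_id, flag, ref_name, mapq = fields[0], int(fields[1]), fields[2], int(fields[4])
        # Flag bit 0x4 marks an unmapped read; '*' means no reference name.
        if flag & 0x4 or ref_name == "*":
            continue
        if not ref_name.startswith(prefix):
            continue
        # Bits 0x100 (secondary) and 0x800 (supplementary) mark non-primary lines.
        is_primary = not (flag & 0x100 or flag & 0x800)
        hits[read_id].append((ref_name, mapq, is_primary))
    return dict(hits)


# Made-up records for illustration only (not real Emu output).
example = [
    "@SQ\tSN:573:emu_db:5192\tLN:1500",
    "read1\t0\t573:emu_db:5192\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read1\t256\t573:emu_db:6548\t1\t0\t10M\t*\t0\t0\t*\t*",
    "read2\t0\t193:emu_db:21962\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
]

for read_id, read_hits in hits_for_taxid(example, 573).items():
    for ref_name, mapq, is_primary in read_hits:
        print(read_id, ref_name, mapq, "primary" if is_primary else "secondary")

# With a real file (hypothetical path), pass the open file handle instead:
#   with open("path/to/alignments.sam") as fh:
#       hits = hits_for_taxid(fh, 573)
```

Matching on the `573:` prefix rather than searching for "573" anywhere in the name avoids picking up entries such as `67305:emu_db:574`, whose header only contains 573 in another field. The reference names returned can then be looked up in the database FASTA headers to recover the RefSeq accession.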