phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

Cannot find mob-typer accessions on NCBI #165

Open Rikkiff opened 1 month ago

Rikkiff commented 1 month ago

I am trying to find the NCBI accessions for the Rep, MOB and MPF genes identified by mob-typer. For example, mob-typer reports a Rep protein as 002374__CP004064_00076. When I search NCBI for this identifier, I do not get any results.

kbessonov1984 commented 1 month ago

Hi, the replication protein ID you quoted (002374__CP004064_00076) is found in the annotated genome at https://www.ncbi.nlm.nih.gov/nuccore/CP004064.1?report=genbank, at positions 61201..62241, with protein_id accession AGE31365.1 (putative replication protein repa).
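To get from the mob-typer ID to the nucleotide record, the ID can be split into its apparent parts. This is a minimal sketch: the layout `<index>__<nucleotide accession>_<gene number>` is an assumption inferred from this single example, not something taken from the MOB-suite documentation, and `parse_mob_id` and `nuccore_url` are hypothetical helper names.

```python
def parse_mob_id(mob_id):
    """Split an ID such as '002374__CP004064_00076' into its apparent parts.

    Assumed layout (inferred from one example only):
    <index>__<nucleotide accession>_<gene number>
    """
    # Drop a trailing '|rep_cluster_...' label if the full FASTA header was passed.
    mob_id = mob_id.lstrip(">").split("|", 1)[0]
    index, _, rest = mob_id.partition("__")
    # rpartition keeps accessions that themselves contain '_' intact.
    accession, _, gene_number = rest.rpartition("_")
    if not index or not accession or not gene_number:
        raise ValueError(f"unexpected ID layout: {mob_id!r}")
    return {"index": index, "accession": accession, "gene_number": gene_number}


def nuccore_url(accession):
    # Same URL pattern as the link above, without an explicit version suffix.
    return f"https://www.ncbi.nlm.nih.gov/nuccore/{accession}?report=genbank"


parts = parse_mob_id("002374__CP004064_00076")
print(parts["accession"])  # CP004064
print(nuccore_url(parts["accession"]))
```

The accession part only points at the whole nucleotide record; it does not say which feature in that record is the match.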

Rikkiff commented 1 month ago

Thanks Kirill! The nucleotide record has three replication proteins, so how did you deduce which one is 002374__CP004064_00076?

kbessonov1984 commented 1 month ago

Hi, you can do a pairwise alignment with BLASTn, using either the web interface or the command line. I ran BLASTn between the CP004064 annotated genome and the 002374__CP004064_00076 sequence:
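For the command-line route, something like the following sketch could work. It assumes BLAST+ is installed and that the two sequences are saved locally; the file names `rep.fasta` and `CP004064.fasta` are hypothetical. It uses `-subject` so no database is needed, and `-outfmt 6` for tab-separated output with the default twelve columns.

```python
import subprocess

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")


def blastn_command(query_fasta, subject_fasta):
    # -subject aligns two FASTA files directly, without building a database.
    return ["blastn", "-query", query_fasta, "-subject", subject_fasta, "-outfmt", "6"]


def parse_hits(tabular_text):
    """Turn -outfmt 6 text into a list of dicts with numeric fields converted."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(COLUMNS, line.split("\t")))
        for key in INT_COLUMNS:
            row[key] = int(row[key])
        for key in FLOAT_COLUMNS:
            row[key] = float(row[key])
        hits.append(row)
    return hits


def best_hit_range(hits):
    """Return (start, end, strand) on the subject for the highest-scoring hit."""
    best = max(hits, key=lambda hit: hit["bitscore"])
    # For minus-strand hits blastn reports sstart > send.
    strand = "+" if best["sstart"] <= best["send"] else "-"
    start, end = sorted((best["sstart"], best["send"]))
    return start, end, strand


def run_blastn(query_fasta, subject_fasta):
    # Requires the blastn executable on PATH; raises if blastn exits with an error.
    result = subprocess.run(blastn_command(query_fasta, subject_fasta),
                            capture_output=True, text=True, check=True)
    return parse_hits(result.stdout)


# Usage (hypothetical file names):
# hits = run_blastn("rep.fasta", "CP004064.fasta")
# print(best_hit_range(hits))
```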

>002374__CP004064_00076|rep_cluster_893
ATGACTGATTTTAAATTTTTTAAGGCGGATAGAGTTTACAACGAATTATTTTATCAATTTCCAAAAGTCT
TTATTGTTTCTGACGAATACAAAAAAATGAAAGATTCAACTAAGATTGCCTATATGCTTTTAAAAGCAAG
ATTAGAGATCGCAATCAGCAAACGGCAAATCGATGAAGAAGGTAATGTTTATTTTACTTATACGACAAAT
GAACTATGTAGAGTATTAAACTGCCAAAAACAAAAAGCGATAGCAATCAAAAAAGAGTTGGAATCCTTTG
GTTTATTATTACAAAAGCAGATGGGATTTAACAAACAGTTAGGGAAAAATAATCCTAATAGACTATATCT
AGCAGAATTAAAAGTCTCAGAAAATGATATCTACTTACTCGAAAAATTTGATAGAGAGAATAGGGAAAAC
GTTGATAAATCAGAGGGTATGAAAATCATACCCACCCTCGACGAAAAATCAGACGCTGAATCCCTTGGGG
CTCAAGAGGGTATGAAAATCATACCGTGCCAAAACGTTGATAAATCAGAGGGTATGAAAATCATACCAGA
ACTTAATAATAATATATTAGACACTAATAGACACAATATAGACACTGAAAAAGACCGCCTACAAGATCAA
TTGTTGTTAGACAATTTTGAGACAATTATGACAAACGACAGCATTGCTACGTTTGTCCCTGAACGAGTAT
TAAATTTGATAAAAACATTTTCTTCAAGTTACAGTGAAGCTCAAAAAACCGTCCAGACTATTCATAATGC
AAAGAAAAAAGCTGAAATAGAAAGTGGTATTTCGATAGTTTTTGAAGAACTCGATAGTTATTATGTCAAT
GCAGAACAAGAATTATACACGACACTGTTAAAAGCCTATCAAAAATTAAAAACCGAAAAAGTCGAAAATA
TCCAGAACCTGATTTTTGTCTATGTAAAAAATTGGTTTATCGAAAAACCAATAGCTGCTAAAGTATCAAG
TGAAAAACGTTTGAATTATGAAAGCTCCCCAAGCACTATTACGAAAGACTGGTTAGAGTGA

BLASTn reports the alignment range, which in this case is 61201 to 62241. Then look at the annotated genome at https://www.ncbi.nlm.nih.gov/nuccore/CP004064.1?report=genbank and find the entry that best matches that range, which in this case is the protein with accession AGE31365.1.
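If the record has many features, the "best matches that range" step can be done by counting overlapping positions. A minimal sketch, assuming the feature coordinates have already been copied out of the GenBank record into a list; only the first entry below comes from this thread, the second is a made-up placeholder for illustration.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of shared positions between two 1-based inclusive ranges."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def best_matching_feature(hit_start, hit_end, features):
    """Return the feature sharing the most positions with the hit, or None."""
    scored = [(overlap(hit_start, hit_end, f["start"], f["end"]), f) for f in features]
    scored = [(shared, f) for shared, f in scored if shared > 0]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]


features = [
    # Coordinates and accession quoted in this thread.
    {"start": 61201, "end": 62241, "protein_id": "AGE31365.1"},
    # Hypothetical placeholder, NOT a real feature of CP004064.
    {"start": 1, "end": 900, "protein_id": "PLACEHOLDER_1"},
]

match = best_matching_feature(61201, 62241, features)
print(match["protein_id"])  # AGE31365.1
```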

gene    61201..62241
        /locus_tag="M7W_65"
CDS     61201..62241
        /locus_tag="M7W_65"
        /codon_start=1
        /transl_table=11
        /product="putative replication protein repa"
        /protein_id="[AGE31365.1](https://www.ncbi.nlm.nih.gov/protein/445194258)"
        /translation="MTDFKFFKADRVYNELFYQFPKVFIVSDEYKKMKDSTKIAYMLL
        KARLEIAISKRQIDEEGNVYFTYTTNELCRVLNCQKQKAIAIKKELESFGLLLQKQMG
        FNKQLGKNNPNRLYLAELKVSENDIYLLEKFDRENRENVDKSEGMKIIPTLDEKSDAE
        SLGAQEGMKIIPCQNVDKSEGMKIIPELNNNILDTNRHNIDTEKDRLQDQLLLDNFET
        IMTNDSIATFVPERVLNLIKTFSSSYSEAQKTVQTIHNAKKKAEIESGISIVFEELDS
        YYVNAEQELYTTLLKAYQKLKTEKVENIQNLIFVYVKNWFIEKPIAAKVSSEKRLNYE
        SSPSTITKDWLE"
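As a sanity check, the nucleotide sequence can be translated and compared with the /translation qualifier. This is a small sketch using the standard codon assignments, which are the same ones used by transl_table=11 for amino acids (table 11 differs only in permitted start codons). `translate` is a hypothetical helper, not part of MOB-suite.

```python
from itertools import product

# NCBI codon order: first base varies slowest, each position cycles T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a forward-strand coding sequence, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # tolerate line breaks from a FASTA body
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X for ambiguous codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 24 bases and last 15 bases of the rep sequence quoted above.
print(translate("ATGACTGATTTTAAATTTTTTAAG"))  # MTDFKFFK
print(translate("GACTGGTTAGAGTGA"))           # DWLE
```

Both fragments agree with the start (MTDFKFFK) and end (DWLE) of the /translation shown above.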

Rikkiff commented 3 weeks ago

Thanks again Kirill!

It would be a lot easier if the accession AGE31365.1 was given directly instead of 002374__CP004064_00076. Also, where did you find the sequence of 002374__CP004064_00076?