phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

Is there a way to process multiple unrelated plasmids automatically with mob-typer in the program? #44

Closed Metaxo closed 4 years ago

kbessonov1984 commented 4 years ago

Could you provide more details? Do you mean multiple fasta inputs?

Metaxo commented 4 years ago

I have one FASTA file with multiple unrelated plasmid sequences and want to run mob-typer on each of them individually. Since mob-typer is said to accept only a FASTA file containing a single plasmid, is there a way to analyze them automatically, one by one?

kbessonov1984 commented 4 years ago

You can run mob_recon on this multi-sequence FASTA file. You should get the same number of plasmids back, along with a mob_typer aggregate report entry for each plasmid. Alternatively, we could implement multi-plasmid typing in mob_typer, assuming each entry in the file represents a single plasmid.

Metaxo commented 4 years ago

Sorry, could you specify how I can implement multi-plasmid typing in mob-typer?

kbessonov1984 commented 4 years ago

  1. Multi-plasmid typing would require quite extensive modification of mob_typer, e.g. getRepliconContigs() and other result-reporting functions. It is much easier to split the multi-fasta file into separate files. We will consider this feature for future releases.

  2. Alternatively, run mob_recon -i multifasta.fa --run_typer -o outputfolder on the multi-fasta file; that should reconstruct all plasmids and type them automatically, without any need to split them individually.

It seems like a lot of effort for little gain, especially in cases where a given plasmid is split into several contigs (draft assembly) instead of being a single contig (complete assembly).

jrober84 commented 4 years ago

@kbessonov1984's suggestion will largely do what you want, but if you have multiple plasmids that are quite similar, that approach won't work. Assuming that your FASTA file is a set of complete plasmids, the easiest way to run the analysis you want is to split your multi-sequence FASTA file into individual files with one sequence per file. This is easily done in any number of ways, but if you want a quick Python script for it, you can use the Python 3 code below (it needs Biopython for SeqIO). Once you have single-sequence FASTA files, you can run mob-typer on each one individually.

from argparse import ArgumentParser
import os

from Bio import SeqIO  # Biopython provides the FASTA parser used below

def parse_args():
    "Parse the input arguments, use '-h' for help"
    parser = ArgumentParser(
        description="Split multi-fasta to individual files")
    parser.add_argument('-o', '--outdir', type=str, required=True, help='Output directory')
    parser.add_argument('-i', '--in_fasta', type=str, required=True, help='Input assembly fasta file to process')

    return parser.parse_args()

def main():
    args = parse_args()

    outdir = args.outdir
    # Create the output directory if it does not exist yet
    os.makedirs(outdir, exist_ok=True)
    with open(args.in_fasta, "r") as handle:
        # Write each record to its own file, named after the sequence ID
        for record in SeqIO.parse(handle, "fasta"):
            outfile = os.path.join(outdir, "{}.fasta".format(record.id))
            with open(outfile, 'w') as out_fasta:
                out_fasta.write(">{}\n{}\n".format(record.id, record.seq))

# call main function
if __name__ == '__main__':
    main()

Metaxo commented 4 years ago

Thanks for your answer. The reason I did not split the FASTA file into separate ones is that we have too many sequences to run them individually by hand. As for mob_recon --run_typer, the results generated do not contain the typing results for some reason. Could you check whether the --run_typer option is working?

Metaxo commented 4 years ago

Thanks for your answer! The reason we couldn't split the files and run them individually is that we have a lot of files, and it would take forever to run them one at a time. Is there a way to run mob-typer automatically on multiple FASTA files, each containing a single plasmid sequence?
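
One possible way to automate this is a small wrapper that loops over the single-sequence FASTA files and calls mob_typer once per file. The sketch below is not part of MOB-suite, and several things in it are assumptions: the `--infile` and `--out_file` flag names (they differ between MOB-suite releases, so check `mob_typer --help` for your installed version), the `split_fastas` and `typer_results` directory names, and the `_mobtyper.txt` report suffix. The helper names `build_typer_commands` and `run_all` are hypothetical. It defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path


def build_typer_commands(fasta_dir, out_dir, exe="mob_typer"):
    """Return one command (as an argument list) per *.fasta file in fasta_dir."""
    fasta_dir, out_dir = Path(fasta_dir), Path(out_dir)
    commands = []
    for fasta in sorted(fasta_dir.glob("*.fasta")):
        # ASSUMPTION: --infile / --out_file are the flag names of your
        # installed mob_typer; verify with `mob_typer --help`.
        report = out_dir / "{}_mobtyper.txt".format(fasta.stem)
        commands.append([exe, "--infile", str(fasta), "--out_file", str(report)])
    return commands


def run_all(commands, dry_run=True):
    """Print each command, and execute it unless dry_run is set."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Make sure the directory for the report file exists
            Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
            # check=True stops the batch at the first failing run
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical directory names; a dry run only prints the commands
    run_all(build_typer_commands("split_fastas", "typer_results"), dry_run=True)
```

Building the commands separately from running them makes it easy to inspect the batch first, and to switch `dry_run` to `False` only once the printed commands look right for your MOB-suite version.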