Closed Metaxo closed 4 years ago
I have one fasta file with multiple unrelated plasmid sequences and want to run mob_typer on each of them individually. Since mob_typer is said to accept only a fasta file containing a single plasmid, is there a way to analyse them automatically, one by one?
You can run mob_recon on this multi-sequence fasta file. You should get the same number of plasmids and a mob_typer aggregate report for each plasmid entry. Alternatively, we could implement multi-plasmid typing in mob_typer, assuming each entry in the file represents a single plasmid.
Sorry, could you specify how I can implement multi-plasmid typing in mob-typer?
Multi-plasmid typing would require quite extensive modification of mob_typer, e.g. getRepliconContigs() and other results-reporting functions. It is much easier to split the multi-fasta file into separate files. We will consider this feature in future releases.
Alternatively, run mob_recon -i multifasta.fa --run_typer -o outputfolder
on the multi-fasta file. That should reconstruct all plasmids and type them automatically, without any need to split them individually.
It seems error-prone for little gain, especially in cases where a given plasmid is split across several contigs (draft assembly) rather than being a single contig (complete assembly).
@kbessonov1984's suggestion will largely do what you want, but it won't work if you have multiple plasmids that are quite similar. Assuming that your fasta file is a set of complete plasmids, the easiest way to run the analysis you want is to split your multi-sequence fasta file into individual files with one sequence per file. This can be done in any number of ways, but if you want a quick Python script for it, you can use the Python 3 code below. Once you have single-sequence fasta files, you can run mob-typer on each one individually.
from argparse import ArgumentParser
import os

from Bio import SeqIO  # Biopython; required for SeqIO.parse below


def parse_args():
    "Parse the input arguments, use '-h' for help"
    parser = ArgumentParser(
        description="Split multi-fasta to individual files")
    parser.add_argument('-o', '--outdir', type=str, required=True, help='Output directory')
    parser.add_argument('-i', '--in_fasta', type=str, required=True, help='Input assembly fasta file to process')
    return parser.parse_args()


def main():
    args = parse_args()
    outdir = args.outdir
    # Create the output directory if it does not exist yet
    os.makedirs(outdir, exist_ok=True)
    with open(args.in_fasta, "r") as handle:
        for record in SeqIO.parse(handle, "fasta"):
            # One output file per record, named after the sequence id
            # (ids containing '/' or similar characters would need sanitizing)
            outfile = os.path.join(outdir, "{}.fasta".format(record.id))
            with open(outfile, 'w') as out_fasta:
                out_fasta.write(">{}\n{}\n".format(record.id, record.seq))


# call main function
if __name__ == '__main__':
    main()
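If there are too many split files to type by hand, the per-file mob_typer calls can be scripted as well. The following is only a minimal sketch and not part of MOB-suite. The function names `build_typer_commands` and `run_typer_on_all` and the `_mobtyper.txt` report suffix are made up for illustration. The `--infile` and `--out_file` options are assumed from the MOB-suite v3 command line, so please check them against `mob_typer --help` for your installed version.

```python
import subprocess
from pathlib import Path


def build_typer_commands(fasta_dir, results_dir):
    """Return one mob_typer command (as an argument list) per FASTA file.

    Assumption: mob_typer accepts --infile and --out_file, as in MOB-suite v3;
    other releases may use different options.
    """
    fasta_dir = Path(fasta_dir)
    results_dir = Path(results_dir)
    commands = []
    # sorted() keeps the run order reproducible
    for fasta in sorted(fasta_dir.glob("*.fasta")):
        # Hypothetical naming scheme: one report per input file
        report = results_dir / "{}_mobtyper.txt".format(fasta.stem)
        commands.append(
            ["mob_typer", "--infile", str(fasta), "--out_file", str(report)]
        )
    return commands


def run_typer_on_all(fasta_dir, results_dir):
    """Run mob_typer on every split file and return the inputs that failed."""
    Path(results_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for cmd in build_typer_commands(fasta_dir, results_dir):
        # check=False so one failing plasmid does not stop the whole batch
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            failed.append(cmd[2])  # cmd[2] is the input FASTA path
    return failed
```

Calling `run_typer_on_all("split_fastas", "typer_results")` on the output directory of the split script should then leave one report per plasmid in `typer_results`. Both directory names are placeholders.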
Thanks for your answer. The reason I did not split the fasta file into separate ones is that we have too many sequences, and it's impossible to run them one by one manually. For mob_recon --run_typer, the results generated do not contain the typing results for some reason. Could you check whether the --run_typer option is working?
Thanks for your answer! The reason we couldn't split the files and run them individually is that we have a lot of files, and it would take forever to run them one at a time. Is there a way to run mob-typer automatically on multiple fasta files, each containing a single plasmid sequence?
Could you provide more details? Do you mean multiple fasta inputs?