Closed: awh082834 closed this issue 1 year ago
Can you share the first few lines of your test.txt and acc_species files?
Sure!
test.txt:
sample_id num_contigs size gc md5 rep_type(s) rep_type_accession(s) relaxase_type(s) relaxase_type_accession(s) mpf_type mpf_type_accession($
plasmid_multifasta 6 - 45.80016295798813 d3b67463a20e833d37a15d555d7e0de0 rep_cluster_1522 000876__NC_009926_00214 MOBF,MOBF,MOBF,MOBF,MOBF NC_$
acc_species.txt
id organism
NZ_AP026076 Acaryochloris_marina_MBIC10699
NZ_AP026077 Acaryochloris_marina_MBIC10699
NZ_AP026078 Acaryochloris_marina_MBIC10699
NZ_AP026079 Acaryochloris_marina_MBIC10699
NC_009926 Acaryochloris_marina_MBIC11017
NC_009927 Acaryochloris_marina_MBIC11017
Thank you for the help!
Ah ok, so you ran MOB-typer without specifying -x or --multi. Without the multi flag, MOB-typer treats the entire FASTA file as one plasmid, so your MOB-typer output describes all of the sequences merged into one entity. The sample_id values need to match between the MOB-typer and species files.
Run MOB-typer on your sequences again but specify -x, and then use that file with your species identifications.
MOB-suite looks up the organism name in the NCBI taxonomy database; if your name doesn't match, the lookup will fail. I believe you have replaced all of the spaces in your organism names with "_", whereas the name in NCBI is "Acaryochloris marina MBIC10699".
Hope that helps!
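The two checks described above (matching IDs between the MOB-typer report and the species file, and organism names written with spaces rather than underscores) can be sketched in Python before rerunning mob_cluster. This is only an illustrative sketch and not part of MOB-suite: the helper names are hypothetical, and it assumes both files are tab-delimited with the column names sample_id, id and organism shown in this thread.

```python
import csv
import io


def read_column(text, column):
    """Return the values of one column from tab-delimited text (assumed format)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row[column] for row in reader]


def compare_ids(report_text, species_text):
    """Return (IDs only in the MOB-typer report, IDs only in the species file)."""
    report_ids = set(read_column(report_text, "sample_id"))
    species_ids = set(read_column(species_text, "id"))
    return sorted(report_ids - species_ids), sorted(species_ids - report_ids)


def normalize_organism(name):
    """Replace underscores with spaces and collapse repeated whitespace."""
    return " ".join(name.replace("_", " ").split())


if __name__ == "__main__":
    # Small inline stand-ins for test.txt and acc_species.txt.
    report = "sample_id\tnum_contigs\nplasmid_multifasta\t6\n"
    species = "id\torganism\nNZ_AP026076\tAcaryochloris_marina_MBIC10699\n"

    only_in_report, only_in_species = compare_ids(report, species)
    print("Only in MOB-typer report:", only_in_report)
    print("Only in species file:", only_in_species)
    print(normalize_organism("Acaryochloris_marina_MBIC10699"))
```

To use it on real files, read each file's contents into a string and pass them to compare_ids; two empty lists mean every ID appears in both files.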
It seems that I am getting the same error. Here are examples of each of the input files.
test.txt
sample_id num_contigs size gc md5 rep_type(s) rep_type_accession(s) relaxase_type(s) relaxase_type_accession(s) mpf_type mpf_type_accession(s) ori$
NZ_AP026076.1 1 393608 45.12408284384464 861d889a756bd5ac51968e6882f2f9ee - - MOBF NC_009927_00233 MPF_T NC_009927_00237 - - conjugative CP000839 $
NZ_AP026077.1 1 329949 46.69085222261622 840c6b922f7e8b9b9b028f79162fd924 - - MOBF NC_009931_00136 MPF_T NC_009931_00124 - - conjugative CP000843 $
NZ_AP026078.1 1 303490 46.09575274308873 48588e50149bb0f56c9520d34418ff94 - - MOBF NC_009929_00047 MPF_T NC_009930_00158 - - conjugative CP000838 $
NZ_AP026079.1 1 205174 43.208691159698596 3facb92109e65f2e622448046f573ee4 - - MOBF NC_009932_00077 MPF_T NC_009930_00058,NC_009932_00071 - - conjugative$
NC_009926.1 1 374161 47.34833400594931 f2978c55c5f74900466debf49a768e3a rep_cluster_1522 000876__NC_009926_00214 MOBF NC_009926_00327 MPF_T NC_009926_00331 - - $
NC_009927.1 1 356087 45.33667334106553 4148acb719c6a0b45229eb58a301259a - - MOBF NC_009927_00233 MPF_T NC_009927_00237 - - conjugative CP000839 $
acc_species.txt
id organism
NZ_AP026076.1 Acaryochloris marina MBIC10699
NZ_AP026077.1 Acaryochloris marina MBIC10699
NZ_AP026078.1 Acaryochloris marina MBIC10699
NZ_AP026079.1 Acaryochloris marina MBIC10699
NC_009926.1 Acaryochloris marina MBIC11017
NC_009927.1 Acaryochloris marina MBIC11017
Header example of plasmid_multifasta.fasta
>NZ_AP026076.1
ACCTTGTTCTTAAGCGTTTGATTAAAAACTGTAGGCCACCAAAAAATAAGACTTCAAATTCTCGCGAGAA
TCCAACACCATTAACATCTGGCTACCCCACATCTTGAAACAGGATTGATAGCCGAGTGATTAATGCTCCC
I double-checked that everything matches across the files, but I still get a KeyError on 'organism', as in the original comment.
Thank you for all your help so far!
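One thing that may be worth ruling out, although nothing in this thread confirms it as the cause: a KeyError on 'organism' can occur when a file's header is not split into the expected columns, for example if the file is space-separated but is parsed as tab-delimited. The sketch below checks whether the header contains the id and organism columns under that assumption. The tab delimiter and the helper names are assumptions made for illustration, not MOB-suite behavior taken from its source.

```python
import csv
import io


def header_columns(text, delimiter="\t"):
    """Return the header fields of delimited text, stripped of whitespace."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])
    return [name.strip() for name in header]


def missing_columns(text, required=("id", "organism"), delimiter="\t"):
    """Return the required column names that are absent from the header."""
    present = set(header_columns(text, delimiter))
    return [name for name in required if name not in present]


if __name__ == "__main__":
    tab_separated = "id\torganism\nNC_009926.1\tAcaryochloris marina MBIC11017\n"
    space_separated = "id organism\nNC_009926.1 Acaryochloris marina MBIC11017\n"

    print("Tab-separated file is missing:", missing_columns(tab_separated))
    print("Space-separated file is missing:", missing_columns(space_separated))
```

If the second case matches the real acc_species.txt, re-saving the file with a single tab between the ID and the organism name would be one thing to try.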
Thanks for the info. I am unable to replicate your error with the latest code pull. I recommend installing from GitHub with pip (pip install git+https://github.com/phac-nml/mob-suite). Could you try that and see if it resolves your issue?
Hi, I am trying to test mob_cluster to build a large database of plasmids. While testing I have run into a KeyError on line 532 in mob_cluster.py.
mob_cluster --mode build -f plasmid_multifasta.fasta -p test.txt -t acc_species.txt --outdir test_build
I'm not sure where this error is coming from, or whether it is caused by the files I used as inputs. I followed the format for the -t file and generated the -p file from the plasmid multifasta, as described in a previous issue. Any help is appreciated!
Thanks!