mmcguffi / pLannotate

Webserver and command line tool for annotating engineered plasmids
GNU General Public License v3.0

custom database installation #38

Open c-ruprecht opened 8 months ago

c-ruprecht commented 8 months ago

Hey, I am trying to set up a custom BLAST database and run pLannotate with a custom yaml_file, but I am running into some issues.

I have a FASTA file, mtcsb_parts.fasta, containing my custom nucleotide sequences:

>1
NNNNNNN
>2
NNNNNNN

I create the BLAST database using:

makeblastdb -in /Users/ruprec01/Documents/Faith_lab/Git/blastdb/mtcsb_parts/mtcsb_parts.fasta -title "mtcsb_parts" -dbtype nucl

I have a mtcsb_parts.csv file in the same path containing descriptions of the sequences:

sseqid,Feature,Type,Description
1,feature1,type1,descript1
2,feature2,type2,descript2

I create a custom YAML file that contains the entry:

mtcsb_parts:
  details:
    compressed: false
    default_type: None
    location: /path-to-folder/mtcsb_parts
  location: /path-to-folder/mtcsb_parts
  method: blastn
  parameters:
  - -perc_identity 95
  priority: 1
  version: Downloaded 2021-07-23

I run plannotate in a conda environment using:

plannotate batch -i test.fasta \
--yaml_file plannotate_custom.yaml \
--output /output

I get the following error:

  streamlit run /Users/ruprec01/opt/anaconda3/envs/plannotate/bin/plannotate [ARGUMENTS]
Traceback (most recent call last):
  File "/Users/ruprec01/opt/anaconda3/envs/plannotate/bin/plannotate", line 10, in <module>
    sys.exit(main())
  File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/plannotate/pLannotate.py", line 180, in main_batch
    gbk = rsc.get_gbk(recordDf, inSeq, kwargs["linear"])
  File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/plannotate/resources.py", line 120, in get_gbk
    record = get_seq_record(inDf, inSeq, is_linear, record)
  File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/plannotate/resources.py", line 151, in get_seq_record
    inDf["feat loc"] = inDf.apply(FeatureLocation_smart, axis=1)
  File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/pandas/core/frame.py", line 3940, in __setitem__
    self._set_item_frame_value(key, value)
  File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/pandas/core/frame.py", line 4094, in _set_item_frame_value
    raise ValueError(
ValueError: Cannot set a DataFrame with multiple columns to the single column feat loc

I am wondering if you can help me with how to create the BLAST database properly and add the correct entry to the custom YAML file. pLannotate works as soon as I add, for example, the snapgene entry back into the custom YAML file.

Thanks for any help, really love pLannotate!

Greetings,
Constantin