mmcguffi / pLannotate

Webserver and command line tool for annotating engineered plasmids
GNU General Public License v3.0

Entry size is too large -- must be 50000 bases or less. #14

Closed: HJTsai closed this issue 2 years ago

HJTsai commented 2 years ago

Hi,

Could the plasmid size limit be removed on the web server? Thank you.

Amber

mmcguffi commented 2 years ago

Hi Amber,

We unfortunately have to leave it at 50,000 bp for the public web server to make sure we don't exceed our computational resources.

Is there a reason you need to annotate something larger than 50,000 bp? Most engineered DNA is not this large so there may be a better tool available.

In any case, if you feel comfortable sending me your sequence by email, I would be happy to run it myself on an unrestricted version and send it back to you.

jeffreybarrick commented 2 years ago

For larger sequences, we recommend that you install a local version and run it at the command line. This is most easily done using bioconda.

https://anaconda.org/bioconda/plannotate

@mmcguffi Is there a way for a user to change a configuration file to increase the maximum sequence size of the local web server version?

HJTsai commented 2 years ago

@mmcguffi and @jeffreybarrick Thank you for your reply and suggestion. Some of our de novo assembled plasmids are larger than 50,000 bp, so for now I split the sequence into two files for annotation. Alternatively, could the plasmid size limit on the web server be increased to 100 kbp? Thank you.

Amber
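
The splitting workaround described above can be sketched in Python. This is only an illustration, not part of pLannotate: the function name `split_sequence`, the 50,000-base chunk size (taken from the web server limit), and the 1,000-base overlap between chunks are all assumptions. The overlap is there so that a feature sitting on a cut point still appears whole in at least one chunk, provided it is shorter than the overlap.

```python
def split_sequence(seq, max_len=50_000, overlap=1_000):
    """Split seq into chunks of at most max_len bases.

    Consecutive chunks share `overlap` bases. Returns a list of
    (start_offset, chunk) tuples so that annotation coordinates from a
    chunk can be shifted back onto the full sequence.
    Hypothetical helper; the default values are assumptions.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    step = max_len - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + max_len, len(seq))
        chunks.append((start, seq[start:end]))
        if end == len(seq):
            break
        start += step
    return chunks


if __name__ == "__main__":
    demo = "ACGT" * 30_000  # 120,000 bases of placeholder sequence
    for offset, chunk in split_sequence(demo):
        print(offset, len(chunk))
```

Each chunk could then be written to its own FASTA file and annotated separately. A feature position reported within a chunk maps back to the full sequence by adding that chunk's start offset. This sketch treats the sequence as linear, so a feature spanning the origin of a circular plasmid would still be cut.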

mmcguffi commented 2 years ago

@HJTsai I removed the plasmid size restriction for the command line tool, though this update is not yet pushed to bioconda -- you will have to git clone the repository to get these changes for now.

Unfortunately due to computational limits we cannot increase the size for the public web server. If you have any issues with installation or usage of the command line tool, please feel free to reach out.

HJTsai commented 2 years ago

Thank you. It works.

HJTsai commented 2 years ago

@mmcguffi I found that the plasmid size restriction was removed, but annotations beyond the 50 kb position were not reported in the csv and html files. Could you please help me confirm this? Thanks.

mmcguffi commented 2 years ago

@HJTsai I don't see this issue on my end -- for instance when using the gbk downloaded from here and running this command: plannotate batch -i ./addgene-plasmid-70261-sequence-348706.gbk -o ~/Desktop/ -h -c, I get a gbk, html, and csv file:

[Screenshot of the output files]

Were you able to download plannotate from a git clone and install manually?

HJTsai commented 2 years ago

I downloaded plannotate via git clone and installed it manually. The following pictures were annotated by the web server. The de novo assembled plasmid is larger than 50,000 bp, so I split the sequence into two files for annotation. [image] [image]

The following picture was annotated from the command line:

plannotate batch -i bc03_m2_c35_homopolish/consensus_homopolished.fasta --html -c -f bc03_m2_c35

The annotations beyond the 50 kb position (repR, yefC, NMFIC_NEIMB and tnp1) were not reported in the csv and html files. [image]

mmcguffi commented 2 years ago

@HJTsai can you send me your file/sequence so I can take a closer look and try to debug?

I notice that the 2nd plasmid fragment you show (~21,000 bp) is actually a subset of the first plasmid fragment (~42,500 bp) -- between ~8,500 bp and ~30,000 bp.

Also, I notice that these appear to be all hits from SwissProt -- this indicates to me this is either a natural plasmid, or at least a very recently "domesticated" natural plasmid. You might want to take a look at Prokka: https://github.com/tseemann/prokka

I have a feeling that Prokka will do a better job annotating your plasmid, though if you send me your plasmid sequence I would be happy to take a closer look at what is going wrong here
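
Whether one fragment is contained in another, as noted above, can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: `locate_fragment` and `reverse_complement` are hypothetical helper names, the match must be exact (no mismatches or gaps), only the bases A, C, G and T are complemented, and the containing sequence is treated as linear rather than circular.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_fragment(container, fragment):
    """Find an exact occurrence of fragment in container on either strand.

    Returns (start, strand) with a 0-based start and strand "+" or "-",
    or None if the fragment does not occur. Hypothetical helper.
    """
    container = container.upper()
    fragment = fragment.upper()
    pos = container.find(fragment)
    if pos != -1:
        return pos, "+"
    pos = container.find(reverse_complement(fragment))
    if pos != -1:
        return pos, "-"
    return None


if __name__ == "__main__":
    print(locate_fragment("AAACCCGGGTTTACGT", "CCCGGG"))
```

For real assemblies, which may differ by a few bases, an alignment tool would be more appropriate than exact string matching.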

HJTsai commented 2 years ago

The files are attached. Thanks.

Amber


mmcguffi commented 2 years ago

@HJTsai I think something went wrong -- I don't see any files attached here or via email.

HJTsai commented 2 years ago

I'll try again. The files are attached. Thanks.

Amber


mmcguffi commented 2 years ago

@HJTsai unfortunately there are still no files attached. You can drag and drop a file into this GitHub thread to attach it. Attaching/sending files by replying to this thread by email doesn't seem to work.

HJTsai commented 2 years ago

@mmcguffi The txt files could not be attached, so I converted them to Word files: bc03_c35.docx, bc03_c35_1.docx, bc03_c35_2.docx. Thanks.

mmcguffi commented 2 years ago

@HJTsai, sorry for the delay in this, but I think I finally have a solution to your problem. If you save the code below as fix.yaml or something similar, you can point plannotate to this file when running in batch mode.

e.g.: plannotate batch -i my_plasmid.fa -y ~/my_path/fix.yaml

Rfam:
  details:
    compressed: false
    default_type: ncRNA
    location: None
  location: Default
  method: infernal
  priority: 3
  version: release 14.5
fpbase:
  details:
    compressed: false
    default_type: CDS
    location: Default
  location: Default
  method: diamond
  parameters:
  - -k 0
  - --min-orf 1
  - --matrix BLOSUM90
  - --gapopen 10
  - --gapextend 1
  - --algo ctg
  - --id 75
  - --max-hsps 10
  - --culling-overlap 200
  - --seed-cut .001
  - --comp-based-stats 0
  priority: 1
  version: downloaded 2020-09-02
snapgene:
  details:
    compressed: false
    default_type: None
    location: Default
  location: Default
  method: blastn
  parameters:
  - -perc_identity 95
  - -max_target_seqs 20000
  - -culling_limit 25
  - -word_size 12
  priority: 1
  version: Downloaded 2021-07-23
swissprot:
  details:
    compressed: true
    default_type: CDS
    location: Default
  location: Default
  method: diamond
  parameters:
  - -k 0
  - --min-orf 1
  - --matrix BLOSUM90
  - --gapopen 10
  - --gapextend 1
  - --algo ctg
  - --id 50
  - --max-hsps 10
  - --culling-overlap 200
  - --seed-cut .001
  - --comp-based-stats 0
  priority: 2
  version: Release 2021_03

If you are curious, the specific fix here is the addition of --max-hsps 10 --culling-overlap 200 to the diamond query of the protein databases.

In my hands at least, your plasmid annotated much better with Prokka, so that may be something to check out if you haven't yet.
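
The two-flag fix described above could also be applied programmatically to an existing database configuration. The sketch below is an illustration only and does not use any pLannotate API: `add_diamond_flags` is a hypothetical helper, and it assumes the YAML has already been loaded into a plain Python dict with the same layout as the listing above (for example with a YAML parser, not shown). It appends the flags to the end of each diamond parameter list, which differs from the ordering in the listing.

```python
import copy

# The two flags named in the fix above.
EXTRA_DIAMOND_FLAGS = ["--max-hsps 10", "--culling-overlap 200"]


def add_diamond_flags(config, extra=EXTRA_DIAMOND_FLAGS):
    """Return a copy of config with `extra` added to every diamond database.

    `config` is assumed to map database names to dicts with a "method"
    key and an optional "parameters" list of "flag value" strings.
    A flag that is already present is left as it is, whatever its value.
    """
    patched = copy.deepcopy(config)
    for db in patched.values():
        if db.get("method") != "diamond":
            continue
        params = db.setdefault("parameters", [])
        present = {p.split()[0] for p in params if p.strip()}
        for flag in extra:
            if flag.split()[0] not in present:
                params.append(flag)
    return patched


if __name__ == "__main__":
    example = {
        "swissprot": {"method": "diamond", "parameters": ["-k 0", "--id 50"]},
        "snapgene": {"method": "blastn", "parameters": ["-perc_identity 95"]},
    }
    print(add_diamond_flags(example)["swissprot"]["parameters"])
```

Databases searched with other methods (blastn, infernal) are left untouched, and the input dict is not modified.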