Closed HJTsai closed 2 years ago
Hi Amber,
We unfortunately have to leave it at 50,000 bp for the public web server to make sure we don't exceed our computational resources.
Is there a reason you need to annotate something larger than 50,000 bp? Most engineered DNA is not this large so there may be a better tool available.
In any case, if you feel comfortable sending me your sequence by email, I would be happy to run it myself on an unrestricted version and send it back to you.
For larger sequences, we recommend that you install a local version and run it at the command line. This is most easily done using bioconda.
https://anaconda.org/bioconda/plannotate
@mmcguffi Is there a way for a user to change a configuration file to increase the maximum sequence size of the local web server version?
@mmcguffi and @jeffreybarrick Thank you for your replies and suggestions. Some of our de novo assembled plasmids are larger than 50,000 bp, so for now I split the sequence into two files for annotation. Alternatively, could the plasmid size limit on the web server be increased to 100 kbp? Thank you.
Amber
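For anyone using the same split-into-files workaround, here is a minimal sketch of cutting a long sequence into pieces that fit under the 50,000 bp web server limit. The function name and the 2,000 bp overlap are assumptions chosen for illustration (the overlap is meant to reduce the chance that a feature is cut at a piece boundary); this is not part of pLannotate, and it does not handle features that span the origin of a circular plasmid.

```python
def split_with_overlap(seq, max_len=50_000, overlap=2_000):
    """Split seq into pieces of at most max_len bases.

    Consecutive pieces share `overlap` bases. Returns a list of
    (offset, piece) tuples, where offset is the 0-based start of the
    piece in the original sequence.
    """
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    step = max_len - overlap
    pieces = []
    start = 0
    while True:
        end = min(start + max_len, len(seq))
        pieces.append((start, seq[start:end]))
        if end == len(seq):
            break
        start += step
    return pieces


if __name__ == "__main__":
    # Hypothetical 60,000 bp sequence, used only to show the piece layout.
    demo = "ACGT" * 15_000
    for offset, piece in split_with_overlap(demo):
        print(offset, len(piece))
```

Coordinates reported for any piece after the first need the piece's offset added back to map them onto the full plasmid.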
@HJTsai I removed the plasmid size restriction for the command line tool, though this update is not yet pushed to bioconda -- you will have to git clone the repository to get these changes for now.
Unfortunately due to computational limits we cannot increase the size for the public web server. If you have any issues with installation or usage of the command line tool, please feel free to reach out.
Thank you. It works.
@mmcguffi I found that the plasmid size restriction was removed, but annotations above 50 kb were not reported in the csv and html files. Could you please help me confirm this? Thanks.
@HJTsai I don't see this issue on my end -- for instance, when using the gbk downloaded from here and running this command: plannotate batch -i ./addgene-plasmid-70261-sequence-348706.gbk -o ~/Desktop/ -h -c, I get a gbk, html, and csv file.
Were you able to download plannotate from a git clone and install manually?
Yes, I downloaded plannotate with git clone and installed it manually.
The following pictures show the annotation from the web server.
Because the de novo assembled plasmid is larger than 50,000 bp, I split the sequence into two files for annotation.
The following picture shows the annotation from the command line:
plannotate batch -i bc03_m2_c35_homopolish/consensus_homopolished.fasta --html -c -f bc03_m2_c35
The annotations above 50 kb (repR, yefC, NMFIC_NEIMB and tnp1) were not reported in the csv and html files.
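One way to confirm what did or did not make it into the csv is to list every row whose start coordinate lies beyond 50,000. The sketch below assumes the csv has a start-coordinate column named qstart and a feature-name column named Feature; those column names are assumptions and may need adjusting to match the header of the actual file.

```python
import csv
import io


def features_beyond(handle, cutoff=50_000, start_col="qstart", name_col="Feature"):
    """Return (name, start) for every row whose start coordinate exceeds cutoff.

    `handle` is any open text file or file-like object containing csv data.
    The column names are assumptions; change them to match the real header.
    """
    hits = []
    for row in csv.DictReader(handle):
        start = int(float(row[start_col]))
        if start > cutoff:
            hits.append((row[name_col], start))
    return hits


if __name__ == "__main__":
    # Toy rows invented for illustration; not taken from the plasmid in this thread.
    toy_csv = io.StringIO(
        "Feature,qstart,qend\n"
        "feature_a,1200,2100\n"
        "feature_b,51000,52500\n"
    )
    print(features_beyond(toy_csv))
```

For a real run, pass an open handle to the csv written by plannotate batch instead of the toy data. An empty result would mean no reported feature starts past the cutoff.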
@HJTsai can you send me your file/sequence so I can take a closer look and try to debug?
I notice that the 2nd plasmid fragment you show (~21,000 bp) is actually a subset of the first plasmid fragment (~42,500 bp) -- it lies between ~8,500 bp and ~30,000 bp.
Also, I notice that these appear to be all hits from SwissProt -- this indicates to me this is either a natural plasmid, or at least a very recently "domesticated" natural plasmid. You might want to take a look at Prokka: https://github.com/tseemann/prokka
I have a feeling that Prokka will do a better job annotating your plasmid, though if you send me your plasmid sequence I would be happy to take a closer look at what is going wrong here.
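The "one fragment is a subset of the other" observation can be checked directly from the sequences. This is a minimal sketch that looks for an exact match on either strand and treats the larger sequence as circular (an assumption); the function names are made up for illustration. Assembled sequences that differ by even one base will not match exactly, in which case an aligner such as BLAST is the better tool.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_in_circular(fragment, plasmid):
    """Look for an exact copy of fragment in a circular plasmid.

    Returns (strand, 0-based position) for the first match found,
    or None when there is no exact match on either strand.
    """
    # Append the start of the plasmid so matches spanning the origin are found.
    wrap = max(len(fragment) - 1, 0)
    doubled = (plasmid + plasmid[:wrap]).upper()
    for strand, query in (("+", fragment), ("-", reverse_complement(fragment))):
        pos = doubled.find(query.upper())
        if pos != -1:
            return strand, pos
    return None
```

Called as find_in_circular(second_fragment, first_fragment), a non-None result would confirm that the second file adds no new sequence.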
The files are attached. Thanks.
Amber
@HJTsai I think something went wrong -- I don't see any files attached here or via email.
I'll try again. The files are attached. Thanks.
Amber
@HJTsai unfortunately there are still no files attached. You can drag and drop a file into this GitHub thread to attach it. Attaching or sending files by replying to this thread by email doesn't seem to work.
@mmcguffi The txt files could not be attached, so I converted them to Word files: bc03_c35.docx bc03_c35_1.docx bc03_c35_2.docx Thanks.
@HJTsai, sorry for the delay in this, but I think I finally have a solution to your problem. If you save the code below as fix.yaml or something similar, you can point plannotate to this file when running in batch mode.
e.g.: plannotate batch -i my_plasmid.fa -y ~/my_path/fix.yaml
Rfam:
  details:
    compressed: false
    default_type: ncRNA
    location: None
  location: Default
  method: infernal
  priority: 3
  version: release 14.5
fpbase:
  details:
    compressed: false
    default_type: CDS
    location: Default
  location: Default
  method: diamond
  parameters:
  - -k 0
  - --min-orf 1
  - --matrix BLOSUM90
  - --gapopen 10
  - --gapextend 1
  - --algo ctg
  - --id 75
  - --max-hsps 10
  - --culling-overlap 200
  - --seed-cut .001
  - --comp-based-stats 0
  priority: 1
  version: downloaded 2020-09-02
snapgene:
  details:
    compressed: false
    default_type: None
    location: Default
  location: Default
  method: blastn
  parameters:
  - -perc_identity 95
  - -max_target_seqs 20000
  - -culling_limit 25
  - -word_size 12
  priority: 1
  version: Downloaded 2021-07-23
swissprot:
  details:
    compressed: true
    default_type: CDS
    location: Default
  location: Default
  method: diamond
  parameters:
  - -k 0
  - --min-orf 1
  - --matrix BLOSUM90
  - --gapopen 10
  - --gapextend 1
  - --algo ctg
  - --id 50
  - --max-hsps 10
  - --culling-overlap 200
  - --seed-cut .001
  - --comp-based-stats 0
  priority: 2
  version: Release 2021_03
If you are curious, the specific fix here is the addition of --max-hsps 10 --culling-overlap 200 to the diamond query of the protein databases.
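To make it clearer what those entries amount to, here is a small sketch of how a parameters list like the ones in the YAML could be flattened into a command-line argument list. This is not pLannotate's actual code; the function name is hypothetical and the sketch only illustrates that each list entry is one flag plus its value.

```python
import shlex


def flatten_parameters(parameters):
    """Turn entries such as "--max-hsps 10" into a flat list of arguments."""
    args = []
    for entry in parameters:
        # shlex.split separates the flag from its value on whitespace.
        args.extend(shlex.split(entry))
    return args


if __name__ == "__main__":
    # The two entries named above as the fix.
    added = ["--max-hsps 10", "--culling-overlap 200"]
    print(flatten_parameters(added))
```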
In my hands at least, your plasmid annotated much better with Prokka, so that may be something to check out if you haven't yet.
Hi,
Could the plasmid size limit be removed on the web server? Thank you.
Amber