Closed YiweiZhu closed 5 years ago
Hello @YiweiZhu!
Please make sure your files reside under /data3/zyw/miniconda3/db/, i.e. Pseudomonas.p[hr|in|sq]. Your DB seems to sit under /data3/zyw/gbk/.
The clustering step in the README may not be required for a single genome. It is meant for cases where you want to "join" several genomes of the same genus while avoiding redundancy from homologous proteins. If you apply CD-HIT to a single genome, you are probably filtering out paralogs (homologs in the wider sense that arose through gene duplication, whereas orthologs arose through speciation).
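As a quick way to verify the point about file locations, here is a minimal sketch that checks whether all three BLAST protein index files are present in a given directory. The function name check_blast_db is hypothetical (it is not part of Prokka or BLAST+), and the sketch only tests for file existence, not for whether Prokka will actually pick the database up.

```shell
# Hypothetical helper: report which BLAST protein index files are missing.
# Usage: check_blast_db DIR NAME
# Returns 0 if NAME.phr, NAME.pin and NAME.psq all exist in DIR, 1 otherwise.
check_blast_db() {
    dir=$1
    name=$2
    missing=0
    for ext in phr pin psq; do
        if [ ! -f "$dir/$name.$ext" ]; then
            echo "missing: $dir/$name.$ext"
            missing=1
        fi
    done
    return "$missing"
}
```

With the paths from this thread, it could be called as check_blast_db /data3/zyw/miniconda3/db Pseudomonas; no output and a zero exit status would mean all three files are in place.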
@ealdraed why is bioconda putting db into the root folder of miniconda?
@YiweiZhu the simplest thing to do is to just use
prokka --proteins Pseudomonas_aeruginosa_12-4-4_59__3618.gbk ...
I built a Pseudomonas database following the instructions on the website. After cd-hit completes, I don't get a .bak.clstr file. And after makeblastdb completes, I only get three .p* files: .phr, .pin and .psq. prokka --listdb doesn't show my database, even if I move these files into the db folder. Which step is wrong?
$ prokka-genbank_to_fasta_db Pseudomonas_aeruginosa_12-4-4_59__3618.gbk > Pseudomonas.faa
Will use first of (protein_id locus_tag db_xref) as FASTA ID
Parsing: NZ_CP013696
Done.
$ cd-hit -i Pseudomonas.faa -o Pseudomonas -T 0 -M 0 -g 1 -s 0.8 -c 0.9
Program: CD-HIT, V4.8.1, Mar 01 2019, 14:14:47
Command: cd-hit -i Pseudomonas.faa -o Pseudomonas -T 0 -M 0 -g 1 -s 0.8 -c 0.9
Started: Tue Mar 19 15:41:19 2019
Output
Option -T is ignored: multi-threading with OpenMP is NOT enabled!
total seq: 3939
longest and shortest : 4991 and 23
Total letters: 1443332
Sequences have been sorted
Approximated minimal memory consumption:
Sequence : 1M
Buffer : 1 X 11M = 11M
Table : 1 X 65M = 65M
Miscellaneous : 0M
Total : 79M
Table limit with the given memory limit:
Max number of representatives: 4000000
Max number of word counting entries: 265995500
comparing sequences from 0 to 3939
... 3939 finished 3917 clusters
Approximated maximum memory consumption: 90M
writing new database
writing clustering information
program completed !
Total CPU time 0.88
$ makeblastdb -dbtype prot -in Pseudomonas
Building a new DB, current time: 03/19/2019 15:42:31
New DB name: /data3/zyw/gbk/Pseudomonas
New DB title: Pseudomonas
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 3917 sequences in 0.204363 seconds.
$ prokka --listdb
[15:49:25] Looking for databases in: /data3/zyw/miniconda3/bin/../db
[15:49:25] Kingdoms: Archaea Bacteria Mitochondria Viruses
[15:49:25] Genera: Enterococcus Escherichia Staphylococcus
[15:49:25] HMMs: HAMAP
[15:49:25] CMs: Bacteria Viruses
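The cd-hit log above reports 3939 input sequences and 3917 clusters, so clustering removed 22 sequences from this single genome. A minimal sketch for checking such counts directly from the FASTA files follows; count_fasta is a hypothetical helper name, and it assumes plain FASTA input in which every record starts with a ">" header line.

```shell
# Hypothetical helper: count FASTA records by counting header lines.
# Usage: count_fasta FILE
count_fasta() {
    grep -c '^>' "$1"
}
```

Running count_fasta Pseudomonas.faa and count_fasta Pseudomonas in the working directory from the transcript should reproduce the before and after numbers reported by cd-hit and makeblastdb.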