TseWue closed this issue 4 years ago
Looks to be an issue with your for loop, not prokka. Here's one solution (though it's certainly not the most elegant way to approach this):
# If you have a subdirectory with 3 FASTA files
$ tree
.
└── polishedgenomes
    ├── asdf.fasta
    ├── qwer.fasta
    └── xcvb.fasta
# This loop sets i to the relative path of each file in turn
$ for i in polishedgenomes/*.fasta; do echo $i; done
polishedgenomes/asdf.fasta
polishedgenomes/qwer.fasta
polishedgenomes/xcvb.fasta
# you can adjust your for loop so that you get a new variable with just the name of your input genome
$ for i in polishedgenomes/*.fasta; do genomeName=$(basename "$i" | cut -d '.' -f 1); echo "$genomeName"; done
asdf
qwer
xcvb
# then run prokka, using $i as the location of the fasta file and $genomeName as the prefix
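That last step could look something like the sketch below. It is a dry run: `echo` prints each prokka command instead of executing it, so the `echo` can be removed once the commands look right. The function name and the `Prokka` output folder are assumptions made for illustration; `--outdir` and `--prefix` are the prokka options discussed in this thread.

```shell
# Dry run: print (not execute) one prokka command per FASTA file in a folder.
# The function name and the "Prokka" output folder are assumed for illustration.
print_prokka_commands() {
    indir=$1
    for i in "$indir"/*.fasta; do
        [ -e "$i" ] || continue   # no match: the glob stays literal, so skip it
        # Keep only the file name, then everything before the first dot.
        genomeName=$(basename "$i" | cut -d '.' -f 1)
        echo prokka --outdir "Prokka/$genomeName" --prefix "$genomeName" "$i"
    done
}

print_prokka_commands polishedgenomes
```

Note that `cut -d '.' -f 1` keeps only the text before the first dot, so a file such as `sample.v2.fasta` would get the prefix `sample`.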
Hi @kapsakcj
Thanks for the message! Yes, we actually figured it out shortly after I posted here. :-) I did not realize that my $i also contained the path and that it needed to be trimmed in order to work as a prefix. A little language confusion there.
I hope it helps anyone with a similar problem, so here is the loop I ended up using:
OUTFOLDER=Analysis/Prokka
INFOLDER=Analysis/Polished_genomes
mkdir -p "$OUTFOLDER"
for i in "$INFOLDER"/*.fasta; do
  temp=${i##*/}        # strip the directory part of the path
  iR=${temp%.fasta}    # strip the .fasta extension
  prokka --outdir "$OUTFOLDER/$iR" --prefix "$iR" --cpus 16 --kingdom Bacteria --addgenes --gcode 11 --force "$i"
done
Stay safe!
Thanks for helping @kapsakcj
P=/home/tse/dir/file.txt
D=$(dirname $P) # /home/tse/dir
F=$(basename $P) # file.txt
X=$(basename $P .txt) # file
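The same pieces can also be obtained with shell parameter expansion, without calling external commands. A small sketch using the example path from above (the variable names `D2`, `F2` and `X2` are made up for the comparison):

```shell
P=/home/tse/dir/file.txt

D2=${P%/*}      # remove the shortest trailing "/..." -> /home/tse/dir
F2=${P##*/}     # remove the longest leading ".../"   -> file.txt
X2=${F2%.txt}   # remove the trailing ".txt"          -> file

echo "$D2 $F2 $X2"
```

The two approaches agree for a path like this one, but not in every edge case: for a path without any slash, `dirname` returns `.` while `${P%/*}` leaves the string unchanged.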
Hi Torsten
I ran into an issue with the log file in prokka. In the step "Writing log to", the path is somehow duplicated: the log file for my Prokka/Polished_genomes/D1007.fasta becomes Prokka/Polished_genomes/D1007.fasta/Polished_genomes/D1007.fasta.log. This is then of course followed by a "Can't open logfile" error (because that path does not exist), and the program stops.
Initially I had 3 variables in my bash loop ($infolder, $outfolder, $thefastas). I ran into the same errors, so I have now cut out $infolder and $outfolder and loop directly over the files in the folder (bash):
for i in Polished_genomes/*.fasta; do
  singularity exec ../../Containers_WGS/Prokka1.14.5.sif \
    prokka --outdir Prokka/$i --prefix $i --kingdom Bacteria --addgenes --gcode 11 --cpu 16 $i
done
Yet I am getting the same type of error. Here is the output:
Processing sample
[10:00:57] This is prokka 1.14.5
[10:00:57] Written by Torsten Seemann torsten.seemann@gmail.com
[10:00:57] Homepage is https://github.com/tseemann/prokka
[10:00:57] Local time is Mon Apr 20 10:00:57 2020
[10:00:57] You are tswuethrich
[10:00:57] Operating system is linux
[10:00:57] You have BioPerl 1.007002
[10:00:57] System has 20 cores.
[10:00:57] Will use maximum of 16 cores.
[10:00:57] Annotating as >>> Bacteria <<<
[10:00:57] Generating locus_tag from 'Polished_genomes/D1007.fasta' contents.
[10:00:57] Setting --locustag MNFKFOMI from MD5 67f4f862109f3be7575a14bc1a98f2f7
[10:00:57] Creating new output folder: Prokka/Polished_genomes/D1007.fasta
[10:00:57] Running: mkdir -p Prokka\/Polished_genomes\/D1007.fasta
[10:00:57] Using filename prefix: Polished_genomes/D1007.fasta.XXX
[10:00:57] Setting HMMER_NCPU=1
[10:00:57] Writing log to: Prokka/Polished_genomes/D1007.fasta/Polished_genomes/D1007.fasta.log
[10:00:57] Can't open logfile
Why is the file name prefix a path? How am I supposed to run prokka on a range of files? I am sorry, but I am a bit puzzled as to why this is not working.
Thanks for your help, I appreciate it! Tse
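On the question of why the prefix ends up being a path: judging from the "Writing log to" line in the output, the log path looks like `<outdir>/<prefix>.log`. The sketch below only reproduces that string concatenation in the shell (it is an inference from the log, not prokka's actual code; the variable names are made up) to show why passing the full `$i` as the prefix produces a path inside a folder that does not exist.

```shell
i=Polished_genomes/D1007.fasta

# What the loop passes: --outdir Prokka/$i and --prefix $i,
# so both values still contain the folder and the extension.
bad_log="Prokka/$i/$i.log"
echo "$bad_log"
# prints Prokka/Polished_genomes/D1007.fasta/Polished_genomes/D1007.fasta.log

# With the folder and extension stripped first, the log stays inside --outdir.
name=$(basename "$i" .fasta)
good_log="Prokka/$name/$name.log"
echo "$good_log"
# prints Prokka/D1007/D1007.log
```

Only `Prokka/Polished_genomes/D1007.fasta` is created by the `mkdir -p` step in the log, so the nested `Polished_genomes/` folder under it is missing when the log file is opened.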