tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation

loop runs into path error #484

Closed TseWue closed 4 years ago

TseWue commented 4 years ago

Hi Torsten

I ran into an issue with the log file in prokka. At the "Writing log to" step the path is somehow duplicated: for my input Polished_genomes/D1007.fasta, the log file path becomes Prokka/Polished_genomes/D1007.fasta/Polished_genomes/D1007.fasta.log. That path does not exist, so prokka fails with a "Can't open logfile" error and stops.

Initially I had 3 variables in my bash loop ($infolder, $outfolder, $thefastas). I ran into the same error, so I have now cut out $infolder and $outfolder and loop directly over the files in the folder (bash):

for i in Polished_genomes/*.fasta; do
    singularity exec ../../Containers_WGS/Prokka1.14.5.sif \
        prokka --outdir Prokka/$i --prefix $i --kingdom Bacteria --addgenes --gcode 11 --cpu 16 $i
done

Yet I am getting the same type of error. Here the output:

Processing sample
[10:00:57] This is prokka 1.14.5
[10:00:57] Written by Torsten Seemann torsten.seemann@gmail.com
[10:00:57] Homepage is https://github.com/tseemann/prokka
[10:00:57] Local time is Mon Apr 20 10:00:57 2020
[10:00:57] You are tswuethrich
[10:00:57] Operating system is linux
[10:00:57] You have BioPerl 1.007002
[10:00:57] System has 20 cores.
[10:00:57] Will use maximum of 16 cores.
[10:00:57] Annotating as >>> Bacteria <<<
[10:00:57] Generating locus_tag from 'Polished_genomes/D1007.fasta' contents.
[10:00:57] Setting --locustag MNFKFOMI from MD5 67f4f862109f3be7575a14bc1a98f2f7
[10:00:57] Creating new output folder: Prokka/Polished_genomes/D1007.fasta
[10:00:57] Running: mkdir -p Prokka\/Polished_genomes\/D1007.fasta
[10:00:57] Using filename prefix: Polished_genomes/D1007.fasta.XXX
[10:00:57] Setting HMMER_NCPU=1
[10:00:57] Writing log to: Prokka/Polished_genomes/D1007.fasta/Polished_genomes/D1007.fasta.log
[10:00:57] Can't open logfile

Why is the file name prefix a path? How am I supposed to run prokka on a range of files? I am sorry but I am a bit puzzled by why this should not be working.

Thanks for your help, I appreciate it! Tse

kapsakcj commented 4 years ago

Looks to be an issue with your for loop, not prokka. Here's one solution (though it's certainly not the most elegant way to approach this):

# If you have a subdirectory with 3 FASTA files
$ tree
.
└── polishedgenomes
    ├── asdf.fasta
    ├── qwer.fasta
    └── xcvb.fasta

# This loop sets i to the relative path of each file
$ for i in polishedgenomes/*.fasta; do echo $i; done
polishedgenomes/asdf.fasta
polishedgenomes/qwer.fasta
polishedgenomes/xcvb.fasta

# you can adjust your for loop so that you get a new variable with just the name of your input genome
$ for i in polishedgenomes/*.fasta; do genomeName=$(basename "$i" | cut -d '.' -f 1); echo "$genomeName"; done
asdf
qwer
xcvb

# then run prokka, using $i as the location of the fasta file and $genomeName as the prefix
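
The last step above could look roughly like the sketch below. It is a dry run that only prints the prokka command for each genome; `print_prokka_cmds` is a hypothetical helper name, and the `Prokka/` output folder is an assumption carried over from the original post. Only `--outdir` and `--prefix` are shown; add the other options from your own command as needed.

```shell
# Hypothetical helper: print one prokka command per FASTA file in a folder.
# Remove the "echo" in front of prokka to actually run the annotation.
print_prokka_cmds() {
    indir=$1
    for i in "$indir"/*.fasta; do
        [ -e "$i" ] || continue   # skip if the glob matched nothing
        # file name without directory, cut at the first dot
        genomeName=$(basename "$i" | cut -d '.' -f 1)
        # $i is the input file, $genomeName is a clean prefix
        echo prokka --outdir "Prokka/$genomeName" --prefix "$genomeName" "$i"
    done
}

print_prokka_cmds polishedgenomes
```

Note that `cut -d '.' -f 1` keeps only the text before the first dot, so a file named `sample.v2.fasta` would get the prefix `sample`.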

TseWue commented 4 years ago

Hi @kapsakcj

Thanks for the message! Yes actually we figured it out just shortly after I posted here. :-) I did not get that my $i also contained the path and that it needed to be cut in order to function as a prefix. Little language confusion there.
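
To make the cause concrete, here is a small sketch of how the duplicated path comes about. It assumes, based only on the "Writing log to" line in the output above, that the log path is built as outdir + "/" + prefix + ".log"; `log_path` is a hypothetical helper for illustration, not prokka's actual code.

```shell
# Hypothetical helper mimicking how the log path appears to be assembled
# (inferred from the log output, not taken from the prokka source).
log_path() {
    outdir=$1
    prefix=$2
    printf '%s/%s.log\n' "$outdir" "$prefix"
}

i=Polished_genomes/D1007.fasta

# Original loop: --outdir Prokka/$i and --prefix $i both contain the folder.
broken=$(log_path "Prokka/$i" "$i")
echo "$broken"

# Fixed: strip the folder and the .fasta extension before using the name.
name=$(basename "$i" .fasta)
fixed=$(log_path "Prokka/$name" "$name")
echo "$fixed"
```

The first path matches the one in the error message; the second is a plain file inside the output folder, which can be created.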

I hope it helps anyone with similar problems so here is the loop I ended up using:

OUTFOLDER=Analysis/Prokka
INFOLDER=Analysis/Polished_genomes
mkdir -p "$OUTFOLDER"

for i in "$INFOLDER"/*.fasta; do
    temp=${i##*/}        # strip the folder, e.g. D1007.fasta
    iR=${temp%.fasta}    # strip the extension, e.g. D1007

    prokka --outdir "$OUTFOLDER/$iR" --prefix "$iR" --cpus 16 --kingdom Bacteria --addgenes --gcode 11 --force "$i"
done

Stay safe!

tseemann commented 4 years ago

Thanks for helping @kapsakcj

P=/home/tse/dir/file.txt
D=$(dirname "$P")         # /home/tse/dir
F=$(basename "$P")        # file.txt
X=$(basename "$P" .txt)   # file
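
For comparison, the `${i##*/}` and `${temp%.fasta}` expansions used in the loop earlier in the thread do the same job as `basename` and `dirname` without calling an external command. A minimal sketch, with `D2`, `F2` and `X2` as illustrative variable names:

```shell
P=/home/tse/dir/file.txt

# External commands, as in the snippet above
D=$(dirname "$P")
X=$(basename "$P" .txt)

# Shell-only equivalents using POSIX parameter expansion
D2=${P%/*}       # remove the shortest suffix matching "/*"
F2=${P##*/}      # remove the longest prefix matching "*/"
X2=${F2%.txt}    # remove a trailing ".txt"

echo "$D $X"
echo "$D2 $X2"
```

Both lines print the same folder and name for this path. The two approaches can differ on edge cases, for example `${P%/*}` leaves a path without any slash unchanged, whereas `dirname` returns `.`.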