hoelzer closed this issue 3 years ago
This issue should be fixed now. I used the script from emg-viral-pipeline for renaming the contig IDs before the prokka annotation step and wrote a bash script for mapping the original contig IDs back right after. The processes are called rename.nf and restore.nf, the corresponding scripts lie in the bin/ dir.
The prokka parameter is now also part of the nextflow configuration.
Great! And you are renaming all the Prokka output files (FASTA, GFF, ...), so that the final Prokka output files match the original input contig IDs?
Yes, all prokka output files are scanned for the renamed contig IDs and restored via the mapping file.
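For anyone reading along, the restore step can be sketched roughly like this. It is only an illustration in Python, not the bash script that was actually added to bin/; the function name `restore_ids` and the short ID scheme (`contig1`, `contig10`) are assumptions made for the example.

```python
import re


def restore_ids(text, mapping):
    """Replace every renamed contig ID in `text` with its original ID.

    `mapping` maps short (renamed) IDs to original IDs. Hypothetical helper,
    not the script used in the pipeline.
    """
    if not mapping:
        return text
    # Try longer IDs first so that "contig10" is not consumed as "contig1".
    alternatives = "|".join(
        re.escape(short_id) for short_id in sorted(mapping, key=len, reverse=True)
    )
    # Only match whole tokens: no word character directly before or after the ID.
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")
    # A function replacement avoids backslash handling in the original IDs.
    return pattern.sub(lambda match: mapping[match.group(0)], text)


mapping = {"contig1": "NODE_1_length_500", "contig10": "NODE_10_length_90"}
gff = "contig1\tProdigal\tCDS\t1\t90\ncontig10\tProdigal\tCDS\t5\t60\n"
restored = restore_ids(gff, mapping)
```

The same function could be applied to the text of each Prokka output file in turn (FASTA, GFF, and so on), since it does not depend on the file format.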
ok great!
example FASTA: SRR10176980_polished.fasta.gz
Command:
Error:
@EvaFriederike I suggest you try something like:
But then we should also think of a way to rename the contigs back after the annotation, because users might want to see their original contig IDs.
As an alternative, we could always rename the FASTA IDs in the first step, e.g. as done here:
https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/nextflow/modules/rename.nf
https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/nextflow/modules/restore.nf
Here, a `python` script is used for the renaming that also stores a map to later restore the original IDs in the FASTA. The `python` script lives in a `bin` folder: https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/bin/rename_fasta.py from where Nextflow can automatically access it.
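To make the idea concrete, here is a minimal sketch of renaming with a map and restoring afterwards. It is not the code of rename_fasta.py; the function names and the `contig` prefix are assumptions for illustration only.

```python
def rename_fasta(lines, prefix="contig"):
    """Replace each FASTA header with a short sequential ID.

    Returns the renamed lines and a dict mapping new ID -> original header
    (without the leading '>'). Hypothetical helper, not the linked script.
    """
    mapping = {}
    renamed = []
    for line in lines:
        if line.startswith(">"):
            original = line[1:].rstrip("\n")
            new_id = f"{prefix}{len(mapping) + 1}"
            mapping[new_id] = original
            renamed.append(f">{new_id}\n")
        else:
            renamed.append(line)
    return renamed, mapping


def restore_fasta(lines, mapping):
    """Put the original headers back, using the map written by rename_fasta."""
    restored = []
    for line in lines:
        if line.startswith(">"):
            short_id = line[1:].strip()
            # Unknown IDs are left unchanged rather than dropped.
            restored.append(f">{mapping.get(short_id, short_id)}\n")
        else:
            restored.append(line)
    return restored


original = [
    ">NODE_1_length_500_cov_12.3 some description\n",
    "ACGT\n",
    ">NODE_2_length_300_cov_8.1\n",
    "GGCC\n",
]
renamed, id_map = rename_fasta(original)
roundtrip = restore_fasta(renamed, id_map)
```

In a pipeline the map would be written to a file next to the renamed FASTA, so that a later process can read it and restore the IDs.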
In addition, we can add a parameter to the `nextflow.config` to pass additional options to prokka. By default the param is empty, but it is important e.g. when someone wants to run a bacterial genome that does not follow the standard genetic code (e.g. Mycoplasma bovis). Then they can use:
or so