Open oclaisse opened 1 month ago
It sounds like there are proteins in your FAA file that do not have matching records in your GFF file. If you can attach those files in a comment here or email them to me, I can take a look at which entries are causing the issue (this is usually an easy fix).
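For anyone who wants to check this locally first, here is a minimal sketch of such a comparison. It assumes that the first word of each FASTA header is the identifier expected to appear as an ID= attribute in column 9 of the GFF; that matching rule is an assumption for illustration, not necessarily how padloc itself does the comparison. The function name missing_from_gff is hypothetical.

```shell
# Hypothetical helper (not part of padloc): print FASTA protein IDs that
# have no matching ID= attribute in the GFF file.
# Assumption: the first word of each FASTA header is the ID to look up.
missing_from_gff() {
    faa="$1"
    gff="$2"
    awk -F'\t' '
        # First file (GFF): collect the ID= value from column 9 of each feature line
        FILENAME == ARGV[1] {
            if ($0 !~ /^#/ && NF >= 9) {
                n = split($9, attrs, ";")
                for (i = 1; i <= n; i++)
                    if (attrs[i] ~ /^ID=/) seen[substr(attrs[i], 4)] = 1
            }
            next
        }
        # Second file (FASTA): report header IDs that were never seen in the GFF
        /^>/ {
            id = substr($0, 2)
            sub(/[ \t].*/, "", id)
            if (!(id in seen)) print id
        }
    ' "$gff" "$faa"
}
```

Calling missing_from_gff with your FAA file and then your GFF file prints one unmatched ID per line. An empty result means every protein ID was found under this assumption.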
This is related to #8 (see the relevant comment there):
"...any genes with the pseudo=True attribute get their IDs derived from the Name attribute, which overwrites your otherwise correct IDs here with whatever was in Name."
In your case, the entries causing the issue are those with the ID attributes AMLJAP_02445, AMLJAP_02450, and AMLJAP_08650.
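To see which features in a GFF file could be affected, the sketch below lists every feature line tagged pseudo=True together with its ID and Name values. The function name pseudo_features is hypothetical, and the sketch assumes the attribute is written exactly as pseudo=True in column 9.

```shell
# Hypothetical helper (not part of padloc): for each GFF feature tagged
# pseudo=True, print seqid, feature type, ID and Name (tab-separated).
pseudo_features() {
    awk -F'\t' '
        $0 !~ /^#/ && NF >= 9 && $9 ~ /(^|;)pseudo=True(;|$)/ {
            id = ""; name = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/)   id = substr(attrs[i], 4)
                if (attrs[i] ~ /^Name=/) name = substr(attrs[i], 6)
            }
            print $1 "\t" $3 "\t" id "\t" name
        }
    ' "$1"
}
```

Run it on your GFF file. Rows where the ID and Name columns differ are the ones where replacing the ID with the Name would break the match against the FAA file.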
You can use the same fix I posted in the thread above, by patching your version of padloc to remove the problematic code:
# Download patch
wget -O padloc.patch "https://github.com/padlocbio/padloc/files/13629886/padloc.patch"
# Find padloc script
padloc_src=$(which padloc.R)
# Apply patch
patch -u -b "${padloc_src}" padloc.patch
This saves a backup of the original code to ${padloc_src}.orig, so if you want to restore the original code later, just overwrite the patched script with the backup:
mv "${padloc_src}.orig" "${padloc_src}"
Hello,

I am using padloc-2.0.0. It works well with the fna option, but when I run it on annotated proteins with the --faa and --gff options I get this error:

ERROR >> 3 protein sequence IDs are missing from GFF file
Exécution arrêtée
[16:15:36] ERROR >> errexit on line 425

I have tried the prodigal outputs from the fna run and also files from a bakta annotation (without the sequence in the GFF file), but the result is the same.

Could you please help me solve this?

Best regards,
Olivier