mgalardini / pyseer

SEER, reimplemented in python 🐍🔮
http://pyseer.readthedocs.io
Apache License 2.0

Annotating unitigs using annotate_hits_pyseer #272

Closed daisy238 closed 2 weeks ago

daisy238 commented 1 month ago

Hello, I'm trying to run annotate_hits_pyseer on my unitig-based hits. My GFF3 files contain the feature types listed in the command below, so I'm not sure why the command isn't working. I'm using pyseer v1.3.11.


annotate_hits_pyseer UNITIGS_output_significant_hits.tsv references.txt --feature-type oriC --feature-type oriV --feature-type oriT --feature-type CDS --feature-type tRNA --feature-type tmRNA --feature-type rRNA --feature-type ncRNA --feature-type regulatory_region --feature-type CRISPR --feature-type pseudogene UNITIGS_output_significant_hits_ANNOTATED_TEST.tsv

usage: annotate_hits [-h] [--bwa BWA] [--tmp-prefix TMP_PREFIX] kmers references output
annotate_hits: error: unrecognized arguments: --feature-type --feature-type oriV --feature-type oriT --feature-type CDS --feature-type tRNA --feature-type tmRNA --feature-type rRNA --feature-type ncRNA --feature-type regulatory_region --feature-type CRISPR --feature-type pseudogene UNITIGS_output_significant_hits_ANNOTATED_TEST.tsv

The above command works fine when I don't use the --feature-type parameters but then it only uses the default CDS feature types to annotate the unitigs. Do you have any advice please?

mgalardini commented 1 month ago

I think you are getting the error because you omitted the feature type after the first --feature-type, judging by the error message: annotate_hits: error: unrecognized arguments: --feature-type --feature-type oriV [...].

I tried the original command and it worked (i.e. it crashed only because I don't have your input files, but that doesn't matter). One note: the feature type is the third column of the GFF file, and if you annotated your genomes with something like Prokka, that column would not contain the gene name, "only" CDS, tRNA, etc.
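To see which feature types a given GFF file actually offers, one option is to tally the third column directly. The following is a minimal sketch, not part of pyseer: the function name `count_feature_types` and the example rows are made up for illustration, and it assumes a tab-separated GFF3 file with an optional `##FASTA` section at the end.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count the values found in column 3 (feature type) of GFF3 lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature rows
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Made-up rows for illustration only (not taken from a real annotation).
example_rows = [
    "##gff-version 3",
    "contig_1\tExample\tCDS\t1\t300\t.\t+\t0\tID=feature_1",
    "contig_1\tExample\ttRNA\t400\t475\t.\t-\t.\tID=feature_2",
    "contig_1\tExample\tCDS\t500\t900\t.\t+\t0\tID=feature_3",
    "##FASTA",
    ">contig_1",
    "ACGT",
]
feature_counts = count_feature_types(example_rows)

# With a real file (hypothetical name):
# with open("genome.gff3") as handle:
#     print(count_feature_types(handle))
```

Any value passed to --feature-type would need to appear among the keys of that tally to match anything.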

Hope this helps

daisy238 commented 1 month ago

Sorry for the typo above regarding the feature type. Even if I use only one of the feature types to test the command, I still get the same error. I've just tried the command below:

annotate_hits_pyseer UNITIGS_output_significant_hits.tsv --feature-type rRNA references.txt UNITIGS_output_significant_hits_ANNOTATED_TEST.tsv

I'm using Bakta-annotated GFF3 files.

mgalardini commented 1 month ago

Can you perhaps explain what exactly is not working? And could you share an excerpt from your GFF file?

daisy238 commented 1 month ago

I get the same error as my first message (error below). The command only works when --feature-type is not used.

annotate_hits_pyseer UNITIGS_output_significant_hits.tsv --feature-type regulatory_region references.txt UNITIGS_output_significant_hits_ANNOTATED_TEST.tsv

usage: annotate_hits [-h] [--bwa BWA] [--tmp-prefix TMP_PREFIX] kmers references output
annotate_hits: error: unrecognized arguments: --feature-type UNITIGS_output_significant_hits_ANNOTATED_TEST.tsv

I've added a snippet of one of the gff3 files, showing that feature-type "regulatory_region" is present in the third column: gff3_extract.txt

mgalardini commented 1 month ago

Oh I see what the problem is: you should put the --feature-type regulatory_region argument at the end of the command or before the three positional arguments:

annotate_hits_pyseer UNITIGS_output_significant_hits.tsv references.txt UNITIGS_output_significant_hits_ANNOTATED_TEST.tsv --feature-type regulatory_region

or:

annotate_hits_pyseer --feature-type regulatory_region UNITIGS_output_significant_hits.tsv references.txt UNITIGS_output_significant_hits_ANNOTATED_TEST.tsv

Either of these should work
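As a rough illustration of the placement advice, here is a small argparse sketch. It is not pyseer's actual parser: the program name, the `build_parser` and `parse` helpers, the file names, and the use of `action="append"` for a repeatable option are all assumptions. The usage and error format in this thread suggests the script uses argparse, but that is an inference. The CDS fallback mirrors the default mentioned earlier in the thread.

```python
import argparse


def build_parser():
    # Hypothetical parser: three positionals plus a repeatable option.
    parser = argparse.ArgumentParser(prog="annotate_hits_sketch")
    parser.add_argument("kmers")
    parser.add_argument("references")
    parser.add_argument("output")
    # default=None avoids the argparse gotcha where "append" would
    # add user values on top of a list default.
    parser.add_argument("--feature-type", action="append", default=None)
    return parser


def parse(argv):
    args = build_parser().parse_args(argv)
    if args.feature_type is None:
        args.feature_type = ["CDS"]  # assumed fallback, as described in the thread
    return args


positionals = ["hits.tsv", "references.txt", "annotated.tsv"]
options = ["--feature-type", "rRNA", "--feature-type", "tRNA"]

options_last = parse(positionals + options)
options_first = parse(options + positionals)
defaults_only = parse(positionals)
```

Under these assumptions both orderings produce the same namespace, with each repeated --feature-type value collected into one list.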

daisy238 commented 1 month ago

I just tried the methods you suggested but neither seems to work. I still have the same error. I've tried moving --feature-type regulatory_region to different places but the script doesn't seem to recognise the parameter --feature-type at all:

annotate_hits_pyseer UNITIGS_output_significant_hits.tsv references.txt UNITIGS_output_significant_hits_ANNOTATED_TEST.tsv --feature-type regulatory_region
usage: annotate_hits [-h] [--bwa BWA] [--tmp-prefix TMP_PREFIX] kmers references output
annotate_hits: error: unrecognized arguments: --feature-type regulatory_region

annotate_hits_pyseer regulatory_region UNITIGS_output_significant_hits.tsv references.txt UNITIGS_output_significant_hits_ANNOTATED_TEST.tsv
usage: annotate_hits [-h] [--bwa BWA] [--tmp-prefix TMP_PREFIX] kmers references output
annotate_hits: error: unrecognized arguments: UNITIGS_output_significant_hits_ANNOTATED_TEST.tsv
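The "unrecognized arguments" message is what argparse prints when it is handed an option that the parser does not define. A minimal sketch of that behaviour follows, assuming a parser shaped like the usage line above; `build_usage_like_parser` and `leftover_arguments` are hypothetical names, and this is not pyseer's code. It uses `parse_known_args`, which returns the leftover tokens instead of exiting.

```python
import argparse


def build_usage_like_parser():
    # Mirrors the printed usage line: [--bwa BWA] [--tmp-prefix TMP_PREFIX]
    # kmers references output. No --feature-type option is defined.
    parser = argparse.ArgumentParser(prog="annotate_hits")
    parser.add_argument("--bwa")
    parser.add_argument("--tmp-prefix")
    parser.add_argument("kmers")
    parser.add_argument("references")
    parser.add_argument("output")
    return parser


def leftover_arguments(argv):
    """Return the tokens argparse could not assign to any defined argument."""
    _, extras = build_usage_like_parser().parse_known_args(argv)
    return extras


with_flag = leftover_arguments(
    ["hits.tsv", "references.txt", "annotated.tsv",
     "--feature-type", "regulatory_region"]
)
without_flag = leftover_arguments(["hits.tsv", "references.txt", "annotated.tsv"])
```

With the option placed last, the leftovers are exactly the flag and its value, which matches the first error shown above. A parser that does not define the option will reject it wherever it is placed.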

daisy238 commented 1 month ago

Hi @mgalardini, any chance you've managed to figure out what the problem may be?

daisy238 commented 1 month ago

I think I've sorted the problem out. The most recent version of pyseer (v1.3.11) installed through Conda doesn't include the up-to-date version of annotate_hits.py, which adds --feature-type as an argument.

For anyone with a similar problem: the annotate_hits.py in your pyseer installation needs to be updated. My Conda environment uses Python 3.10, so annotate_hits.py is installed here: User/pyseer/lib/python3.10/site-packages/pyseer/kmer_mapping/annotate_hits.py

I then updated the script with the code from here: https://raw.githubusercontent.com/mgalardini/pyseer/45b27a6b54ccde65b874058aecb69a571319f3e4/pyseer/kmer_mapping/annotate_hits.py
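Before replacing anything, it can help to confirm where the installed script lives and whether it mentions the flag at all. The sketch below is only a heuristic: it does a plain text search of the source file. The helper names are made up, and the module path `pyseer.kmer_mapping.annotate_hits` is inferred from the install location given above. Note that `importlib.util.find_spec` imports the parent packages in order to locate a submodule.

```python
import importlib.util
from pathlib import Path


def module_source_path(module_name):
    """Return the source file of an importable module, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None  # parent package missing or module has no usable spec
    if spec is None or spec.origin is None:
        return None
    path = Path(spec.origin)
    return path if path.is_file() else None


def source_mentions(path, text):
    """Plain text search; a heuristic, not a guarantee the option works."""
    return text in Path(path).read_text(encoding="utf-8", errors="replace")


def installed_script_has_flag(
    module_name="pyseer.kmer_mapping.annotate_hits",  # inferred from the path above
    flag="--feature-type",
):
    path = module_source_path(module_name)
    if path is None:
        return None  # module not installed in this environment
    return source_mentions(path, flag)


# Usage, run inside the environment where pyseer is installed:
# print(module_source_path("pyseer.kmer_mapping.annotate_hits"))
# print(installed_script_has_flag())
```

A result of False would be consistent with an installed copy that predates the option, and None means the module could not be located in the active environment.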

mgalardini commented 2 weeks ago

Ah, I see! Thanks for figuring this out; the release is indeed missing the argument for some reason. I'll put out a new version ASAP.

mgalardini commented 2 weeks ago

Oh, I know why the latest release does not have the argument: it was introduced after that release. I guess this is as good a time as any to push a new one.

mgalardini commented 2 weeks ago

Ok, I just published a new release, which should get into bioconda in a day or two