weberlab-hhu / GeenuFF

Schema and API for a relational db that encodes gene models in an explicit, structured, and robust fashion.
GNU General Public License v3.0

Still need to implement backward match if fasta IDs are subset of gff IDs #397

Open hjjansen opened 1 year ago

hjjansen commented 1 year ago

Thank you for providing the means to train our own models. This will be a useful resource as more and more genomes are sequenced. I read the disclaimer but still wanted to train a model to annotate a specific group of animal genomes (Parasitiformes). A number of genomes and annotations are available from NCBI, so I downloaded those and ran this command to generate the sqlite3 files:

python import2geenuff.py --fasta GCF_002443255.2_Vdes_3.0_genomic.fna --gff3 GCF_002443255.2_Vdes_3.0_genomic.gff --db-path Varroa_destructor.sqlite3 --log-file Varroa_destructor.import.log --species Varroa_destructor

It read the FASTA file but fails when reading the GFF file:

19-Apr-23 14:39:02 - INFO: Starting to parse the GFF file
Aborting due to error, attempt so far saved at Varroa_destructor.sqlite3.partial for debugging purposes
Traceback (most recent call last):
  File "/home/nextgen/GeenuFF/GeenuFF/import2geenuff.py", line 122, in <module>
    main(args)
  File "/home/nextgen/GeenuFF/GeenuFF/import2geenuff.py", line 95, in main
    controller.add_genome(paths.fasta_in, paths.gff_in, genome_args)
  File "/home/nextgen/GeenuFF/GeenuFF/geenuff/applications/importer.py", line 892, in add_genome
    raise e
  File "/home/nextgen/GeenuFF/GeenuFF/geenuff/applications/importer.py", line 884, in add_genome
    self.add_gff(gff_path, clean=clean_gff)
  File "/home/nextgen/GeenuFF/GeenuFF/geenuff/applications/importer.py", line 959, in add_gff
    self.latest_fasta_importer.mk_mapper(gff_file)
  File "/home/nextgen/GeenuFF/GeenuFF/geenuff/applications/importer.py", line 1019, in mk_mapper
    raise NotImplementedError("Still need to implement backward match if fasta IDs "
NotImplementedError: Still need to implement backward match if fasta IDs are subset of gff IDs

Indeed, the mk_mapper function in importer.py seems to be missing the code to handle this situation. Is this caused by something in the GFF file that GeenuFF can't yet handle? Could you elaborate a little on this error? If so, we can try to help write something that handles it.
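
Going only by the wording of the error message, the check concerns how the sequence IDs in the FASTA headers relate to the sequence IDs in the first column of the GFF. The following is a minimal diagnostic sketch for inspecting that relationship outside of GeenuFF. It is not GeenuFF's own logic: the helper names (fasta_ids, gff_seqids, compare_ids) are hypothetical, the sample IDs are toy data, and it assumes the FASTA ID is the first whitespace-delimited token after ">".

```python
import io


def fasta_ids(handle):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gff_seqids(handle):
    """Collect column-1 seqids from GFF3 feature lines."""
    ids = set()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        ids.add(line.split("\t", 1)[0])
    return ids


def compare_ids(fasta, gff):
    """Describe the set relationship between the two ID collections."""
    if fasta == gff:
        return "identical"
    if fasta < gff:
        return "fasta IDs are a strict subset of gff IDs"
    if gff < fasta:
        return "gff IDs are a strict subset of fasta IDs"
    if fasta & gff:
        return "partial overlap"
    return "disjoint"


# Toy data standing in for the real files (hypothetical IDs).
toy_fasta = io.StringIO(">seq1 some description\nACGT\n")
toy_gff = io.StringIO(
    "##gff-version 3\n"
    "seq1\t.\tgene\t1\t4\t.\t+\t.\tID=gene1\n"
    "seq2\t.\tgene\t1\t4\t.\t+\t.\tID=gene2\n"
)
print(compare_ids(fasta_ids(toy_fasta), gff_seqids(toy_gff)))
```

To run it on real data, replace the StringIO objects with open(path) handles for the .fna and .gff files. Printing the two set differences (fasta - gff and gff - fasta) would then show which IDs are unmatched.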

calliope-pro commented 1 year ago

I am running into the same error.

calliope-pro commented 1 year ago

I found that this program works when sqlalchemy is version 1.4.

hjjansen commented 1 year ago

I found that this program works when sqlalchemy is version 1.4.

My version, according to pip show sqlalchemy:

Name: SQLAlchemy
Version: 1.4.0
Summary: Database Abstraction Library
Home-page: http://www.sqlalchemy.org
Author: Mike Bayer
Author-email: mike_mp@zzzcomputing.com
License: MIT
Location: /home/nextgen/.local/lib/python3.8/site-packages
Requires: greenlet
Required-by: geenuff

And the error remains.

arslan9732 commented 9 months ago

This issue can be resolved by installing sqlalchemy-1.4.49.
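
Since the reports in this thread differ by exact release (1.4.0 still failed, 1.4.49 was reported to work), it may help to confirm which SQLAlchemy version the Python environment running import2geenuff.py actually sees. A small sketch using only the standard library; the helper names are hypothetical, and the check for the 1.4 series reflects the reports above rather than a documented GeenuFF requirement.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_1_4_series(version_string):
    """True if the version string starts with major.minor 1.4."""
    return version_string.split(".")[:2] == ["1", "4"]


found = installed_version("sqlalchemy")
if found is None:
    print("SQLAlchemy is not installed in this environment")
elif is_1_4_series(found):
    print(f"SQLAlchemy {found} (1.4 series)")
else:
    print(f"SQLAlchemy {found} (not in the 1.4 series)")
```

Run it with the same interpreter that runs the importer, since a user-level install (as in the pip show output above) and a virtual environment can hold different versions.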