weberlab-hhu / Helixer

Using Deep Learning to predict gene annotations
GNU General Public License v3.0

sqlite3.DataError: string or blob too big #132

Open arslan9732 opened 5 months ago

arslan9732 commented 5 months ago

I am trying to fine-tune Helixer for a new plant. While converting the GFF3 output of HelixerPost to Helixer's training data format, I got this error:

Traceback (most recent call last):
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlite3.DataError: string or blob too big

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/data/arslan/tool/GeenuFF/import2geenuff.py", line 120, in <module>
    main(args)
  File "/mnt/data/arslan/tool/GeenuFF/import2geenuff.py", line 93, in main
    controller.add_genome(paths.fasta_in, paths.gff_in, genome_args)
  File "/mnt/data/arslan/tool/GeenuFF/geenuff/applications/importer.py", line 875, in add_genome
    self.add_sequences(fasta_path, genome_args)
  File "/mnt/data/arslan/tool/GeenuFF/geenuff/applications/importer.py", line 894, in add_sequences
    self.session.commit()
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1454, in commit
    self._transaction.commit(_to_root=self.future)
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 832, in commit
    self._prepare_impl()
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 811, in _prepare_impl
    self.session.flush()
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 3449, in flush
    self._flush(objects)
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 3588, in _flush
    with util.safe_reraise():
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 3549, in _flush
    flush_context.execute()
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
    rec.execute(self)
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
    _emit_insert_statements(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py", line 1238, in _emit_insert_statements
    result = connection._execute_20(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/mnt/data/arslan/tool/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (sqlite3.DataError) string or blob too big
[SQL: INSERT INTO coordinate (sequence, length, seqid, sha1, genome_id) VALUES (?, ?, ?, ?, ?)]
[parameters: ('CCCACTTGCAACCAAACACGGGCACTTGAAAGCATGAGTAATCCAATTCCCAAATACGTTCAATGACCCCAAAATATGACAATTTGGAAAATGCGGGATTTCTATTTTTGGAACTTGAGATATGCACAGATTCAGCTACGAGTGTGACA ... (1853204065 characters truncated) ... CCAAGGCACTAGATGAATTGGAAATATCAAGAATATTCATGTGAAAATCATGAATACACTCATCACCCTTCATCCCGAGATTCCCAAATTTGGTGGTGAGAATTTGAAGTCTTGACATTTTTAATTTTGATTTCCCTTCATGAGTGGTT', 1853204363, 'chr1L', '55cf8a4f2868b7127b10c94200d1c8e29516f0db', 1)]
(Background on this error at: https://sqlalche.me/e/14/9h9h)

Here is the command that I used:

python GeenuFF/import2geenuff.py --fasta genome.fa --gff3 genome.hlx.gff \
  --db-path Vfaba.sqlite3 --log-file my_genome_import.log \
  --species my_genome

alisandra commented 4 months ago

Ah, do you have a single chromosome that is longer than $2^{31}-1$, i.e. 2147483647 bp?

If so, this number is unfortunately a limitation of our current implementation.
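
To see which sequences are affected, the lengths in the FASTA file can be listed and compared against a limit. The following is only a minimal sketch: `fasta_lengths`, `too_long`, and both constants are hypothetical names and not part of GeenuFF or Helixer. Note that `chr1L` in the traceback above is reported with length 1853204363, which is below $2^{31}-1$ but above 1,000,000,000, the documented default of SQLite's `SQLITE_MAX_LENGTH`. Whether that default applies to this particular SQLite build is an assumption, so it may be worth checking against both values.

```python
import io

INT32_MAX = 2**31 - 1               # limit quoted in this thread
SQLITE_DEFAULT_MAX = 1_000_000_000  # documented default SQLITE_MAX_LENGTH (assumed to apply here)


def fasta_lengths(handle):
    """Yield (seqid, length) for every record in an open FASTA handle."""
    seqid, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seqid is not None:
                yield seqid, length
            # The seqid is the first whitespace-separated token of the header.
            parts = line[1:].split()
            seqid, length = (parts[0] if parts else ""), 0
        elif seqid is not None:
            length += len(line)
    if seqid is not None:
        yield seqid, length


def too_long(handle, limit):
    """Return the (seqid, length) pairs whose length exceeds `limit`."""
    return [(s, n) for s, n in fasta_lengths(handle) if n > limit]
```

For example, `with open("genome.fa") as fh: print(too_long(fh, SQLITE_DEFAULT_MAX))` would list the sequences longer than the smaller of the two limits.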

alisandra commented 4 months ago

@arslan9732 As a workaround, and only for the sake of fine-tuning, you could split or truncate any chromosome longer than the above number, and then run the final inference (once you have the tuned model) on the original full-length sequences.
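
A rough sketch of the splitting step, under some assumptions: sequences are cut into fixed-size chunks, chunk names get a `_partN` suffix, and annotation features that span a chunk boundary are simply dropped. The functions `split_sequence` and `remap_feature` are hypothetical helpers, not part of GeenuFF or Helixer. The GFF3 coordinates (1-based, inclusive) have to be shifted to match the new sequence names, otherwise the import would no longer line up with the FASTA.

```python
def split_sequence(seqid, seq, max_len):
    """Split `seq` into consecutive chunks of at most `max_len` bases.

    Returns a list of (new_seqid, offset, chunk), where `offset` is the
    0-based start of the chunk in the original sequence. Sequences that
    already fit keep their original seqid.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(seq) <= max_len:
        return [(seqid, 0, seq)]
    return [
        (f"{seqid}_part{i + 1}", start, seq[start:start + max_len])
        for i, start in enumerate(range(0, len(seq), max_len))
    ]


def remap_feature(seqid, start, end, seq_len, max_len):
    """Map a GFF3 feature (1-based, inclusive) onto the split sequences.

    Returns (new_seqid, new_start, new_end), or None if the feature spans
    a chunk boundary (such features are dropped in this sketch).
    """
    if seq_len <= max_len:
        return seqid, start, end
    first = (start - 1) // max_len  # chunk index containing the start
    last = (end - 1) // max_len     # chunk index containing the end
    if first != last:
        return None
    offset = first * max_len
    return f"{seqid}_part{first + 1}", start - offset, end - offset
```

In practice it may be better to pick split points in intergenic regions so that no gene model is lost. The fixed chunk size here is only the simplest option, and `max_len` should be chosen below whichever limit applies.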