widdowquinn opened this issue 2 years ago
Can you share any of these discussions with users? That would give me a more concrete idea of the problem this solves. I imagine it's the same issue I had between the fastANI branch and others, though.
You're the user with the most experience of these incompatibilities, for sure ;)
We have talked about using alembic for this before, and it's good practice in any case.
I don't have much concrete, but here's a related screenshot as I helped Angelika understand her issue with the update:

[screenshot]
I've been playing with this, and there are a few things I don't seem to have figured out, yet. (e.g. How do I get alembic upgrade head working with the multidb template?) BUT, I think I have a basic migration script written for the old/new versions of the database (pre-/post- fastANI addition).
I have set up the basic requirements for a new versiondb subcommand, and also managed to get alembic working correctly locally. I'm now working on tying the functionality into pyani in a sensible way.
I was going to add a tag in the repo, prior to the merge of fastANI, to capture the database changes, but it looks like tags are directly related to releases, and doing so might be confusing, since the database is only a factor in version 3, and all of the existing releases are version 2.
Actually, I have found an issue with the downgrade() function I am testing, which should remove the columns that are added to support fastANI.

The function I am running is written in accordance with examples I've seen:
```python
def downgrade_old_db():
    with op.batch_alter_table("comparisons") as batch_op:
        batch_op.drop_column('kmersize')
        batch_op.drop_column('minmatch')
```
but nothing happens—note the lack of a 'Running downgrade ...' message here:
```
(pyani_dev) ! alembic downgrade base
INFO [alembic.env] Migrating database old_db
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
```
Many sources on the internet claim that this is because sqlite does not support the DROP COLUMN functionality of ALTER TABLE, but the documentation for this indicates otherwise. It appears this change was made sometime prior to July 2021, though I have not been able to pinpoint exactly when.
I have, however, confirmed that I can drop columns within my sqlite installation, so I'm puzzled as to why alembic is not managing to do so.
```
sqlite> .schema comparisons
CREATE TABLE comparisons (
comparison_id INTEGER NOT NULL,
query_id INTEGER NOT NULL,
subject_id INTEGER NOT NULL,
aln_length INTEGER,
sim_errs INTEGER,
identity FLOAT,
cov_query FLOAT,
cov_subject FLOAT,
program VARCHAR,
version VARCHAR,
fragsize INTEGER,
maxmatch BOOLEAN, kmersize INTEGER, minmatch FLOAT,
PRIMARY KEY (comparison_id),
UNIQUE (query_id, subject_id, program, version, fragsize, maxmatch),
FOREIGN KEY(query_id) REFERENCES genomes (genome_id),
FOREIGN KEY(subject_id) REFERENCES genomes (genome_id)
);
sqlite> alter table comparisons drop kmersize;
sqlite> .schema comparisons
CREATE TABLE comparisons (
comparison_id INTEGER NOT NULL,
query_id INTEGER NOT NULL,
subject_id INTEGER NOT NULL,
aln_length INTEGER,
sim_errs INTEGER,
identity FLOAT,
cov_query FLOAT,
cov_subject FLOAT,
program VARCHAR,
version VARCHAR,
fragsize INTEGER,
maxmatch BOOLEAN, minmatch FLOAT,
PRIMARY KEY (comparison_id),
UNIQUE (query_id, subject_id, program, version, fragsize, maxmatch),
FOREIGN KEY(query_id) REFERENCES genomes (genome_id),
FOREIGN KEY(subject_id) REFERENCES genomes (genome_id)
);
sqlite>
```
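One thing that might be worth ruling out (this is an assumption on my part): ALTER TABLE ... DROP COLUMN support only arrived in SQLite 3.35.0 (March 2021), and the SQLite library linked into Python's sqlite3 module, which is what sqlalchemy/alembic use, can be older than the one behind the command-line shell. A quick check of the library version Python sees:

```python
import sqlite3

# Version of the SQLite library the Python driver is linked against; this is
# what matters for alembic, and it can differ from the sqlite3 CLI's version.
print(sqlite3.sqlite_version)
```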
it looks like tags are directly related to releases, and doing so might be confusing, since the database is only a factor in version 3, and all of the existing releases are version 2.
IIRC, git tags are just labels (though really, only lightweight tags are just labels - annotated tags are proper git objects; both point to specific commits) - but GitHub chooses to interpret tags in a specific way. Each tag is understood to be a release point, but the release point only becomes an actual release if Release Notes are added. GitHub's approach to this is not especially general, IMO.
Actually, I have found an issue with the downgrade() function I am testing, which should remove the columns that are added to support fastANI. The function I am running is written in accordance with examples I've seen:

```python
def downgrade_old_db():
    with op.batch_alter_table("comparisons") as batch_op:
        batch_op.drop_column('kmersize')
        batch_op.drop_column('minmatch')
```

but nothing happens—note the lack of a 'Running downgrade ...' message here:

```
(pyani_dev) ! alembic downgrade base
INFO [alembic.env] Migrating database old_db
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
```
Does alembic provide no error message or other useful information (and appear to have "worked" without DROPping the column)?
A quick search turned up at least one workaround someone implemented, but we shouldn't have to do that.
I've seen two mentions of a render_as_batch option which, I think, allows alembic to batch copy the table, modify it accordingly, DROP the old one, and rename the new one:
Miguel Grinberg has a blog post outlining a possible solution.
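If my understanding is right (an assumption; I've not tested this here), the option is enabled where the migration context is configured in env.py:

```python
# Sketch: the relevant call inside run_migrations_online() in alembic's env.py.
# With render_as_batch=True, autogenerated migration scripts wrap their
# operations in batch_alter_table() blocks, which recreate-and-copy tables
# rather than relying on SQLite's limited ALTER TABLE.
context.configure(
    connection=connection,
    target_metadata=target_metadata,
    render_as_batch=True,
)
```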
I expect you'll have seen and tried all of these, mind.
Does alembic provide no error message or other useful information (and appear to have "worked" without DROPping the column)?
By default it does not produce any error message, and I have tried including the --raiseerr optional flag to get a full traceback, but it still doesn't seem to output anything.
I was attempting to verify that the issue was with dropping columns, and not just downgrading specifically, by creating a second revision so I had more things to step through. However, now I am stuck in a loop where alembic claims the database is not up-to-date, but the fastani columns exist. The issue seems to be that it's not populating the alembic_version table now, so the upgrade fails partway through, but I don't know why. I think I saw something about this the other day, but haven't managed to find it again.

I think this issue started when I manually deleted columns, to prove I could, though I've since recreated the database. Perhaps something is just not hooked up quite right....
I seem to have hacked my way through a series of arcane errors I could find little information about to something that works.
I now have two migration scripts that I am able to use to both upgrade and downgrade a database reliably. The differences from what I had before are that I did not use the multidb template this time, as it did not seem necessary after further reflection, and that I have had to switch from performing the batch operations within a context:
```python
with op.batch_alter_table("comparisons") as batch_op:
    batch_op.add_column(sa.Column('kmersize', sa.Integer))
    batch_op.add_column(sa.Column('minmatch', sa.Float))
```
to doing so outside of one:
```python
op.add_column('comparisons', sa.Column('kmersize', sa.Integer))
op.add_column('comparisons', sa.Column('minmatch', sa.Float))
```
This change was necessary because I got the following error, and nothing else I tried both prevents the error and still performs the changes. I have not been able to figure out why this caused an error when, for instance, I successfully used the former format yesterday.
```
Traceback (most recent call last):
File "/Users/baileythegreen/Software/miniconda3/bin/alembic", line 8, in <module>
sys.exit(main())
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/config.py", line 588, in main
CommandLine(prog=prog).main(argv=argv)
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/config.py", line 582, in main
self.run_cmd(cfg, options)
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/config.py", line 559, in run_cmd
fn(
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/command.py", line 320, in upgrade
script.run_env()
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/script/base.py", line 563, in run_env
util.load_python_file(self.dir, "env.py")
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 92, in load_python_file
module = load_module_py(module_id, path)
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 108, in load_module_py
spec.loader.exec_module(module) # type: ignore
File "<frozen importlib._bootstrap_external>", line 848, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/Users/baileythegreen/Software/pyani/scratch/alembic/env.py", line 72, in <module>
run_migrations_online()
File "/Users/baileythegreen/Software/pyani/scratch/alembic/env.py", line 66, in run_migrations_online
context.run_migrations()
File "<string>", line 8, in run_migrations
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/runtime/environment.py", line 851, in run_migrations
self.get_context().run_migrations(**kw)
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/runtime/migration.py", line 620, in run_migrations
step.migration_fn(**kw)
File "/Users/baileythegreen/Software/pyani/scratch/alembic/versions/65538af5a5e1_add_nonsense_column.py", line 21, in upgrade
batch_op.add_column(sa.Column('nonsense', sa.Integer))
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/contextlib.py", line 120, in __exit__
next(self.gen)
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/operations/base.py", line 374, in batch_alter_table
impl.flush()
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/operations/batch.py", line 101, in flush
should_recreate = self._should_recreate()
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/operations/batch.py", line 94, in _should_recreate
return self.operations.impl.requires_recreate_in_batch(self)
File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/ddl/sqlite.py", line 56, in requires_recreate_in_batch
and col.server_default.persisted
AttributeError: 'NoneType' object has no attribute 'persisted'
```
I have no idea how this manages to get around the issue I saw before when nothing seemed to happen on downgrade.
Does this seem like the right way to incorporate alembic with pyani? I know less about the nuances of packaging things in Python.
https://github.com/sqlalchemy/alembic/issues/463#issuecomment-441887412
I'm not familiar enough with alembic to know whether it's the "correct" way to proceed. It looks to have been part of alembic since 1.2.0, but isn't explicitly documented or in the tutorial AFAICT. So I presume it's stable, but maybe a niche operation that doesn't get stress-tested a lot.
What I get from the thread linked above seems sensible. If it does what we need, doesn't break anything else, and doesn't impose any outrageous requirements, that's good enough for me.
I meant more from a Python packaging standpoint. All this seems to do is create two empty __init__.py files—one in the alembic (or other) directory specified as the location for alembic's stuff; the other in the versions subdirectory of the former. This seems unlikely to break anything, but may or may not be necessary.
I think the files that need to be included/given to users, and the correct structure for them, are:

- alembic.ini
- alembic/
  - __init__.py
  - env.py
  - README           # contains a single line stating what alembic template was used; may not be useful
  - script.py.mako   # this may not be necessary; it is a template for creating new migration files
  - versions/
    - __init__.py
    - <revision_file>
    - <revision_file>
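If these files end up shipping inside the pyani package itself, one conventional route would be to declare them as package data; a sketch only (the paths are illustrative, and this is not something we've settled on):

```python
# Sketch: bundling the alembic machinery with a setuptools-based install.
from setuptools import find_packages, setup

setup(
    name="pyani",
    packages=find_packages(),
    package_data={
        "pyani": ["alembic.ini", "alembic/*.py", "alembic/versions/*.py"],
    },
    include_package_data=True,
)
```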
I have a subcommand working that will run upgrades and downgrades, and capture the output/errors from those. The main questions now are:

1. How should these files be included/given to the user? Including them at the top level seems sensible to me, but for users who install via conda, I don't know if something else would be better.
2. Is there a way to set the sqlalchemy.url variable programmatically? This currently has to be set in the alembic.ini file in order for things to work, but it needs to match whatever database the user is using; for distribution we could use the default database name, but I would like to find a way to pull it from the command line arguments, or something. Ideally, without using sed, or a vimscript, to edit it.

All this seems to do is create two empty __init__.py files—one in the alembic (or other) directory specified as the location for alembic's stuff; the other in the versions subdirectory of the former.
That doesn't seem unreasonable but, as I understand it, the __init__.py may no longer be necessary with namespace packages. As pyani isn't supported for Python versions that don't support namespace packaging, we'd likely be OK without it.
How should these files be included/given to the user? Including them at the top level seems sensible to me, but for users who install via conda, I don't know if something else would be better.
I imagined that there would be a way to present only the pyani upgrade/pyani downgrade command to the user and have it work. Is that not possible?
Is there a way to set the sqlalchemy.url variable programmatically?
I don't really know what this refers to, or should point to. I would have assumed the up/downgrade could run with a default --dbpath argument like the other commands, or take a passed path.
Is this issue relevant? https://github.com/sqlalchemy/alembic/issues/606
For alembic to work, it needs a config file: alembic.ini, which contains the database name/driver info in the form of an sqlalchemy.url variable, and the location of the migration scripts as another variable. The sqlalchemy.url variable is pulled from the config file when alembic's env.py is run; I am not yet sure where the script location variable is retrieved. There does not seem to be a general way to pass these values inside an alembic command so that I can just pull them from our own command line options. It is possible to give alembic a database name this way, but I believe this only works in cases where the config file uses the multidb template, and allows you to selectively upgrade/downgrade a single database.
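For what it's worth, alembic can also be driven from Python rather than its CLI, which would let us set both values in code; a sketch, with illustrative paths, since I haven't confirmed this fits our setup:

```python
# Sketch: invoking alembic programmatically, supplying the two values that
# would otherwise have to live in alembic.ini.
from alembic import command
from alembic.config import Config

cfg = Config()  # no .ini file on disk required when options are set directly
cfg.set_main_option("script_location", "alembic")  # migration scripts directory
cfg.set_main_option("sqlalchemy.url", "sqlite:////absolute/path/to/pyanidb")
command.upgrade(cfg, "head")
```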
I imagined that there would be a way to present only the pyani upgrade/pyani downgrade command to the user and have it work. Is that not possible?
Right now, pyani versiondb [--upgrade <revision>] and pyani versiondb --downgrade <revision> work, but the location of the database, and the driver used, as well as the location of the migration scripts, still have to be specified somewhere.
This is what I mean by specifying the driver & database location:
Valid SQLite URL forms are:
sqlite:///:memory: (or, sqlite://)
sqlite:///relative/path/to/file.db
sqlite:////absolute/path/to/file.db
The sqlite part should be fine, but the path is obviously something we can't predict, and the only way I currently know to get the value of something like --dbpath to the right bit of code is through an environment variable, which I would have to set, because I don't think alembic's env.py will have access to our namespace variables.
Is this issue relevant? sqlalchemy/alembic#606
This looks like they're using an environment variable, so maybe, if you think that's an acceptable way forward. I still need to do some digging to see where the script location stuff is used.
Examples of my current alembic.ini and env.py. The only modifications I have made are to the sqlalchemy.url line(s) in the first one.

alembic.ini:
```ini
# A generic, single database configuration.
[alembic]
# path to migration scripts
script_location = alembic
# template used to generate migration files
# file_template = %%(rev)s_%%(slug)s
# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory.
prepend_sys_path = .
# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the python-dateutil library that can be
# installed by adding `alembic[tz]` to the pip requirements
# string value is passed to dateutil.tz.gettz()
# leave blank for localtime
# timezone =
# max length of characters to apply to the
# "slug" field
# truncate_slug_length = 40
# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false
# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false
# version location specification; This defaults
# to alembic/versions. When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "version_path_separator"
# version_locations = %(here)s/bar:%(here)s/bat:alembic/versions
# version path separator; As mentioned above, this is the character used to split
# version_locations. Valid values are:
#
# version_path_separator = :
# version_path_separator = ;
# version_path_separator = space
version_path_separator = os # default: use os.pathsep
# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8
# sqlalchemy.url = sqlite:////Users/baileythegreen/Software/pyani/scratch/alembic/old_db
sqlalchemy.url = sqlite:///pyani/scratch/alembic/old_db
[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts. See the documentation for further
# detail and examples
# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME
# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARN
handlers = console
qualname =
[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
```
and env.py:
```python
from logging.config import fileConfig

from sqlalchemy import engine_from_config
from sqlalchemy import pool

from alembic import context

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config

# Interpret the config file for Python logging.
# This line sets up loggers basically.
fileConfig(config.config_file_name)

# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
target_metadata = None

# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.


def run_migrations_offline():
    """Run migrations in 'offline' mode.

    This configures the context with just a URL
    and not an Engine, though an Engine is acceptable
    here as well. By skipping the Engine creation
    we don't even need a DBAPI to be available.

    Calls to context.execute() here emit the given string to the
    script output.
    """
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )

    with context.begin_transaction():
        context.run_migrations()


def run_migrations_online():
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.
    """
    connectable = engine_from_config(
        config.get_section(config.config_ini_section),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )

    with connectable.connect() as connection:
        context.configure(
            connection=connection, target_metadata=target_metadata
        )

        with context.begin_transaction():
            context.run_migrations()


if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()
```
This looks like they're using an environment variable, so maybe, if you think that's an acceptable way forward. I still need to do some digging to see where the script location stuff is used.
Environment variables are fair game. os.environ["PYANI_DB_PATH"] could be set pretty straightforwardly post-CLI parsing, I think.

When populating it, pathlib's Path should allow identification of the absolute path of whatever the CLI picks up for --dbpath on the current system, so that you don't need to worry about relative paths in the sqlalchemy URL.
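A minimal sketch of both halves (the variable name PYANI_DBPATH and the function names are illustrative, not settled):

```python
import os
from pathlib import Path


def export_dbpath(dbpath: str) -> None:
    """pyani side, post-CLI parsing: resolve --dbpath to an absolute path."""
    os.environ["PYANI_DBPATH"] = str(Path(dbpath).resolve())


def sqlalchemy_url_from_env() -> str:
    """env.py side: 'sqlite:///' plus the leading '/' of the absolute path
    yields the required sqlite:////absolute/path/to/file.db form."""
    return f"sqlite:///{os.environ['PYANI_DBPATH']}"
```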
Okay, good to know. I'll see if I can get that working. I have been waiting to push stuff because technically right now it won't run; without those values, it's going to throw an error.
I'm looking forward to seeing how you manage the tests for this, too.
That's what I'll be looking at after I try this stuff with environmental variables. I have ideas for what I'll do with tests.
I have thus far failed to get the environment variable plan working. I have managed to set them, and I have the syntax for retrieving them correct. However, I am unable to get their values assigned to the alembic config values, and I don't know why.

The env.py file seems to be a black box; I can't get print() statements to work, which makes it difficult to debug what's happening. If I don't have an alembic.ini file present in the directory, I get an error related to that; if I have one, but comment out the lines setting the two variables I need to modify, I get a different error; and whatever I supply there is not overwritten when I reset it in env.py.
An example of how I'm trying to reset them, where $DATABASE is the value of the --dbpath variable:
```python
dbpath = os.environ.get('DATABASE')
url = "sqlite:////Users/baileythegreen/Software/pyani/" + dbpath
config.set_main_option("sqlalchemy.url", url)
```
From what I understand, this is the correct API method for what I want to do (eliminate the .ini file), but I have yet to get it to work.
It's hard to comment on the first part without seeing the error message or practically taking a look.
In the second part, this line:
url = "sqlite:////Users/baileythegreen/Software/pyani/" + dbpath
looks suspect, to me. However, it's not clear what's in the DATABASE environment variable that gets read into dbpath.
The required format, described in an earlier post, is essentially:
sqlite:////absolute/path/to/file.db
It's not clear to me that your string, concatenated with dbpath, satisfies that. It looks like the hard-coded path to your pyani directory might also cause issues.
Have you tried:

1. taking args.dbpath from the CLI,
2. getting the absolute path with args.dbpath.resolve(),
3. setting this in an environment variable (and I'd suggest having a PYANI_ prefix so that related environment variables are effectively namespaced), then
4. url = f"sqlite:///{os.environ.get('PYANI_DBPATH')}"

and what was/is the result?
FEATURE REQUEST

alembic does upgrades/downgrades in-place. It would be useful, I think, to avoid borking an existing database with these updates, so we could back up the database before we start. For instance: pyanidb could be copied to pyanidb.YYYY-MM-DD_HH-MM-SS.bak before the in-place modification starts.
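Something like this would do it (a sketch; the timestamped name mirrors the example above):

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_database(dbpath: Path) -> Path:
    """Copy e.g. pyanidb to pyanidb.YYYY-MM-DD_HH-MM-SS.bak and return the
    backup path, so the in-place migration never touches the only copy."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    backup = dbpath.with_name(f"{dbpath.name}.{stamp}.bak")
    shutil.copy2(dbpath, backup)
    return backup
```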
To resummarise where this is (will potentially be expanded as I think of other things to add):
(Branch with code found here.)
- The issues with env.py not working have been resolved; the problem was that an incorrect env.py was edited.
- The pyani versiondb command creates an environment variable that is used to set the value for the database name, overriding the value specified in the alembic.ini file. [^1]
- The location of the migration scripts directory is fixed for alembic, rather than being user-determined. [^2]
- alembic's --package option creates two __init__.py files; one in the directory specified to alembic (default alembic/), one in the subdirectory alembic/versions/. [^3]
- New options (-n/--name, -c/--config) have been created to allow specifying an individual database name (in cases of a multidb alembic setup) or a config file not in the default location (./alembic.ini).

[^1]: This prevents the need to predict/dictate the database name.
[^2]: This allows us to provide such a directory, containing the necessary migration scripts, as part of a pyani install.
[^3]: I have decided not to use this option, as it creates an issue for devs: running import alembic in the parent directory of a project that contains an alembic/__init__.py file attempts to import what it sees as a local module named alembic, rather than the actual database version control package.
pyani versiondb
```
! pyani versiondb -h
usage: pyani versiondb [-h] [-l LOGFILE] [-v] [--debug] [--disable_tqdm] [--version] [--citation]
[--dbpath DBPATH]
(--upgrade [VERSION] | --downgrade [VERSION] | --dry-run [START:END])
[--alembic_exe ALEMBIC_EXE] [-n NAME] [-c FILE]
One of --upgrade, --downgrade, or --dry-run must be specified.
optional arguments:
-h, --help show this help message and exit
-l LOGFILE, --logfile LOGFILE
logfile location (default: None)
-v, --verbose report verbose progress to log (default: False)
--debug report debug messages to log (default: False)
--disable_tqdm Turn off tqdm progress bar (default: False)
--version
--citation
--dbpath DBPATH path to pyani database (default: .pyani/pyanidb)
--upgrade [VERSION] update an existing database to a newer schema; if no argument is given,
'head' will be used (default: None)
--downgrade [VERSION]
revert an existing database to a older schema; if no argument is given,
'base' will be used (default: None)
--dry-run [START:END]
produce the SQL that would be run in migrations, without altering the
database; start and end versions must be specified (default: None)
--alembic_exe ALEMBIC_EXE
path to alembic executable (default: alembic)
-n NAME, --name NAME used to specify an individual database in a multidb setup (default: None)
-c FILE, --config FILE
used to specify a config file for alembic (default: None)
```
I have chased down the issue underlying something weird I was seeing, which somewhat derailed my writing tests for pyani versiondb.

If I took a copy of an old database (pre-fastANI changes to the ORM), my code was able to upgrade/downgrade this without issue; but if I took a new one (post-fastANI changes to the ORM), I get the following traceback:
```
Traceback (most recent call last):
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
self.dialect.do_execute(
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
cursor.execute(statement, parameters)
sqlite3.OperationalError: error in table comparisons after drop column: no such column: kmersize
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/bin/alembic", line 8, in <module>
sys.exit(main())
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/config.py", line 588, in main
CommandLine(prog=prog).main(argv=argv)
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/config.py", line 582, in main
self.run_cmd(cfg, options)
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/config.py", line 559, in run_cmd
fn(
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/command.py", line 366, in downgrade
script.run_env()
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/script/base.py", line 563, in run_env
util.load_python_file(self.dir, "env.py")
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 92, in load_python_file
module = load_module_py(module_id, path)
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 108, in load_module_py
spec.loader.exec_module(module) # type: ignore
File "<frozen importlib._bootstrap_external>", line 843, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "alembic/env.py", line 104, in <module>
run_migrations_online()
File "alembic/env.py", line 79, in run_migrations_online
context.run_migrations()
File "<string>", line 8, in run_migrations
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/runtime/environment.py", line 851, in run_migrations
self.get_context().run_migrations(**kw)
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/runtime/migration.py", line 620, in run_migrations
step.migration_fn(**kw)
File "/Users/baileythegreen/Software/pyani/alembic/versions/92f7f6b1626e_add_fastani_columns.py", line 28, in downgrade
op.drop_column("comparisons", "kmersize")
File "<string>", line 8, in drop_column
File "<string>", line 3, in drop_column
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/operations/ops.py", line 2189, in drop_column
return operations.invoke(op)
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/operations/base.py", line 392, in invoke
return fn(self, operation)
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/operations/toimpl.py", line 89, in drop_column
operations.impl.drop_column(
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/ddl/impl.py", line 333, in drop_column
self._exec(base.DropColumn(table_name, column, schema=schema))
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/ddl/impl.py", line 197, in _exec
return conn.execute(construct, multiparams)
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1289, in execute
return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/sql/ddl.py", line 80, in _execute_on_connection
return connection._execute_ddl(
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1381, in _execute_ddl
ret = self._execute_context(
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1845, in _execute_context
self._handle_dbapi_exception(
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2026, in _handle_dbapi_exception
util.raise_(
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
raise exception
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
self.dialect.do_execute(
File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) error in table comparisons after drop column: no such column: kmersize
[SQL: ALTER TABLE comparisons DROP COLUMN kmersize]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
```
This puzzled me, because the column is clearly present in the database.
Upon further inspection, it seems to be a difference in the unique constraints specified in the updated ORM, versus those achieved by my migration script to update the old database.
comparisons table in old database:

```
sqlite> .schema comparisons
CREATE TABLE comparisons (
comparison_id INTEGER NOT NULL,
query_id INTEGER NOT NULL,
subject_id INTEGER NOT NULL,
aln_length INTEGER,
sim_errs INTEGER,
identity FLOAT,
cov_query FLOAT,
cov_subject FLOAT,
program VARCHAR,
version VARCHAR,
fragsize INTEGER,
maxmatch BOOLEAN,
PRIMARY KEY (comparison_id),
UNIQUE (query_id, subject_id, program, version, fragsize, maxmatch),
FOREIGN KEY(query_id) REFERENCES genomes (genome_id),
FOREIGN KEY(subject_id) REFERENCES genomes (genome_id)
);
```
comparisons table in old database following upgrade:

```
sqlite> .schema comparisons
CREATE TABLE comparisons (
comparison_id INTEGER NOT NULL,
query_id INTEGER NOT NULL,
subject_id INTEGER NOT NULL,
aln_length INTEGER,
sim_errs INTEGER,
identity FLOAT,
cov_query FLOAT,
cov_subject FLOAT,
program VARCHAR,
version VARCHAR,
fragsize INTEGER,
maxmatch BOOLEAN, kmersize INTEGER, minmatch FLOAT, nonsense INTEGER,
PRIMARY KEY (comparison_id),
UNIQUE (query_id, subject_id, program, version, fragsize, maxmatch),
FOREIGN KEY(query_id) REFERENCES genomes (genome_id),
FOREIGN KEY(subject_id) REFERENCES genomes (genome_id)
);
```
comparisons table in new database:

```
CREATE TABLE comparisons (
comparison_id INTEGER NOT NULL,
query_id INTEGER NOT NULL,
subject_id INTEGER NOT NULL,
aln_length INTEGER,
sim_errs INTEGER,
identity FLOAT,
cov_query FLOAT,
cov_subject FLOAT,
program VARCHAR,
version VARCHAR,
fragsize INTEGER,
maxmatch BOOLEAN,
kmersize INTEGER,
minmatch FLOAT,
PRIMARY KEY (comparison_id),
UNIQUE (query_id, subject_id, program, version, fragsize, maxmatch, kmersize, minmatch),
FOREIGN KEY(query_id) REFERENCES genomes (genome_id),
FOREIGN KEY(subject_id) REFERENCES genomes (genome_id)
);
```
*sigh* Apparently dropping unique constraints is one of the things SQLite doesn't like to do, so this migration just got a lot harder.
I have also been thinking about the utility of the downgrade option, and have been led to question it. If someone has a new database that contains any fastANI runs, removal of the kmersize and minmatch columns might mean some entries are non-unique, which is problematic. This could be handled by just not supporting downgrading; or we could throw a warning about this; or remove any fastANI entries (and notify the user).
We're creating a backup, so none of these would be catastrophic, but one may be preferable.
The above issue (re: constraints) seems to be solved using the contextual version of the batch_alter_table() code mentioned earlier in this thread (https://github.com/widdowquinn/pyani/issues/378#issuecomment-1032081759). And the issue mentioned in that comment seems to have disappeared? I still don't know what that was about.
tl;dr: I have the database successfully upgrading and downgrading, new and old databases alike.
Now I just need to decide whether warnings are needed for downgrading, et cetera, and to finish my tests.
I have also been thinking about the utility of the downgrade option, and have been led to question it. If someone has a new database that contains any fastANI runs, removal of the kmersize and minmatch columns might mean some entries are non-unique, which is problematic. This could be handled by just not supporting downgrading; or we could throw a warning about this; or remove any fastANI entries (and notify the user). We're creating a backup, so none of these would be catastrophic, but one may be preferable.
My first thought is that we should aim to preserve the necessary metadata for a run, which includes - for fastANI - the kmersize/minmatch columns. If we cannot do so, we should loudly (with a warning) flag the problem to the user, and remove those runs from the database, with the argument that the backtransfer of fastANI runs is not compatible with the old schema.

I think we are justified as we back up, and do not destroy, the original, late-format database.
I would see downgrading the database from fastANI to non-fastANI support as being an unusual operation, but we might expect more up/downgrading with future use.
I agree it's likely to be an unusual case, but if I'm supplying code that will perform such a downgrade (and I currently am), then I need to think about such consequences for the user.
Will write code to remove the rows, and do so loudly, with specific messages pointing to the backup.
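Inside the downgrade migration, this could be as simple as the following sketch (assuming fastANI rows are identifiable by their program value, which I'd need to confirm):

```python
# Sketch: inside the migration's downgrade(), purge fastANI comparisons before
# dropping the kmersize/minmatch columns, since without those values the rows
# may violate the old schema's unique constraint.
from alembic import op


def downgrade():
    op.execute("DELETE FROM comparisons WHERE program = 'fastani'")
    # ... then drop the fastANI columns and restore the old constraint ...
```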
Ran into a new issue when working on the --dry-run tests:
It seems the modification of constraints requires reflection, which is fine when there is a connection to the database, but requires some extra steps to do offline. The error I got related to this (which is sufficiently arcane) was:
AttributeError: 'MockConnection' object has no attribute 'close'
The useful solution I have found is from the docs here (https://hellowac.github.io/alembic_doc/en/batch.html#working-in-offline-mode).
```python
# (imports added for context; the original example assumes them)
from sqlalchemy import Column, Integer, MetaData, String, Table
from alembic import op

meta = MetaData()
some_table = Table(
    'some_table', meta,
    Column('id', Integer, primary_key=True),
    Column('bar', String(50))
)

with op.batch_alter_table("some_table", copy_from=some_table) as batch_op:
    batch_op.add_column(Column('foo', Integer))
    batch_op.drop_column('bar')
```
This basically requires the table schema to be included in the migration file (or somewhere, anyway). A bit annoying, but easy enough to do, once you figure out you need to do it.
And I got a second error when dealing with the first:
sqlalchemy.exc.CircularDependencyError: Circular dependency detected. ('kmersize', 'minmatch')
The deal with this one seems to be that I hadn't yet created two dummy tables in the migration script—one that contained the columns being added/the updated constraints, and one that modeled the base settings—for the different manipulations to be performed on, respectively.
Saw the weird error mentioned in https://github.com/widdowquinn/pyani/issues/378#issuecomment-1032081759 again. This time I seem to have managed to resolve it by swapping the order of some lines, e.g.:
```python
batch_op.drop_constraint("base_reqs")
batch_op.add_column(sa.Column("kmersize", sa.Integer, default=None))
batch_op.add_column(sa.Column("minmatch", sa.Float, default=None))
batch_op.create_unique_constraint(
    "fastani_reqs",
    [
        "query_id",
        "subject_id",
        "program",
        "version",
        "fragsize",
        "maxmatch",
        "kmersize",
        "minmatch",
    ],
)
```
instead of:
```python
batch_op.add_column(sa.Column("kmersize", sa.Integer, default=None))
batch_op.add_column(sa.Column("minmatch", sa.Float, default=None))
batch_op.drop_constraint("base_reqs")
batch_op.create_unique_constraint(
    "fastani_reqs",
    [
        "query_id",
        "subject_id",
        "program",
        "version",
        "fragsize",
        "maxmatch",
        "kmersize",
        "minmatch",
    ],
)
```
I still do not know why this makes sense.
Tests are passing locally for me—exclusive of dry-run stuff, which I have decided is not an immediate priority (because I expect that will take some time yet). However, tests fail in CircleCI because I have not managed to get a working copy of sqlite3 installed there. I am rather puzzled by this, because the file is there, it seems to have all of the right permissions, yet it can't be called. I was trying to fix this in circleci on branch circleci-editor/74/alembic_378.
I completely scrapped the tests from before, rewriting them based on the meeting last week. They now work by first creating a dump from an input database, using that dump to initialise a new database, which is then upgraded or downgraded, and then comparing a dump of the up/downgraded database against a dump of a target. This is done with subprocess calls because I have yet to find something that looks like the right way to do that in sqlalchemy, and it really seems like that's not what the package is intended for. But this is why sqlite3 needs to be installed.
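The core of that approach, as a sketch (assuming sqlite3 is on $PATH; the function name is illustrative):

```python
import subprocess
from pathlib import Path


def dump_sql(dbpath: Path) -> str:
    """Return a text dump of an SQLite database via the sqlite3 CLI; dumps of
    a migrated database and a target database can then be compared directly."""
    result = subprocess.run(
        ["sqlite3", str(dbpath), ".dump"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```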
When I've had similar CircleCI issues in the past, I've had to pay particular attention to $PATH on the CircleCI VM.

It can be useful for diagnosis to insert a step in the CircleCI config that checks for the presence of the executable you want in the $PATH and writes it to an artifact file (which you can collect after the run). See e.g. https://circleci.com/docs/2.0/artifacts/ if you've not tried this before.
On the whole your new approach to the tests sounds good.
My less sophisticated approach thus far has been to put extra shell commands in the Makefile action:
```
wget https://www.sqlite.org/2022/sqlite-tools-linux-x86-3380100.zip
unzip sqlite-tools-linux-x86-3380100.zip
ls -l sqlite-tools-linux-x86-3380100/sqlite3
echo $PWD
echo 'export PATH=$PWD/sqlite-tools-linux-x86-3380100/sqlite3:$PATH' >> $BASH_ENV
source $BASH_ENV
echo $PATH
/home/circleci/repo/sqlite-tools-linux-x86-3380100/sqlite3 -help
sqlite3 -help
```
which gives:
```
--2022-03-14 22:15:44-- https://www.sqlite.org/2022/sqlite-tools-linux-x86-3380100.zip
Resolving www.sqlite.org (www.sqlite.org)... 45.33.6.223, 2600:3c00::f03c:91ff:fe96:b959
Connecting to www.sqlite.org (www.sqlite.org)|45.33.6.223|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2232504 (2.1M) [application/zip]
Saving to: ‘sqlite-tools-linux-x86-3380100.zip’
sqlite-tools-linux- 100%[===================>] 2.13M 9.51MB/s in 0.2s
2022-03-14 22:15:44 (9.51 MB/s) - ‘sqlite-tools-linux-x86-3380100.zip’ saved [2232504/2232504]
Archive: sqlite-tools-linux-x86-3380100.zip
creating: sqlite-tools-linux-x86-3380100/
inflating: sqlite-tools-linux-x86-3380100/sqlite3
inflating: sqlite-tools-linux-x86-3380100/sqlite3_analyzer
inflating: sqlite-tools-linux-x86-3380100/sqldiff
-rwxrwxr-x 1 circleci circleci 1202884 Mar 12 14:05 sqlite-tools-linux-x86-3380100/sqlite3
/home/circleci/repo
/home/circleci/repo/sqlite-tools-linux-x86-3380100/sqlite3:/home/circleci/repo:/home/circleci/repo/blast-2.2.26/bin:/home/circleci/repo:/home/circleci/repo/blast-2.2.26/bin:/home/circleci/.local/bin:/home/circleci/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
/bin/bash: line 7: /home/circleci/repo/sqlite-tools-linux-x86-3380100/sqlite3: No such file or directory
Exited with code exit status 127
CircleCI received exit code 127
```
The failure has occurred whenever I try to invoke sqlite3, whether using the full path, or not. I have played with what I add to $PATH, and even downloaded and unzipped the Linux compiled version locally, so I could make sure I understood the structure. I have a feeling the issue is less the existence of the file, and more to do with the permissions, except I can't see any issue with those.

I don't know if any of this obviates your suggestion with the artefacts, but I'll start looking into that, as I haven't used them before.
My less sophisticated approach thus far has been to put extra shell commands in the Makefile action:
If you're modifying the Makefile in the repo, then I think that's not a good idea.

Changes specific to CircleCI integration should be made in the .circleci/config.yaml file. This keeps them within CircleCI scope - especially important if the issue is CircleCI specific. It's easy to forget/fail to clean up correctly after changing more global files to fix a local problem.

Examples of how to handle a specific installation for CircleCI can be seen in that config.yaml file, e.g.
```yaml
- run:
    name: install legacy BLAST
    command: |
      curl -o legacy_blast.tar.gz ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/blast-2.2.26-x64-linux.tar.gz
      tar -zxf legacy_blast.tar.gz
      echo 'export PATH=$PWD/blast-2.2.26/bin:$PATH' >> $BASH_ENV
      source $BASH_ENV
```
Note that the name field is descriptive, and we can install binaries prior to the pip setup calls.

In the same section you should be able to echo environment variables/redirect STDOUT/STDERR as appropriate to named files, which will show up in the CircleCI artifacts for the build.
This kind of workflow keeps everything involved in debugging/fixing the CircleCI runs neatly in one file, and easily removable when it is no longer required.
Since pyani v0.3 should not pass any tests if sqlite3 is not present, a first step might be to check if the executable is in the $PATH with which (I assume you've done this) and, if not, try a find to see if the executable can be located. If it can, then modifying the $PATH might be a way forward.
If you need to install the executable, it may be simpler to use the built-in package management, which should place the executable somewhere visible, e.g.:
```yaml
- run:
    name: install third-party tools
    command: |
      sudo apt-get install csh mummer ncbi-blast+ sqlite3
```
My less sophisticated approach thus far has been to put extra shell commands in the Makefile action:
If you're modifying the Makefile in the repo, then I think that's not a good idea.
I described what I did poorly/inaccurately. I have been editing the .circleci/config.yml file. What I posted is what I added there as a new workflow step.
Have you tried the apt-get installation route? It's usually the least painful.
No, I don't tend to think about that because I've never really been able to use it. You might be right about that being the way forward, though.
I did look for a conda option, but couldn't figure out the right way to specify the necessary channel.

Would it be apt-get or apt install? Or apt-get install? I am unsure of the differences between them.
Try appending sqlite3 to the existing third party tool installation, as noted above.
Summary:

Use alembic or similar to manage database migration.

Description:

In discussions with users, modifications to the SQLite database schema have caused issues where updating their v0.3 installation has led to incompatibility with an existing database, and resulting errors.

We could potentially avoid this by carefully versioning the SQLite/SQLAlchemy schema and using alembic or similar to aid migration between versions in most cases.

If users maintain large analyses in their databases, we should avoid discouraging them from trusting the package by breaking backwards-compatibility.
Having used this before with Flask applications, I'm confident we could usually achieve seamless in-place upgrades of users' existing local databases.