singlestore-labs / singlestoredb-dev-image

The SingleStoreDB Dev Container is the fastest way to develop with SingleStore on your laptop or in a CI/CD environment.
Apache License 2.0

Unusable SingleStore deployment on m1 mac #72

Open caseybrown89 opened 4 months ago

caseybrown89 commented 4 months ago

Describe the bug

I recently upgraded my Mac from macOS Monterey 12.5 to Sonoma 14.5, and Docker Desktop from 4.16.1 (engine 20.10.22, compose v2.15.1) to 4.30 (engine 26.1.1, compose v2.27.0-desktop.2).

Prior to upgrading the OS and Docker, the SingleStore database worked as expected, though it was a bit slow. I was able to run integration tests against the database, which included activities like:

  1. Multiple schema creation (targeted at a schema per tenant in a multi-tenant application)
  2. Running of migrations (creation of tables, loading via INSERT statements)
  3. Execution of read and write queries across schemas

After upgrading, the local SingleStore database on Docker is no longer usable. It fails in different ways across the three steps above, depending on the Docker Desktop configuration:

Results by Docker Desktop configuration:

  1. Docker configuration: "Use virtualization framework" off; file sharing implementation: osxfs
     Result: Schema creation and migrations succeed (albeit very slowly); tests fail during execution of read and write queries
     Error:
       time="2024-05-28T15:40:23Z" level=error msg="Error 1777 (HY000): Partition xxx:0 has no master instance. This is likely because the node or nodes that hold a copy of the partition are down. Check for offline leaf nodes by running SHOW LEAVES and bring them back online to restore access to the partition"

  2. Docker configuration: "Use virtualization framework" off; file sharing implementation: gRPC FUSE
     Result: Schema creation and migrations succeed (albeit very slowly); tests fail during execution of read and write queries
     Error:
       time="2024-05-28T16:02:20Z" level=error msg="Error 1777 (HY000): Partition xxx:0 has no master instance. This is likely because the node or nodes that hold a copy of the partition are down. Check for offline leaf nodes by running SHOW LEAVES and bring them back online to restore access to the partition"

  3. Docker configuration: "Use virtualization framework" on; file sharing implementation: VirtioFS; Use Rosetta for x86_64/amd64 emulation on Apple Silicon: off
     Result: Migrations fail
     Error:
       286636318 2024-05-28 16:37:28.619 ERROR: Thread 99999 (ntid 342, conn id 29): ShardingAlterTableV6: Alter Table timed out sending PREPARE messages to all the leaves
       286636432 2024-05-28 16:37:28.620 WARN: Thread 99999 (ntid 342, conn id 29): operator(): Alter table on xxx.client has failed, rolling back transaction. Error: 2286: Operation ALTER timed out while waiting for concurrent operation to finish. Use SHOW PROCESSLIST to investigate long running concurrent operation, or consider increasing the value of alter statement's timeout value or the default_distributed_ddl_timeout global variable

  4. Docker configuration: "Use virtualization framework" on; file sharing implementation: VirtioFS; Use Rosetta for x86_64/amd64 emulation on Apple Silicon: on
     Result: Migrations fail
     Error:
       ==> /var/lib/memsql/ce0473ab-fc9f-45ae-a5ea-0e1c6c236947/tracelogs/memsql.log <==
       276996548 2024-05-28 20:31:03.363 ERROR: Thread 99999 (ntid 293, conn id 28): ShardingAlterTableV6: Alter Table timed out sending PREPARE messages to all the leaves
       276997002 2024-05-28 20:31:03.364 WARN: Thread 99999 (ntid 293, conn id 28): operator(): Alter table on xxx.client has failed, rolling back transaction. Error: 2286: Operation ALTER timed out while waiting for concurrent operation to finish. Use SHOW PROCESSLIST to investigate long running concurrent operation, or consider increasing the value of alter statement's timeout value or the default_distributed_ddl_timeout global variable

  5. Docker configuration: "Use virtualization framework" on; file sharing implementation: osxfs; Use Rosetta for x86_64/amd64 emulation on Apple Silicon: on
     Result: Migrations fail
     Error:
       ==> /var/lib/memsql/b91e2313-f74d-4cdb-847d-1bff188babe5/tracelogs/memsql.log <==
       286541859 2024-05-28 20:39:57.493 ERROR: Thread 99999 (ntid 332, conn id 28): ShardingAlterTableV6: Alter Table timed out sending PREPARE messages to all the leaves
       286542303 2024-05-28 20:39:57.494 WARN: Thread 99999 (ntid 332, conn id 28): operator(): Alter table on labsengpte.contact has failed, rolling back transaction. Error: 2286: Operation ALTER timed out while waiting for concurrent operation to finish. Use SHOW PROCESSLIST to investigate long running concurrent operation, or consider increasing the value of alter statement's timeout value or the default_distributed_ddl_timeout global variable

  6. Docker configuration: "Use virtualization framework" on; file sharing implementation: VirtioFS; Use Rosetta for x86_64/amd64 emulation on Apple Silicon: on; migration file causing the alter failures (deadlock) removed
     Result: Schema creation and migrations succeed (albeit very slowly); tests time out after 20 minutes
     Error: none reported
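The error messages above point at SHOW LEAVES, SHOW PROCESSLIST, and the default_distributed_ddl_timeout variable. A minimal sketch of collecting all three in one pass might look like the following. It assumes a Python DB-API 2.0 cursor already connected to the container through some MySQL-compatible driver; the collect_diagnostics helper and the dictionary labels are hypothetical names for illustration, and the sketch only reads state rather than changing the timeout.

```python
from typing import Any, Dict, List

# Statements named in the error messages, plus a read-only lookup of the
# DDL timeout variable. The labels on the left are illustrative only.
DIAGNOSTIC_QUERIES = {
    "leaves": "SHOW LEAVES",
    "processlist": "SHOW PROCESSLIST",
    "ddl_timeout": "SHOW VARIABLES LIKE 'default_distributed_ddl_timeout'",
}


def collect_diagnostics(cursor: Any) -> Dict[str, List[tuple]]:
    """Run each diagnostic statement and return its rows keyed by label.

    `cursor` is assumed to be a DB-API 2.0 cursor (execute/fetchall).
    """
    results: Dict[str, List[tuple]] = {}
    for label, sql in DIAGNOSTIC_QUERIES.items():
        cursor.execute(sql)
        results[label] = list(cursor.fetchall())
    return results
```

Running this right after a failed migration or test run, before the container is restarted, should show whether any leaf is offline and which concurrent operation the ALTER was waiting on.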

To Reproduce

Steps to reproduce the behavior:

  1. It is hard to share the schemas and migration files causing the issues because they belong to a proprietary application. I'm happy to share them privately or work toward a smaller, representative test case.
    1. At a high level, there are 36 migration files applied across three separate schemas (or "databases"). Each schema has a total of 36 tables (the migration files do not correlate 1-1 with the tables).
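As a possible starting point for a shareable test case, here is a rough sketch that generates synthetic DDL with the same overall shape: three schemas with 36 tables each, and an ALTER TABLE per table, since ALTER is the statement that times out. The tenant_N and table_N names, the column definitions, and the synthetic_migrations helper are all hypothetical placeholders, not the real schema, and there is no guarantee this workload triggers the same failure.

```python
from typing import List


def synthetic_migrations(num_schemas: int = 3, tables_per_schema: int = 36) -> List[str]:
    """Build placeholder DDL: one database per tenant, then a CREATE TABLE
    followed by an ALTER TABLE for every table in that database."""
    statements: List[str] = []
    for s in range(1, num_schemas + 1):
        schema = f"tenant_{s}"  # hypothetical schema name
        statements.append(f"CREATE DATABASE IF NOT EXISTS {schema}")
        for t in range(1, tables_per_schema + 1):
            table = f"{schema}.table_{t}"  # hypothetical table name
            statements.append(
                f"CREATE TABLE {table} (id BIGINT PRIMARY KEY, payload TEXT)"
            )
            statements.append(f"ALTER TABLE {table} ADD COLUMN extra INT")
    return statements


if __name__ == "__main__":
    # Print one statement per line so the output can be piped to a SQL client.
    print(";\n".join(synthetic_migrations()) + ";")
```

The statements could also be split into individual files for sql-migrate, which would keep the migration tool itself in the reproduction path.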

Expected behavior

I expect the database to execute the migration files successfully and query performance to be reasonable. Prior to the macOS and Docker upgrade, the test suite worked locally, though its runtime was high (> 10 minutes end-to-end).

Additional context

The sql-migrate tool is being used to run the migrations.

caseybrown89 commented 4 months ago

The last version of Docker Desktop where I can get SingleStore to work is 4.27.2. The GUI lists its Docker engine version as 25.0.3, but according to the release notes it may actually be 25.0.2. Docker Desktop 4.28.0 is bundled with engine 25.0.3 and fails to run SingleStore in our application, which also leads me to believe that Docker Desktop 4.27.2 actually ships engine 25.0.2.

According to some Docker GH issues, 25.0.3 "fixes some issues with Rosetta and QEMU". I would start there when looking for what may have changed. The issues in question:

  1. PHP-FPM issue in Docker Desktop 4.27.2: WARNING: [pool www] child 85 exited on signal 11 (SIGSEGV) #7182
  2. Mac M1 - after upgrade to Docker Desktop 4.27.1 docker container with java fails with qemu: uncaught target signal 11 (Segmentation fault) - core dumped
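For anyone trying to confirm which side of that boundary their machine is on, a small sketch like the one below could help. It assumes the regression really does begin at engine 25.0.3, which is only a suspicion at this point, and the helper names are made up for illustration. The engine version is read with `docker version --format '{{.Server.Version}}'`, which reports what the daemon itself says rather than what the Docker Desktop GUI shows.

```python
import subprocess
from typing import Tuple

# Suspected boundary: Docker Desktop 4.27.2 works, 4.28.0 (engine 25.0.3) fails.
# This is an assumption taken from the observations above, not a confirmed fact.
FIRST_FAILING_ENGINE = (25, 0, 3)


def parse_engine_version(text: str) -> Tuple[int, ...]:
    """Turn '26.1.1' (optionally with a '-rc1' or '+build' suffix) into (26, 1, 1)."""
    core = text.strip().split("-")[0].split("+")[0]
    return tuple(int(part) for part in core.split("."))


def is_suspect(version: str) -> bool:
    """True if the engine version is at or past the suspected boundary."""
    return parse_engine_version(version) >= FIRST_FAILING_ENGINE


def local_engine_version() -> str:
    """Ask the local Docker daemon for its engine version."""
    out = subprocess.run(
        ["docker", "version", "--format", "{{.Server.Version}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return out.stdout.strip()


if __name__ == "__main__":
    version = local_engine_version()
    print(version, "suspect" if is_suspect(version) else "expected to work")
```

Comparing this output on Docker Desktop 4.27.2 against 4.28.0 would also settle whether 4.27.2 really ships engine 25.0.2.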