dan-homebrew opened 2 months ago
Some thoughts on data migration:

```sql
CREATE TABLE IF NOT EXISTS schema_version (
  version INTEGER NOT NULL
);
```
Use `sql` scripts for migration by version. These scripts can be executed after running `cortex update`. Not sure if we can run `sql` scripts by postscript. cc: @hiento09

Question: should we use migration scripts or implement a C++ database migration component?
The `yml` file path is stored in the database, so if the model structure changes, we also need to update the models' database.

What do you think? @dan-homebrew @louis-jan @janhq/cortex
@vansangpfiev Overall, I think we should do what allows us to ship something quickly this Friday.
```
/migrations
  1.0.2.cpp
  1.0.3.cpp
```
```cpp
// 1.0.2.cpp
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

void up() {
  // Run the SQL script (assuming it's a string of SQL commands passed to psql)
  std::string sqlScript = "...";  // SQL for this version
  std::string runSqlCmd = "psql -c \"" + sqlScript + "\"";
  int runSqlResult = system(runSqlCmd.c_str());
  if (runSqlResult == 0) {
    std::cout << "SQL script executed successfully." << std::endl;
  } else {
    std::cerr << "Error: Failed to execute the SQL script." << std::endl;
  }

  // Define the directories
  std::string rootDir = "/";
  std::string targetDir = "/path/to/target/directory";

  // Define the file to be moved
  std::string fileToMove = rootDir + ".cortexrc";

  // Check if the file exists
  std::ifstream file(fileToMove);
  if (!file) {
    std::cerr << "Error: The file .cortexrc does not exist in the root directory." << std::endl;
    return;
  }

  // Move the file to the target directory
  std::string moveCmd = "mv " + fileToMove + " " + targetDir;
  int moveResult = system(moveCmd.c_str());
  if (moveResult == 0) {
    std::cout << "The .cortexrc file has been moved to the target directory." << std::endl;
  } else {
    std::cerr << "Error: Failed to move the .cortexrc file." << std::endl;
  }
}

void down() {
  ...
}
```
Please push back on this @janhq/cortex - as you can tell from the RoR reference, my approach may be very outdated.
We likely need to design for both local running and cloud deployment. Adding a code constraint would block our migration from the deployment layer, which likely waits for an instance to spin up and perform a health check, which is inefficient. There could also be a scenario where we want to share the database instance across services like cortex.py and cortex.cpp.
However, one advantage is that they can embed cortex.cpp as a standalone binary without having to worry about other services, which is particularly useful when embedding into Jan.
I’d prefer built-in migrations as suggested above for cortex.cpp to ensure embedding in Jan won’t cause issues.
We will use SQL scripts for up/down migrations and try to minimize changes between versions to ensure smooth transitions. For complex down migrations, we will create a PR to address any issues.
Structure

```
cortexcpp
|__ models/
|__ engines/
|__ migrations/
    |__ v1/
    |   |__ v1.sql
    |   |__ data_structure_v1.sh
    |__ v2/
        |__ v2.sql
        |__ data_structure_v2.sh
```
Question: How do we ship the migration scripts?
With the `cortex-server` binary, they have an option to fall back to option 1.

cc: @dan-homebrew @hiento09 @janhq/cortex
@vansangpfiev Our discussion:
`.sql` and `.sh` scripts may represent a security/backdoor risk in the future.

```
cortexcpp
|__ models/
|__ engines/
|__ migrations/
    |__ v1.0.2.cpp
    |__ v1.0.3.cpp
```
Oops. This wouldn’t work for nightly migration support or switching between commits.
Hmm, can we solve this via running migrations manually for Nightly?
Nightly

For nightly builds, we trigger migrations manually: `cortex migrations up 1.0.3`, `cortex migrations down 1.0.3`. This requires `down` and `up` in the migrator between nightly versions: run `down` and then `up`.
Switching between commits
@dan-homebrew IMO, it is necessary to distinguish between the data version and the software version. Since all migration actions occur within the software, the application may not know its current version during modifications (stable, beta, nightly, etc.).

To ensure clarity and consistency, we can increment the data version each time a change is made. In the event of a conflict, we will revert to the previous version of the data. This separation allows for more effective management of both software and data changes, enhancing stability and reliability throughout the migration process.
TL;DR: If we roll back to a specific commit `abc123`, we must know exactly how the data schema looks, in a deterministic way.
Goal

- #1154
- #1116

Scope

- (`.cortexrc`)

Discussion

Success Criteria:

Tasklist