launchbadge / sqlx

🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, and SQLite.
Apache License 2.0

cargo sqlx prepare expecting DATABASE_URL in workspace scenario #1223

Open · lweberk opened 3 years ago

lweberk commented 3 years ago

When generating sqlx-data.json offline information for CI/CD in a workspace whose subcrates target multiple databases, each with its own valid .env file, cargo sqlx prepare still expects DATABASE_URL to be set.

cargo check works as expected: the macros fetch their respective DATABASE_URL from each workspace subcrate's .env.

The expected behavior would be for cargo sqlx prepare to do the same at the workspace root, or alternatively not to confuse the different DATABASE_URL values when generating them independently in their respective subcrates for separation.
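For illustration, a hypothetical layout of the kind of workspace in question (crate names and URLs are made up):

workspace/
├── Cargo.toml
├── crates/
│   ├── users-pg/
│   │   └── .env        # DATABASE_URL=postgres://...
│   └── reports-mysql/
│       └── .env        # DATABASE_URL=mysql://...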

cargo sqlx prepare --merged, if it is supposed to enable merging of workspace sqlx-data.json information, does not change this behaviour.

Spun out of #121

lweberk commented 3 years ago

I have been looking into fixing this and found the following mechanics:

1) cargo sqlx prepare in essence wipes target/sqlx/query-*.json, then runs cargo check so that those files are generated again.
2) It takes the freshly generated files and unifies their contained JSON objects into one, with each object's inner hash as its key in the new unifying super-object.

The caveat is that, in all of that process, it also accesses DATABASE_URL to determine the type of database, which it then adds to the top-level object as "db": "{Postgres, MySql, ...}".
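For reference, the merged file then looks roughly like this (hashes and query objects abbreviated; a sketch of the shape, not exact output):

{
    "db": "PostgreSQL",
    "<hash-1>": { "query": "SELECT ...", "describe": { ... }, "hash": "<hash-1>" },
    "<hash-2>": { "query": "INSERT ...", "describe": { ... }, "hash": "<hash-2>" }
}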

Question: What is that being used for? Can I safely bypass it in the case of --merged?

I have manually merged those files with a small Python script, omitting the db field altogether. So far, running SQLX_OFFLINE=true cargo check has passed as if everything were OK. I will attempt to reproduce this in a CI/CD clean room.

Insights from a developer of this project into the implications of what I'm touching on would be greatly appreciated. Fumbling along the surface is very different from having a deeper understanding of the underlying motivations and reasons.

lweberk commented 3 years ago

OK, so the db type information is expected; without it, the check will not run. It also selects the set of types used when mapping the wire types to their local equivalents. There is no way around it. Messing with it would probably be well out of scope for a workaround until #121 gets properly solved.

The only workaround I can think of is to have an individual sqlx-data.json in each workspace subcrate and to have the checks resolve to those first. It would also mean fixing the conflation of DATABASE_URL environment vars between them.

Any pointers on where to start, to save time?

lweberk commented 3 years ago

For the sake of completeness.

A current workaround:

1) Remove the target/sqlx/* files.
2) Run cargo check so that each workspace subcrate generates the files for each query.
3) Run the Python snippet below to merge all queries into one sqlx-data.json at the root of the project directory.
4) Copy it to each of the workspace subcrates that have queries to check.
5) Remove all the queries that are not specific to that subcrate (optional).
6) At the top of each of these files, add "db": "PostgreSQL | MySQL | ...", so the checker can properly map the type conversions.

import json
import glob

# Load every per-query JSON file generated by cargo check.
data   = [json.load(open(path, "r")) for path in glob.glob("target/sqlx/query-*.json")]
# Key each query object by its inner hash, forming the unified super-object.
merged = {v["hash"]: v for v in data}

json.dump(merged, open("sqlx-data.json", "w"), indent=4)

Checking against offline data for individual subcrates just works. It's sqlx prepare that does not. Until the devs of this project provide some clarity on the intentions for the whole multi-database question, I'll keep working around it.

jplatte commented 3 years ago

I just ran into this same issue and adapted your script. Here's what I now committed as sqlx-prepare.py:

#!/usr/bin/env python3

import json
import glob

# Merge the per-query JSON files and add the top-level "db" key
# that the offline checker expects.
data   = [json.load(open(path, "r")) for path in glob.glob("target/sqlx/query-*.json")]
merged = { "db": "PostgreSQL", **{ v["hash"]: v for v in data } }

json.dump(merged, open("sqlx-data.json", "w"), indent=4)

There isn't actually a need to copy sqlx-data.json to any subdirectories; the macros will find it in the workspace root. You just have to run SQLX_OFFLINE=false cargo check --workspace and then ./sqlx-prepare.py whenever some query code changes. Maybe I should even make the script run that command automatically (and also remove target/sqlx before doing so) 🤔

jplatte commented 3 years ago

Updated script that fully replaces cargo sqlx prepare:

#!/usr/bin/env python3

import json
import glob
import os
import shutil
import subprocess

# Start from a clean slate so stale query files are not merged.
shutil.rmtree("target/sqlx", ignore_errors=True)

# Regenerate the per-query JSON files; check=True aborts if the build fails.
os.environ["SQLX_OFFLINE"] = "false"
subprocess.run(["cargo", "check", "--workspace"], check=True)

data   = [json.load(open(path, "r")) for path in glob.glob("target/sqlx/query-*.json")]
merged = { "db": "PostgreSQL", **{ v["hash"]: v for v in data } }

json.dump(merged, open("sqlx-data.json", "w"), indent=4)

arlyon commented 3 years ago

Here is a shell script using jq. I've found that even if you delete target/sqlx every time, the files are only regenerated if there have been changes to those specific crates, so I touch the source files for good measure as well. It also fails if you happen to have DATABASE_URL set, since sqlx will use that instead of the .env-specific ones, so unset that as well.

#!/usr/bin/env bash
# Start clean and force cargo to recompile the query macros.
rm -rf target/sqlx
touch crates/*/src/*.rs
# Unset DATABASE_URL so each crate falls back to its own .env.
env -u DATABASE_URL SQLX_OFFLINE=false cargo check --workspace
# Merge the per-query files under a top-level "db" key.
jq -s '{"db": "MySQL"} + INDEX(.hash)' target/sqlx/query-*.json > sqlx-data.json
zopieux commented 3 years ago

Thanks @arlyon. For reproducibility, I think it's a good idea to make sure the hashes are sorted; it also reduces the likelihood of useless VCS diffs.

jq -s '{"db": "MySQL"} + (INDEX(.hash)|to_entries|sort_by(.key)|from_entries)' target/sqlx/query-*.json > sqlx-data.json

I can live with that workaround, but that's still a pretty big hack. :disappointed:

LLBlumire commented 1 year ago

This doesn't seem to be working for me; cargo check --workspace doesn't appear to generate a target/sqlx directory?

cleverjam commented 1 year ago

This doesn't seem to be working for me; cargo check --workspace doesn't appear to generate a target/sqlx directory?

Same. I can't manually locate the target/sqlx directory either.

Did you find a solution?

exellentcoin26 commented 8 months ago

I did not test this further; however, I found the following. The sqlx prepare command internally runs cargo check (--workspace) with the following env variables set: SQLX_TMP=<some-dir>, SQLX_OFFLINE="false", SQLX_OFFLINE_DIR=<some-dir>. If these directories exist before cargo check runs with those environment variables set, the query files will end up in the SQLX_OFFLINE_DIR directory.
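Based purely on that description, a minimal Python sketch of invoking the same mechanism by hand (the directory paths here are arbitrary placeholders, not anything sqlx mandates):

import os
import subprocess

# Hypothetical directories; per the description above they must exist
# before cargo check runs for the query files to be written out.
os.makedirs("target/sqlx-tmp", exist_ok=True)
os.makedirs("target/sqlx-offline", exist_ok=True)

env = {
    **os.environ,
    "SQLX_TMP": "target/sqlx-tmp",
    "SQLX_OFFLINE": "false",
    "SQLX_OFFLINE_DIR": "target/sqlx-offline",
}
# If the behaviour above holds, query-*.json should appear in target/sqlx-offline.
subprocess.run(["cargo", "check", "--workspace"], env=env, check=True)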

kontsaki commented 7 months ago

Hello, I wanted to make sure I understand correctly: doesn't sqlx support a workspace with crates that use different databases?

cleverjam commented 7 months ago

Hello, I wanted to make sure I understand correctly: doesn't sqlx support a workspace with crates that use different databases?

The issue, as I understand it, is that when running sqlx prepare from the workspace root, with different .env files in each subcrate pointing to different DATABASE_URL values, the prepare command does not work as expected.

For now I get around this issue by running sqlx prepare from each subcrate's directory... this is more of an inconvenience than a lack of support, imho.
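In the same spirit as the scripts above, a small sketch that automates the per-subcrate run from the workspace root, assuming the subcrates live under crates/ (adjust the glob to your layout):

#!/usr/bin/env python3

import glob
import subprocess

# Hypothetical layout: every subcrate with queries lives under crates/.
for crate in sorted(glob.glob("crates/*/")):
    # Run the prepare inside each subcrate so its own .env is picked up.
    subprocess.run(["cargo", "sqlx", "prepare"], cwd=crate, check=True)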

exellentcoin26 commented 6 months ago

You can also solve this by loading the .env file during compilation from each crate separately, for example using dotenv-build in each crate's build script. You do have to make sure that the crate is recompiled when the file changes, for example via cargo:rerun-if-changed directives printed by the build script. This makes it possible to run all commands from the root of the workspace, since during compilation cargo will set the correct env variables for each crate.