aclum closed this issue 2 months ago.
Hoping Donny or Eric can look at this in @PeopleMakeCulture's absence. cc @shreddd
Here's an excerpt from the logs of the GitHub Actions workflow run that @aclum linked to above (thanks!):
For reference, that PR's base branch is the `berkeley` branch.
The error occurred during the "setup" phase of the test `test_metadata_validate_json_0`.
Specifically, it occurred while the following fixture was being built:
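```python
import os
import pytest

# get_mongo_db, ensure_test_resources, and RuntimeApiSiteClient are defined in the nmdc-runtime codebase.
@pytest.fixture
def api_site_client():
    mdb = get_mongo_db()
    rs = ensure_test_resources(mdb)  # <-- the error occurred here
    return RuntimeApiSiteClient(base_url=os.getenv("API_HOST"), **rs["site_client"])
```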
`ensure_test_resources` is defined here. Among other things, it calls a function named `ensure_schema_collections_and_alldocs`, which is defined here.
Here's an excerpt from the `ensure_schema_collections_and_alldocs` function:
```python
dump_dir = ensure_local_mongodump_exists()
mongorestore_from_dir(mdb, dump_dir, skip_collections=["functional_annotation_agg"])
```
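For context on the `skip_collections` argument: a `mongodump` output directory typically holds one `<collection>.bson` file (plus a `.metadata.json` file) per collection, and `.bson.gz` files when `--gzip` is used. The sketch below shows how a skip-list filter like that could decide which collections to restore. It assumes a flat directory of dump files for a single database; `collections_to_restore` is a hypothetical helper for illustration, not the actual `mongorestore_from_dir` implementation.

```python
import tempfile
from pathlib import Path


def collections_to_restore(dump_dir: str, skip_collections: list[str]) -> list[str]:
    """Return sorted collection names found in a (flat) mongodump directory,
    excluding any in skip_collections. Hypothetical helper for illustration."""
    names = set()
    for path in Path(dump_dir).iterdir():
        # mongodump writes <name>.bson, or <name>.bson.gz when --gzip is used;
        # .metadata.json files are ignored here.
        if path.name.endswith(".bson.gz"):
            names.add(path.name[: -len(".bson.gz")])
        elif path.name.endswith(".bson"):
            names.add(path.name[: -len(".bson")])
    return sorted(names - set(skip_collections))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for fname in ["biosample_set.bson", "biosample_set.metadata.json",
                      "functional_annotation_agg.bson.gz", "study_set.bson.gz"]:
            (Path(d) / fname).touch()
        print(collections_to_restore(d, skip_collections=["functional_annotation_agg"]))
        # → ['biosample_set', 'study_set']
```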
It calls a function named `ensure_local_mongodump_exists` (side note: I prefer including this `_exists` suffix, because it states which property of the dump is being ensured).
That function (and the constant it accesses) is defined here:
Looks like that function downloads a pre-made database dump from the NERSC filesystem.
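The "download only if it isn't already cached" pattern that name implies can be sketched like this. It is a minimal illustration, assuming a local cache directory and a `fetch` callable standing in for the real transfer from NERSC; `ensure_local_file_exists` and `fetch` are hypothetical names, not the nmdc-runtime code.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_local_file_exists(
    local_dir: str, basename: str, fetch: Callable[[Path], None]
) -> Path:
    """Return local_dir/basename, calling fetch(dest) to obtain the file
    only if it is not already present locally. Hypothetical helper."""
    dest = Path(local_dir) / basename
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        fetch(dest)  # stand-in for the actual download from NERSC
    return dest


if __name__ == "__main__":
    calls = []

    def fake_fetch(dest: Path) -> None:
        calls.append(dest.name)
        dest.write_bytes(b"placeholder archive")

    with tempfile.TemporaryDirectory() as d:
        ensure_local_file_exists(d, "dump.tar.gz", fake_fetch)
        ensure_local_file_exists(d, "dump.tar.gz", fake_fetch)
        print(calls)  # → ['dump.tar.gz'] (fetched only once)
```

Note that in this pattern, the archive name passed in fully determines which database snapshot the tests end up restoring.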
The value of `SCHEMA_COLLECTIONS_MONGODUMP_ARCHIVE_BASENAME` is the filename of a dump of the production database, specifically. I think this is consistent with what @aclum reported in the issue description.
In other words, I think this couples the test suite to the specific schema that happened to be in effect when that specific dump was generated.
I think this will be insufficient for both (a) the legacy-to-Berkeley rollout we are working on now and (b) future changes to certain aspects of the schema.
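One way to surface this kind of coupling early, instead of as a confusing fixture-setup error, would be to compare the collection names in the dump against the collection names the target schema defines. A minimal sketch, assuming both are already available as sets of names (obtaining them from nmdc-schema is out of scope here); `find_schema_mismatches` and `check_dump_matches_schema` are hypothetical helpers, and the example sets are illustrative, using `omics_processing_set` from this thread as a collection that exists in production but not in Berkeley.

```python
def find_schema_mismatches(
    dump_collections: set[str], schema_collections: set[str]
) -> dict[str, list[str]]:
    """Compare collection names in a database dump with those a schema defines."""
    return {
        # Collections present in the dump that the target schema does not define.
        "only_in_dump": sorted(dump_collections - schema_collections),
        # Collections the schema defines that the dump does not contain.
        "only_in_schema": sorted(schema_collections - dump_collections),
    }


def check_dump_matches_schema(
    dump_collections: set[str], schema_collections: set[str]
) -> None:
    """Raise a readable error if the dump has collections the schema lacks."""
    unknown = find_schema_mismatches(dump_collections, schema_collections)["only_in_dump"]
    if unknown:
        raise ValueError(f"Dump contains collections not defined by the schema: {unknown}")


if __name__ == "__main__":
    # Illustrative names only.
    dump = {"biosample_set", "study_set", "omics_processing_set"}
    schema = {"biosample_set", "study_set"}
    print(find_schema_mismatches(dump, schema))
    # → {'only_in_dump': ['omics_processing_set'], 'only_in_schema': []}
```

The guard only fails on collections that are in the dump but not in the schema, since a schema collection with no data is not necessarily a problem.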
The "coupling" (as I referred to it in the previous message) was introduced in this commit. I will assign to @dwinston to take a pass at resolving it.
During today's infrastructure meeting, we discussed the possibility of disabling this problematic code (which may involve disabling some tests) in order to unblock some current work, while developers implement a long-term fix in parallel. I'll look into this today with that path forward in mind.
I created a PR in which I disabled the problematic code. The PR is here: https://github.com/microbiomedata/nmdc-runtime/pull/664
This came up in a PR where Patrick was working on updating ETL code. There are errors about `associated_studies` being missing from `biosample_set`. Upon further inspection, the test appears to be comparing production Mongo data against the Berkeley nmdc-schema release candidate; I inferred this because it pulls in collection names that don't exist in Berkeley, like `omics_processing_set`. We need an `alldocs` collection in the Berkeley Mongo database, and we need this test to point to it.
cc @dwinston @eecavanna @pkalita-lbl
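As for pointing the test at Berkeley data, one low-cost option would be to let the dump archive name be overridden per environment instead of always using the production dump. A minimal sketch; the environment variable `NMDC_TEST_MONGODUMP_ARCHIVE_BASENAME` and the example file names are hypothetical, and in practice the default would stay whatever `SCHEMA_COLLECTIONS_MONGODUMP_ARCHIVE_BASENAME` currently holds.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable name, used for illustration only.
OVERRIDE_VAR = "NMDC_TEST_MONGODUMP_ARCHIVE_BASENAME"


def resolve_archive_basename(default: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the dump archive basename for test fixtures: the override
    variable's value if set and non-blank, otherwise `default`."""
    env = os.environ if env is None else env
    override = env.get(OVERRIDE_VAR, "").strip()
    return override or default


if __name__ == "__main__":
    print(resolve_archive_basename("prod-dump.tar.gz", env={}))
    # → prod-dump.tar.gz
    print(resolve_archive_basename("prod-dump.tar.gz", env={OVERRIDE_VAR: "berkeley-dump.tar.gz"}))
    # → berkeley-dump.tar.gz
```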