Open Shalsh23 opened 10 months ago
Let's work through this in a pair programming session. After make squeaky-clean and before make make-rdf, please try make all test.
I think these are the critical lines in the logging:
warning: 524 Server Error: for url: https://api.microbiomedata.org/nmdcschema/data_object_set?max_page_size=10000
warning: FastAPI request to nmdcschema/data_object_set appears to have failed. Trying as a PyMongo query.
In --skip-collection-check mode, pure-export tries to get MongoDB contents through the runtime API, but it can fall back to PyMongo.
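The fallback described above can be sketched roughly as follows. This is a minimal illustration and not the code in mongo_dump_api_emph.py: the names fetch_collection, fetch_via_api and fetch_via_pymongo are hypothetical, and the two fetchers are passed in as plain callables so the control flow can be followed without a network or database connection.

```python
from typing import Callable, Optional


def fetch_collection(
    name: str,
    fetch_via_api: Callable[[str], list],
    fetch_via_pymongo: Optional[Callable[[str], list]] = None,
) -> list:
    """Try the runtime API first; fall back to PyMongo only if it is configured."""
    try:
        return fetch_via_api(name)
    except Exception as api_error:  # for example, an HTTP 524 from the API
        print(f"warning: {api_error}")
        if fetch_via_pymongo is None:
            # Without connection settings there is nothing to fall back to,
            # so fail with a message that says so.
            raise RuntimeError(
                f"API request for {name} failed and no PyMongo client is configured"
            ) from api_error
        print(f"warning: request for {name} failed. Trying as a PyMongo query.")
        return fetch_via_pymongo(name)
```

With this shape, an unconfigured fallback surfaces as an explicit error message instead of failing later on a missing client object.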
You can try pasting the data_object_set URL into your web browser, just with a smaller page size.
The PyMongo connection is configured in local/.env. We had just created that as an empty file before you started make make-rdf, because we assumed you wouldn't need the PyMongo connection. So if the server errors are persistent for you, we can add the PyMongo connection settings, with the appropriate values, to your local/.env.
I just tried https://api.microbiomedata.org/nmdcschema/data_object_set?max_page_size=10 and got the following:
{
  "detail": [
    {
      "type": "missing",
      "loc": [
        "query",
        "filter"
      ],
      "msg": "Field required",
      "input": null,
      "url": "https://errors.pydantic.dev/2.4/v/missing"
    },
    {
      "type": "missing",
      "loc": [
        "query",
        "page_token"
      ],
      "msg": "Field required",
      "input": null,
      "url": "https://errors.pydantic.dev/2.4/v/missing"
    },
    {
      "type": "missing",
      "loc": [
        "query",
        "projection"
      ],
      "msg": "Field required",
      "input": null,
      "url": "https://errors.pydantic.dev/2.4/v/missing"
    }
  ]
}
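The response above is a validation error rather than data: as far as it can be read, the request was rejected because three query parameters were not supplied. Below is a small sketch for pulling those parameter names out of such a response. The helper name missing_query_params is hypothetical, and the sketch assumes only the detail/type/loc layout visible in the response above.

```python
import json

# Abbreviated copy of the response shown above (same structure, fewer keys).
response_text = """
{"detail": [
  {"type": "missing", "loc": ["query", "filter"], "msg": "Field required"},
  {"type": "missing", "loc": ["query", "page_token"], "msg": "Field required"},
  {"type": "missing", "loc": ["query", "projection"], "msg": "Field required"}
]}
"""


def missing_query_params(text: str) -> list:
    """Return the names of query parameters reported as missing."""
    detail = json.loads(text).get("detail", [])
    return [
        item["loc"][-1]
        for item in detail
        if item.get("type") == "missing" and item.get("loc", [None])[0] == "query"
    ]


print(missing_query_params(response_text))
# prints ['filter', 'page_token', 'projection']
```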
Hi @Shalsh23 . I see you assigned this issue to me. What actions would you like me to take?
after make squeaky-clean and before make make-rdf, please try make all test
This did help. I was able to make progress but stumbled on another error, perhaps because riot is expected to be installed at the specified path. Are there other tools that are expected to be installed at a particular path for this makefile to work?
INFO:root:TRUE: OCCURS SAME: Biosample == TextValue owning: Biosample
INFO:root:TRUE: OCCURS SAME: SubSamplingProcess == QuantityValue owning: SubSamplingProcess
INFO:root:FALSE: OCCURS BEFORE: OntologyClass == OntologyClass owning: ControlledIdentifiedTermValue
INFO:root:FALSE: OCCURS BEFORE: QuantityValue == QuantityValue owning: MaterialSamplingActivity
INFO:root:FALSE: OCCURS BEFORE: MaterialContainer == MaterialContainer owning: MaterialSamplingActivity
INFO:root:FALSE: OCCURS BEFORE: QuantityValue == QuantityValue owning: ReactionActivity
INFO:root:Using SchemaView with im=None
real 3m0.169s
user 2m57.688s
sys 0m1.719s
export _JAVA_OPTIONS=-Djava.io.tmpdir=local
~/apache-jena/bin//riot --validate local/mongo_as_nmdc_database.ttl # < 1 minute
bash: /Users/shalkishrivastava/apache-jena/bin//riot: No such file or directory
make: [local/mongo_as_nmdc_database.ttl] Error 127 (ignored)
date
Fri Nov 3 14:19:47 CDT 2023
time poetry run anyuri-strings-to-iris \
--input-ttl local/mongo_as_nmdc_database.ttl \
--jsonld-context-jsons project/jsonld/nmdc.context.jsonld \
--emsl-biosample-uuid-replacement emsl_biosample_uuid_like \
--output-ttl local/mongo_as_nmdc_database_cuire_repaired.ttl
Loading prefixes from project/jsonld/nmdc.context.jsonld
Loading local/mongo_as_nmdc_database.ttl
Loaded local/mongo_as_nmdc_database.ttl
Iterating over triples
Serializing to local/mongo_as_nmdc_database_cuire_repaired.ttl
Expanded CURIE literals in RDF graph.
real 1m2.875s
user 1m2.062s
sys 0m0.592s
export _JAVA_OPTIONS=-Djava.io.tmpdir=local
~/apache-jena/bin//riot --validate local/mongo_as_nmdc_database_cuire_repaired.ttl # < 1 minute
bash: /Users/shalkishrivastava/apache-jena/bin//riot: No such file or directory
make: [local/mongo_as_nmdc_database_cuire_repaired.ttl] Error 127 (ignored)
date
Fri Nov 3 14:20:50 CDT 2023
Hi @Shalsh23 . I see you assigned this issue to me. What actions would you like me to take?
I assigned it to you to formally note that you are already helping me with this issue.
Apache Jena, which includes the riot CLI, can be downloaded from here: https://jena.apache.org/download/index.cgi
The project.Makefile has a JENA_PATH environment variable for the directory that contains all Jena tools.
I have opinionatedly set that to ~/apache-jena/bin/ but you can change it as long as you don't commit your change. I guess we could also put that variable assignment in the local/.env.
It may be possible to install the Jena tools system-wide with Homebrew. In that case, JENA_PATH should be set to an empty string.
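Before running the Makefile, it may help to check whether riot can actually be found under the JENA_PATH convention described above. The sketch below is one way to do that. The helper name find_riot is hypothetical, reading JENA_PATH from the process environment is an assumption made for this example, and the default simply mirrors the value mentioned above.

```python
import os
import shutil
from typing import Optional


def find_riot(jena_path: str) -> Optional[str]:
    """Return the path to an executable riot, or None if it cannot be found.

    An empty jena_path means "rely on PATH", as with a system-wide install.
    """
    if jena_path == "":
        return shutil.which("riot")
    candidate = os.path.join(os.path.expanduser(jena_path), "riot")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


if __name__ == "__main__":
    # Assumption: JENA_PATH is available as an environment variable here.
    location = find_riot(os.environ.get("JENA_PATH", "~/apache-jena/bin/"))
    print(location or "riot not found; check JENA_PATH")
```

If this reports that riot was not found, the riot --validate steps can be expected to fail with the same "No such file or directory" message shown in the log above.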
@Shalsh23 I really appreciate that you have stuck with this and have documented your experience. If you have lost your passion for running make make-rdf locally, it can be run with a manually-triggered GH action in any branch now.
Context: I am trying to run project.makefile to test the new migration script by validating a datafile after running schema migration.
Environment: Python version 3.11.4
Steps followed:
cd to root dir of nmdc-schema repo
poetry update
poetry install
make squeaky-clean
make make-rdf
The output of this command throws an error as follows:
selected_collections = ('biosample_set', 'data_object_set', 'extraction_set', 'field_research_site_set', 'library_preparation_set', 'mags_activity_set', 'metabolomics_analysis_activity_set', 'metagenome_annotation_activity_set', 'metagenome_assembly_set', 'metagenome_sequencing_activity_set', 'metatranscriptome_activity_set', 'nom_analysis_activity_set', 'omics_processing_set', 'pooling_set', 'processed_sample_set', 'read_based_taxonomy_analysis_activity_set', 'read_qc_analysis_activity_set', 'study_set')
Attempting to get 0 documents from nmdcschema/biosample_set in pages of 10000.
Retrieved 7594 entries out of 0 from nmdcschema/biosample_set
Attempting to get 0 documents from nmdcschema/data_object_set in pages of 10000.
warning: 524 Server Error: for url: https://api.microbiomedata.org/nmdcschema/data_object_set?max_page_size=10000
warning: FastAPI request to nmdcschema/data_object_set appears to have failed. Trying as a PyMongo query.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/shalkishrivastava/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MtyHo_Ar-py3.11/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/shalkishrivastava/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MtyHo_Ar-py3.11/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/Users/shalkishrivastava/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MtyHo_Ar-py3.11/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/shalkishrivastava/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MtyHo_Ar-py3.11/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/shalkishrivastava/shalkishrivastava_data/LBL/nmdc/nmdc-schema/nmdc_schema/mongo_dump_api_emph.py", line 354, in cli
direct_data_all = nmdc_pymongo_client.get_docs_from_pymongo(current_collection, max_docs)
^^^^^^^^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'nmdc_pymongo_client' where it is not associated with a value
real 1m46.110s
user 0m0.619s
sys 0m0.128s
make: *** [local/mongo_as_unvalidated_nmdc_database.yaml] Error 1
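The UnboundLocalError at the end of the traceback is what Python raises when a local name is read on a code path where it was never assigned. The sketch below reproduces that situation in miniature and shows one possible guard. It is not the code from mongo_dump_api_emph.py: the function names, the use_pymongo flag and the dictionary standing in for a client are all hypothetical.

```python
def get_docs_unguarded(use_pymongo: bool) -> list:
    if use_pymongo:
        client = {"docs": [1, 2, 3]}  # stand-in for a real PyMongo client
    # If use_pymongo is False, client was never assigned, so the next line
    # raises UnboundLocalError.
    return client["docs"]


def get_docs_guarded(use_pymongo: bool) -> list:
    client = None  # always bind the name before any branch
    if use_pymongo:
        client = {"docs": [1, 2, 3]}
    if client is None:
        # Fail with a message that points at the missing configuration.
        raise RuntimeError("PyMongo fallback requested but no client was created")
    return client["docs"]
```

Binding the name up front turns the confusing UnboundLocalError into an explicit error about the missing client, which is easier to act on.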