ml4ai / automates

AutoMATES: Automated Model Assembly from Text, Equations, and Software
https://ml4ai.github.io/automates

Gromet FN/Metadata v0.1.5 support for Gromet Importer #333

Closed: vincentraymond-ua closed this 1 year ago

vincentraymond-ua commented 1 year ago

Updates the json_to_gromet importer to support the v0.1.5 format for both Gromet FN and Gromet Metadata.

Additionally, updates the process_file_system function to work in a more functional style: it now returns the Gromet module collection, and writing to file is disabled unless the write_to_file argument is True.
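A minimal sketch of the "return the collection, write only on request" pattern described above might look like the following. Everything here is hypothetical: the signature, the `convert_source_file` helper, the `out_path` parameter, and the plain-dict collection are stand-ins for illustration, not the actual automates code, which builds Gromet model objects.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


def convert_source_file(path: Path) -> Dict:
    """Hypothetical stand-in for the per-file Gromet FN conversion."""
    # The real pipeline would parse the source and build a module object here.
    return {"name": path.stem, "source": str(path)}


def process_file_system(
    system_name: str,
    paths: List[Path],
    write_to_file: bool = False,
    out_path: Optional[Path] = None,
) -> Dict:
    """Build the module collection and return it; write JSON only on request."""
    collection = {
        "name": system_name,
        "modules": [convert_source_file(p) for p in paths],
    }
    if write_to_file:
        # Hypothetical default file name when no explicit path is given.
        target = out_path if out_path is not None else Path(f"{system_name}.json")
        target.write_text(json.dumps(collection, indent=2))
    # Returning the collection lets callers keep working with it in memory.
    return collection
```

With the default `write_to_file=False`, callers such as tests or other pipeline stages get the collection back without any side effects on disk.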

codecov[bot] commented 1 year ago

Codecov Report

Merging #333 (ed71083) into master (f1f3c15) will increase coverage by 0.52%. The diff coverage is 0.00%.

Note: Current head ed71083 differs from the pull request's most recent head f70d59c. Consider uploading reports for commit f70d59c to get more accurate results.

@@            Coverage Diff             @@
##           master     #333      +/-   ##
==========================================
+ Coverage   33.94%   34.47%   +0.52%     
==========================================
  Files         114      114              
  Lines       22752    22661      -91     
==========================================
+ Hits         7724     7813      +89     
+ Misses      15028    14848     -180     
| Impacted Files | Coverage Δ |
|---|---|
| ...omates/program_analysis/JSON2GroMEt/json2gromet.py | 0.00% <0.00%> (ø) |
| ...mates/program_analysis/CAST2GrFN/model/cast/var.py | 68.65% <0.00%> (-1.50%) ↓ |
| ...ogram_analysis/CAST2GrFN/model/cast/scalar_type.py | 44.73% <0.00%> (-1.42%) ↓ |
| ...alysis/CAST2GrFN/ann_cast/variable_version_pass.py | 80.39% <0.00%> (-0.14%) ↓ |
| ...lysis/CAST2GrFN/visitors/cast_to_agraph_visitor.py | 16.81% <0.00%> (+0.39%) ↑ |
| ...am_analysis/CAST2GrFN/ann_cast/id_collapse_pass.py | 76.47% <0.00%> (+0.74%) ↑ |
| ...lysis/CAST2GrFN/ann_cast/cast_to_annotated_cast.py | 77.14% <0.00%> (+0.75%) ↑ |
| ...ates/program_analysis/PyAST2CAST/py_ast_to_cast.py | 13.81% <0.00%> (+1.37%) ↑ |
| ...gram_analysis/CAST2GrFN/ann_cast/annotated_cast.py | 69.19% <0.00%> (+10.99%) ↑ |
| ...rogram_analysis/CAST2GrFN/model/cast/source_ref.py | 86.07% <0.00%> (+11.39%) ↑ |

... and 2 more


kwalcock commented 1 year ago

I'm concluding that the specs from files like https://github.com/ml4ai/automates-v2/blob/master/docs/source/gromet_FN_v0.1.5.yaml are really only for human consumption: they exist so that others can manually write code to process the format. That seems unfortunate. If that is correct, I would remove the discussion of code generation.

cl4yton commented 1 year ago

Hi @kwalcock : Thanks for your comment -- I want to make sure I understand.

Some background (which you may already know, but in case I didn't review it with you): we currently use the swagger-codegen-generated data models (three of them: CAST, GrometFN, and Metadata) in the Program Analysis (PA) pipeline. The auto-generated code is versioned directly within the automates repo in these locations (all under <automates_root>/automates/):

Inserting the swagger-codegen-generated code into these locations requires some post-processing to fix paths. I created a script that manages this: /scripts/swagger/codegen_swagger_models.py. For each of the three models (CAST, GrometFN, Metadata), this script:

  1. Copies the existing data model directory in automates into a temp directory. This makes it easy to "roll back" if something goes wrong while the script runs; if all goes well, the temp directory can be removed manually.
  2. Runs swagger-codegen for the model (this uses the swagger-codegen command in step 2 of the instructions in the preamble of the yaml model files).
  3. Copies the generated model files into the corresponding data model directory, adjusting paths as needed in each file to match the directory context.
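The three steps above can be sketched roughly as follows. This is not the contents of codegen_swagger_models.py: the function name `regenerate_model`, the `run_codegen` callable, and the `path_fixes` mapping are hypothetical. In practice `run_codegen` might invoke the swagger-codegen command from the yaml preamble via `subprocess.run`; passing it in as a callable keeps the sketch testable without swagger-codegen installed.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict


def regenerate_model(
    model_dir: Path,
    run_codegen: Callable[[Path], Path],
    path_fixes: Dict[str, str],
) -> Path:
    """Back up, regenerate, and re-insert one data model (hypothetical sketch)."""
    # Step 1: copy the existing model directory to a temp dir for easy roll-back.
    backup_dir = Path(tempfile.mkdtemp(prefix=f"{model_dir.name}_backup_"))
    shutil.copytree(model_dir, backup_dir / model_dir.name)

    # Step 2: run the code generator; it returns the directory holding its output.
    generated_dir = run_codegen(model_dir)

    # Step 3: copy each generated .py file into place, rewriting import paths.
    for src in generated_dir.rglob("*.py"):
        text = src.read_text()
        for old, new in path_fixes.items():
            text = text.replace(old, new)
        dest = model_dir / src.relative_to(generated_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(text)

    # The caller can delete backup_dir once the regenerated model checks out.
    return backup_dir
```

Returning the backup directory mirrors the manual clean-up step: nothing is deleted automatically, so a bad generation run can be undone by copying the backup back.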

Now, I believe you are referring to the "preamble" text at the start of each swagger model yaml file, which describes how to use swagger-codegen to generate the source code class structure. These instructions were originally written as a reminder to ourselves of how the models are generated (the original version was written by Daniel Dicken when he first created the swagger spec). Until I created the above script, I performed the generation and insertion manually. Since then, I have kept the instructions at the top of each model spec in case others want to interact with the data models programmatically without using automates. Within automates, however, we use the script described above to maintain the data models.

@vincentraymond-ua maintains a serializer/deserializer for the internal automates data structures based on GrometFN (in particular, GrometFNModel and GrometFNModuleCollection) and Metadata, where the export target is a particular JSON format that we call the "Gromet FN relational database schema". This JSON format is governed by ASKEM "gromet schema" conventions that don't exactly match the swagger model, which is why we maintain the serializer/deserializer instead of directly using the built-in .to_json and load. It is a bummer that we can't just do things that way.
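Why a dedicated serializer/deserializer is needed when the target JSON has a different shape than the in-memory model can be illustrated with a minimal sketch. Everything here is hypothetical: `Box`, `Module`, and the flat id-keyed `boxes`/`ports` tables are invented stand-ins, not the actual GrometFN classes or the ASKEM schema. The point is only that nested objects are flattened into relational-style tables on export and rebuilt on import, which a generic built-in dump of the nested model would not do.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Box:
    name: str
    ports: List[str] = field(default_factory=list)


@dataclass
class Module:
    name: str
    boxes: List[Box] = field(default_factory=list)


def to_relational_json(module: Module) -> str:
    """Flatten nested boxes/ports into id-keyed tables (hypothetical layout)."""
    boxes, ports = [], []
    for box_id, box in enumerate(module.boxes, start=1):
        boxes.append({"id": box_id, "name": box.name})
        for port_name in box.ports:
            # Each port row points back to its owning box by id.
            ports.append({"id": len(ports) + 1, "box": box_id, "name": port_name})
    return json.dumps({"name": module.name, "boxes": boxes, "ports": ports})


def from_relational_json(text: str) -> Module:
    """Rebuild the nested objects from the flattened tables."""
    data = json.loads(text)
    by_id: Dict[int, Box] = {b["id"]: Box(b["name"]) for b in data["boxes"]}
    for port in data["ports"]:
        by_id[port["box"]].ports.append(port["name"])
    return Module(data["name"], [by_id[i] for i in sorted(by_id)])
```

A round trip through both functions returns an equal `Module`, which is the property a serializer/deserializer pair like this needs to preserve whenever either data model version changes.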

So, this PR is about Vincent updating the serializer/deserializer to support the latest changes to the Metadata data model (v0.1.5).

Now that we have the explicit internal data structure (using the GrometFN and Metadata data structures derived from the swagger models), along with our support for the serializer/deserializer, I would recommend using those to programmatically interact with the Gromet FN model, as it is generated by the Program Analysis function network extraction pipeline. Justin Lieffers is also currently developing a graph database representation which is suited for querying and manipulating function networks, and this is created directly from the Gromet FN model in the automates PA pipeline. But I think the instructions for how the data models are generated directly using swagger-codegen might also still be of use to some.
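To make the idea of querying a function network as a graph concrete, here is a small, hypothetical sketch. It is not Justin Lieffers's graph database design: it assumes wires have already been reduced to (source box, target box) name pairs, builds an in-memory adjacency list, and answers one typical question, "which boxes are downstream of this one?", with a breadth-first search. A real graph database would answer the same question with a traversal query.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(wires: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Turn (source_box, target_box) wire pairs into an adjacency list."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for src, tgt in wires:
        graph[src].append(tgt)
    return graph


def downstream(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search for every box reachable from start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

For example, with hypothetical wires `read -> infect -> recover -> write` plus `read -> write`, `downstream(graph, "infect")` yields `{"recover", "write"}`.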

Does this help clarify what you are responding to in your comment? I do think that the way we have things documented is leading to confusion. Given the above context, what would you recommend for making this clear? (... And are there other issues we should consider?)

Thanks!