materialsproject / emmet

Be a master builder of databases of material properties. Avoid the Kragle.
https://materialsproject.github.io/emmet/
Other
49 stars 63 forks source link

Update validation doc to use TaskDoc + add nelements and symmetry_number as fields in document model #1026

Closed tsmathis closed 2 weeks ago

tsmathis commented 4 weeks ago

updates ValidationDoc to use the new atomate 2 TaskDoc

Also added nelements and symmetry_number as optional fields for use when grouping documents together for object storage, i.e., s3

tsmathis commented 4 weeks ago

@esoteric-ephemera , would you do anything differently here?

codecov-commenter commented 4 weeks ago

Codecov Report

Attention: Patch coverage is 96.77419% with 1 line in your changes missing coverage. Please review.

Project coverage is 90.15%. Comparing base (b68abd6) to head (78695fd). Report is 1 commits behind head on main.

Files Patch % Lines
emmet-core/emmet/core/tasks.py 88.88% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1026 +/- ## ========================================== + Coverage 90.02% 90.15% +0.12% ========================================== Files 142 142 Lines 13390 13394 +4 ========================================== + Hits 12055 12075 +20 + Misses 1335 1319 -16 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

esoteric-ephemera commented 4 weeks ago

This looks good! I would only change task_doc: TaskDocument | TaskDoc so that it's backwards compatible for now. The line about model dumping in ValidationDoc:

calcs_reversed = [
            calc if not hasattr(calc, "model_dump") else calc.model_dump()
            for calc in task_doc.calcs_reversed
]

is there to support TaskDocument (calcs_reversed is a dict) and TaskDoc (calcs_reversed is a pydantic model object).

Obvs when we're ready to fully deprecate TaskDocument, we can remove that glue code and type suggestion.

tsmathis commented 4 weeks ago

Correct me if I'm wrong, but isn't TaskDocument not compatible with the dot notation access that is used for TaskDoc's props now?

See the changes I had to make here: f5f52fb

esoteric-ephemera commented 4 weeks ago

You're right about the dot notation, but we can support both if you change _magmom_check to depend only on calcs_reversed, since those lines I pointed out ensure that's always deserialized. The full task doc isn't needed for the magmom check, just calcs reversed

Also TaskDoc.output.structure should be decorated with magnetic moments, so returning something like any(site.properties.get("magmom",0.) > eles_max_vals.get(site.label, 0.) for site in structure) should work in place of the current very long check. (The structure needs to be serialized for that to work.)

Happy to add these changes if that's easier

tsmathis commented 4 weeks ago

Yeah feel free to add that in if you know what to change off the top of your head. I'd have to trial and error my way through it.

I'm running the validation builder over the full set of task docs to see if there are other gotchas as well

tsmathis commented 3 weeks ago

@munrojm, any comment/issue here? I've run the validation builder with the legacy validation settings on the 300k+ task docs under the mp_production_old tag and everything ran smoothly there.

Working through the remainder of the newer production tags now, but I don't think there should be issues with the newer calcs.

esoteric-ephemera commented 3 weeks ago

@munrojm also want to note that this PR closes issues #960 and #1021

munrojm commented 2 weeks ago

No further comments from me, this looks good.