materialsproject / jobflow

jobflow is a library for writing computational workflows.
https://materialsproject.github.io/jobflow
Other
90 stars 25 forks source link

CounterStore #528

Closed jmmshn closed 6 months ago

jmmshn commented 6 months ago

Generical global counters might be useful for some special workflows

Many of the document models in emmet are designed to deal with either int or "mp-"+int type of ids for the different documents. As such, they sometimes require unique integer counters for specific downstream documents to be built properly. This feature should allow atomate2 to automatically assign unique ids to the tasks which (I believe) is currently unset. If the task id is set I think more of the aggregated document model in emmet can be directly used in the middle of future more complex workflows.

An example of this problem is here: https://github.com/materialsproject/atomate2/pull/655

Where the InsertionElectrode document expects task_id to be set.

codecov[bot] commented 6 months ago

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (8ae6ec9) 99.86% compared to head (3cab74b) 99.48%. Report is 3 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #528 +/- ## ========================================== - Coverage 99.86% 99.48% -0.39% ========================================== Files 20 20 Lines 1512 1554 +42 Branches 416 427 +11 ========================================== + Hits 1510 1546 +36 - Misses 2 5 +3 - Partials 0 3 +3 ``` | [Files](https://app.codecov.io/gh/materialsproject/jobflow/pull/528?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=materialsproject) | Coverage Δ | | |---|---|---| | [src/jobflow/core/store.py](https://app.codecov.io/gh/materialsproject/jobflow/pull/528?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=materialsproject#diff-c3JjL2pvYmZsb3cvY29yZS9zdG9yZS5weQ==) | `97.97% <86.04%> (-2.03%)` | :arrow_down: |
jmmshn commented 6 months ago

This feature should allow atomate2 to automatically assign unique ids to the tasks which (I believe) is currently unset. If the task id is set I think more of the aggregated document model in emmet can be directly used in the middle of future more complex workflows.

@utf, I think a counter for the TaskDocs in atomate2 can be incorporated pretty easily by monkey patching the constructor method to call the counter incrementation. I tried working around this but there is a consistent convention in emmet of sorting and using the lowest numerical id of the aggregated documents to assign the id of the grouped document so I don't see a way around the issue of assigning a numerical counter. (There is an implicit assumption of that the lower ID is older so you can prevent MPID drift this way.)

utf commented 6 months ago

Thanks @jmmshn, this is a nice implementation. As I mentioned on the atomate2 PR, the goal with jobflow is to move away from counters since they cause an issue when merging two databases. If MP need mp_id's then I suggest they be added in their ingestion pipeline. They shouldn't be needed for atomate2/jobflow.

jmmshn commented 6 months ago

Closing this since we are resolving it in the other PR #529