Closed bcdurak closed 32 minutes ago
[!IMPORTANT]
Review skipped
Auto reviews are disabled on this repository.
Please check the settings in the CodeRabbit UI or the
.coderabbit.yaml
file in this repository. To trigger a single review, invoke the@coderabbitai review
command.You can disable this status message by setting the
reviews.review_status
tofalse
in the CodeRabbit configuration file.
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?
@htahir1 It would be great if you could also take a look at the docs changes.
LLM Finetuning template updates in examples/llm_finetuning
have been pushed.
Classification template updates in examples/mlops_starter
have been pushed.
E2E template updates in examples/e2e
have been pushed.
NLP template updates in examples/e2e_nlp
have been pushed.
@bcdurak was nice to talk to you today, here is a brief summary of potential improvements with the current metadata:
value
to a separate table. Next, you would need to think about garbage collection to remove orphaned entries periodically. Also, this makes values immutable, obviously, since they can be used somewhere else.LMK, if you want to talk more.
P.S. @htahir1 FYI, this might be relevant for product road mapping and prioritization of tasks, so your input is warmly welcomed.
@avishniakov Just had a discussion with @htahir1, here is what we decided to do:
metadata values duplication
with the proposed structure is out of the scope of this ticket for now. We might solve this in a separate PR in the future.@avishniakov I made some significant changes after our last discussion:
log_metadata
, only one entity will be linked to the newly generated metadata entry.cached
flag. This helps us to create a new link if another step is using the cached execution of the original step and we link the original metadata to the new step. RunMetadataResource
) that are used in the process of generating run metadata.fetch_metadata
returns the value of the latest entry for each key (we use it while building the response models) and fetch_metadata_collection
returns a list for each key containing entries with a value and a timestamp (we can potentially use this in the future to create histograms, this is just the initial version).I also updated all the docs pages accordingly.
@schustmi Would be nice if you could read the changes I described above before diving into the PR.
E2E template updates in examples/e2e
have been pushed.
@avishniakov I also implemented some new changes, I thought you might like. You can check them in this commit.
Simply put, the log_metadata
function now features two new parameters called infer_artifact
and infer_model
. So you can call log_metadata(metadata={}, infer_model=True)
which will use the step_context
and attach the metadata to the model. Thanks to this, I was also able to adjust the old functions like log_artifact_metadata
to use the new log_metadata
.
LLM Finetuning template updates in examples/llm_finetuning
have been pushed.
Classification template updates in examples/mlops_starter
have been pushed.
E2E template updates in examples/e2e
have been pushed.
NLP template updates in examples/e2e_nlp
have been pushed.
Describe changes
This PR optimizes the way that we store run metadata related to different entities. (addresses the review comment here)
In our old implementation, when someone calls
log_metadata
, it is possible that they attach the same metadata to different entities such as pipeline runs, step runs, and model versions. This process used to create X different entries with the same key-value pair in the metadata table.In order to optimize this process, this PR separates the previous run metadata table into two tables:
For this to work, I also implemented a new
RunMetadataRequest
model, that can hold more than one entity per key-value pair. In all the previous calls that used this request model, all affected models and schemas were adjusted accordingly.Other related changes
Remaining TODOs
Pre-requisites
Please ensure you have done the following:
develop
and the open PR is targetingdevelop
. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.Types of changes