Linchin opened this issue 1 year ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I read that this issue was not included in the previous release, as mentioned in this comment. I just wanted to ask whether we have an update on this or not.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/reopen
Is readding this feature to v2 planned for future releases? It would be very helpful.
@AnnKatrinBecker: You can't reopen an issue/PR unless you authored it or you are a collaborator.
I'm a bit unfamiliar with the v1 version of this, did this come from argo's archive logging feature?
In v2 kfp aims to be agnostic to the backing engine, so if this is an argo specific feature, we may need to consider a separate feature/proposal on how to handle persistent logging in general for kfp.
Yes, Argo Workflows stores the logs separately and given only kfp metadata, there currently seems to be no way to obtain the logs' locations. You'd need to access the Argo Workflow object directly to obtain the info.
Including this feature right in kfp would make sense and imo is much needed; otherwise, users aren't able to view logs after the step's pod has been deleted.
Hi @HumairAK, what do you think about this? Points:
Thoughts:
/reopen
@HumairAK: Reopened this issue.
I agree persisted logging is very important, but we should clarify whether we need the logs as artifacts, or whether users just want to see logs after pods have been removed.
For example, if we have persisted logs, the log viewer could fetch the logs from the persisted location instead of from pods (I believe this logic is currently present in the UI, though it may not be utilized in v2). Could it be confusing to see two different locations where logs are shown?
A DS engineer would be interested in seeing logs as an artifact, as part of investigating how a model was produced.
Not disagreeing, but could you elaborate on specifics?
I like the idea of decoupling the implementation from existing solutions (or at least giving the user the option to, if we opt to natively implement).
in the SDK we should be able to disable logs
I think we should also allow users to disable/enable not only in the SDK, but via api & ui as well.
Why engineers may be interested in this:
The v1 version was conceptually correct, because it combined interacting with logs during the run (a live stream from the pod) with persisting logs for the long term.
Saving to object storage can be the default, but it should also be possible to use other solutions for log streaming, such as ELK, Kafka, or Google Cloud Logging, so this can be optional.
A current workaround with the Argo Workflows backend is to use a key format like /artifacts/{workflow_id}/{pod_id}/main.log by patching the artifact UI split function and using the v1 handler.
@sanchesoon Can you give more details on this workaround? I am currently using KFP 2.1.0 with the normal Argo Workflows backend. I really want a way to provide users the ability to view their pipeline tasks' logs simply by clicking through the KFP UI.
@kubeflow/pipelines @HumairAK @pdettori I would like to ask that we prioritize this issue, it is a significant regression for V2 pipelines.
The gist of the issue is that when Argo Workflows has the archiveLogs config set (which we do at the cluster level in our manifests), Argo Workflows automatically creates an output artifact called main-logs, which contains the full Pod logs of each step as an S3/MinIO artifact.
In Kubeflow Pipelines V1, we used to capture this output artifact, which allowed users to see the old logs after the Pod had disappeared, but in V2 this was strangely omitted.
Fixing it should be very straightforward: we just need to capture the artifact location of the main-logs output of the executor step and store it as an "Output Artifact" in the ML Metadata of the step.
For reference, we can retrieve this artifact by reading the status field of the Workflow resource:
```yaml
status:
  nodes:
    pipeline-1-8g5bs-3160443754:
      displayName: executor
      outputs:
        artifacts:
          - name: main-logs
            s3:
              key: artifacts/team-1/pipeline-1-8g5bs/2024/06/13/pipeline-1-8g5bs-3160443754/main.log
```
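To make the lookup concrete, here is a hedged sketch (in Python, with a hypothetical helper name — not KFP code) of extracting the archived log key for a node from a Workflow status shaped like the one above:

```python
def main_logs_key(workflow_status: dict, node_id: str):
    """Return the S3 key of the archived 'main-logs' artifact for a node, if any."""
    node = workflow_status.get("nodes", {}).get(node_id, {})
    for artifact in node.get("outputs", {}).get("artifacts", []):
        if artifact.get("name") == "main-logs":
            return artifact.get("s3", {}).get("key")
    return None

# Status fragment mirroring the example above.
status = {
    "nodes": {
        "pipeline-1-8g5bs-3160443754": {
            "displayName": "executor",
            "outputs": {
                "artifacts": [
                    {
                        "name": "main-logs",
                        "s3": {
                            "key": "artifacts/team-1/pipeline-1-8g5bs/2024/06/13/"
                                   "pipeline-1-8g5bs-3160443754/main.log"
                        },
                    }
                ]
            },
        }
    }
}
```

This is the sense in which "only the Workflow object knows" the location: the key is data in the CR's status, not something reconstructible from KFP metadata alone.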
Completely agree with @thesuperzapper. Having a run fail and not being able to determine the reason why because the logs are not available is a show stopper.
Yes, this is a show stopper; we need to be able to retrieve the logs of completed DAGs.
@sanchesoon Can you give more details on this workaround? I am currently using KFP 2.1.0 with the normal Argo Workflows backend.
Hi, yes I can. In Argo Workflows you should use something like this:
```yaml
archiveLogs: true
s3:
  keyFormat: "pod/log/{{workflow.name}}/{{pod.name}}"
```
And patch the frontend split function, frontend/server/workflow-helper.ts:

```typescript
function workflowNameFromPodName(podName: string) {
  return podName.replace(/-system-container-impl-.*/, '');
}
```
And don't forget to configure your Kubeflow frontend proxy (kubectl -n kubeflow describe po ml-pipeline-ui) with the following environment variables:

```
ARGO_ARCHIVE_LOGS: true
ARGO_ARCHIVE_ARTIFACTORY: s3
ARGO_ARCHIVE_PREFIX: pod/log
ARGO_ARCHIVE_BUCKETNAME
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION
AWS_SSL
AWS_S3_ENDPOINT
AWS_PORT
```
This was tested on 2.2.0; it should probably work on 2.1.0 too.
We're hoping to take this fix on this sprint.
My team and I are making good progress. We have a functional POC; it just needs some polish. It involves sending the run execution date to the logging handler, since the current AWF configuration stores the log artifacts in date-based subfolders to reduce the likelihood of collisions (see https://github.com/kubeflow/pipelines/pull/5894). The existing code is more or less fine; it's just looking for the logs in the wrong location. The UX for this will be a bit different than v1. Logs will not be listed under output artifacts. Instead, when the logs tab is loaded, if the k8s pod is not accessible, it will populate the react component from the artifact store instead.
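As a hedged illustration of the date-based layout being described (assuming a keyFormat along the lines of artifacts/{{workflow.name}}/{{Y}}/{{m}}/{{d}}/{{pod.name}} plus a main.log suffix — the exact template is deployment-specific):

```python
from datetime import date

def archived_log_key(workflow_name: str, pod_name: str, created: date) -> str:
    # Mirrors the date-based default layout described above; the real
    # keyFormat template is configurable and may differ per deployment.
    return (f"artifacts/{workflow_name}/{created:%Y}/{created:%m}/{created:%d}"
            f"/{pod_name}/main.log")
```

For the example path quoted earlier in the thread, archived_log_key("pipeline-1-8g5bs", "pipeline-1-8g5bs-3160443754", date(2024, 6, 13)) reproduces the team-relative portion of the key, which shows why the handler needs the run's creation date to locate the artifact.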
@droctothorpe however you do it, make sure that you don't assume anything about the structure of the Argo Workflows s3.keyFormat, as that is not guaranteed to be the same in all environments.
For example, some users have changed their key format to a different prefix or path structure, which may or may not be the same as the current default in Kubeflow Pipelines.
That's why I said in https://github.com/kubeflow/pipelines/issues/10036#issuecomment-2164166923 that you have to get the true path from the workflow status.
@thesuperzapper, if I understand the logic correctly, the handler does in fact try to get the log artifact path from the workflow status first (if the k8s pods have been garbage collected). It only tries to call the artifact store directly (in a way that currently makes incorrect assumptions about where the log is located) if the workflow has been garbage collected.
The logic tries to retrieve the logs from multiple sources in the following sequence: first from the Kubernetes API (the live pod), then from the workflow CR's status. If archiveLogs is true (this is set in config.ts), then and only then does it try to retrieve the logs directly from the artifact store without pathing through the workflow CR. Currently, master makes incorrect assumptions about the path (here is where it is constructed) that are at odds with the keyFormat default in the workflow-controller-configmap object.
@sanchesoon's suggestion essentially reverts the log location, but https://github.com/kubeflow/pipelines/pull/5894 and the comments in the configmap convey that dates were deliberately added to the path to reduce the likelihood of collisions.
Our fix essentially corrects the assumption and is only applicable if the workflow has been garbage collected.
If a user both modifies the keyFormat and deletes the workflow CR, option 3 will not work. Why would a user want to modify the keyFormat, though? Is that a rare enough edge case that solving for it would constitute bike-shedding? If not, there may be a way to have the handler call the K8s API to look up the workflow-controller-configmap, extract the keyFormat, and translate it somehow in a dynamic way that the function could parse, but that would (a) add latency, and (b) probably be a PITA, especially figuring out how to map the Go templating syntax to specific parameters in the function in a way that properly accounts for the sundry variants in the template tags. Another option is to store the path in MLMD. That would dramatically broaden the scope of the PR, though, and may receive pushback because it entails storing extremely Argo-specific data in the persistent data store. Another option is adding something similar to keyFormat to config.ts so that it can be controlled dynamically using env vars, which can be injected directly from the chart.
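For a sense of what "translate it dynamically" might look like, here is a hedged sketch (a hypothetical function, not actual KFP code) that substitutes Argo-style template tags from a value map; mapping every Go-template variant is exactly the hard part being flagged above:

```python
import re

def render_key_format(key_format: str, values: dict) -> str:
    """Replace Argo-style {{tag}} placeholders with concrete values."""
    def substitute(match):
        tag = match.group(1).strip()
        if tag not in values:
            # Unknown tags are the failure mode: Go templates support far
            # more syntax than simple variable substitution.
            raise KeyError(f"no value provided for template tag '{tag}'")
        return values[tag]
    return re.sub(r"\{\{\s*(.*?)\s*\}\}", substitute, key_format)

key = render_key_format(
    "artifacts/{{workflow.name}}/{{pod.name}}/main.log",
    {"workflow.name": "pipeline-1-8g5bs",
     "pod.name": "pipeline-1-8g5bs-3160443754"},
)
```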
@droctothorpe
If a user both modifies the keyFormat and deletes the workflow CR, option 3 will not work.
If I understand correctly, this is only the case until the information about where the log artifact is stored is saved to the pipeline run artifacts, right? I take it this is the assumption because it wouldn't be good to have a requirement of storing the Argo WF CR objects in etcd indefinitely. And if this assumption is right, we can use a very simple K8s mechanism to prevent users from manually deleting the WF CR objects before the information about where the log artifact is stored is saved to the pipeline run artifacts: finalizers.
If we were to use finalizers, we can just make the assumption that we control when exactly the Argo WF CR is released to be GCed and simplify the scenario a bit.
Why would a user want to modify the keyFormat though? Is that a rare enough edge case that solving for it would constitute bike-shedding?
There can be some requirement or limitation where you'd have to reuse the same S3 bucket for multiple purposes (like multiple KF instances, or a general bucket for logs from multiple sources) because of how a company/organization manages the infrastructure (so in this example you're given an S3 bucket with a specific prefix where you can save the log artifacts, let's say /artifacts/kubeflow/{{workflow.namespace}}/pod/log/{{workflow.name}}/{{pod.name}}).
Another example would be if each KF Profile is serving a different team and you're hoping to achieve full Profile isolation, where Team A doesn't have any form of access to the logs of Team B. This is something similar to what @thesuperzapper pointed out in https://github.com/kubeflow/pipelines/issues/10036#issuecomment-2164166923: artifacts/team-1/pipeline-1-8g5bs/2024/06/13/pipeline-1-8g5bs-3160443754/main.log.
Also, in an enterprise environment, you're rarely given full freedom over your infrastructure, and it would be a real issue if something as simple as the keyFormat were a blocker to adopting KF.
@kromanow94 Thanks for the context!
I take that this is the assumption because it wouldn't be too good to have a requirement of storing the Argo WF CR Objects on the etcd indefinitely.
Correct. Workflows get GCed after a specified length of time, as defined by the TTLStrategy field. Even after workflows are deleted, the corresponding logs persist in the artifact store.
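For reference, a minimal sketch of such a TTL configuration on an Argo Workflow spec (the field names come from Argo Workflows; the 24-hour value here is illustrative, not a KFP default):

```yaml
spec:
  ttlStrategy:
    # Delete the Workflow CR 24 hours after it completes; archived logs
    # in the artifact store are unaffected by this garbage collection.
    secondsAfterCompletion: 86400
```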
Any objections to letting end users specify the TypeScript equivalent of an f-string through an env var via config.ts? We could document some predetermined variable names that map to what's available in the corresponding handler (e.g. namespace, pod name, date, etc.), and add a comment to the keyFormat field stating that if it's modified, the corresponding field in config.ts should be updated.
The alternative is making the logs explicit output artifacts in MLMD. I would be curious to hear from Google about why that was disabled in v2 before reverting it. Output artifacts are first class citizens with dedicated nodes in the v2 graphs; we probably don't want every component to have a corresponding log node in the DAG, so we'd have to handle that in the graph rendering logic as well.
@droctothorpe
Any objections to letting end users specify the typescript equivalent of an f-string through an env var via config.ts?
More a doubt than an objection: maybe going for the env var via config.ts is just not required at all? If we were to use the finalizer pattern, we assume that it's not required to reverse engineer where the logs might be stored. I think the information about the log URI should be an integral part of the KFP step, and once that step is finished, we should only reference the KFP step to get the information about where the log artifact is stored.
But maybe there is something obvious I don't see that conflicts with this approach. In the end, what I think makes sense is to somehow save the address of the logs and store it together with the step execution. If this can be stored in MLMD, great. If not, is there any other place where we can store it? Maybe this is the part I don't yet fully understand: if the log artifact URI is not saved in MLMD, do we have any other place where we can store that information, or are we currently forced to guess where the logs might be through config.ts?
if the log artifact URI is not saved in the MLMD, do we have any other place where we can store that information, or currently we're forced to guess where the logs might be through the config.ts?
Currently, the log artifact URI is not saved in MLMD. The UI is just guessing, and guessing incorrectly, heh. If we store it as an output artifact in MLMD, that removes any guesswork but substantially broadens the scope of the changes and will probably require design review from Google, which is doable ofc; it just delays things.
@droctothorpe jfc...
Another argument in favor of storing the log as an output artifact is what if an admin elects to update the keyFormat? The logs for historical runs will no longer be accessible, and there won't be a way for the UI to support both the old and new patterns at once.
My personal preference, for maturity and stability, would be to go with MLMD and Google review, or an alternative store if possible. I'd even investigate whether creating an additional table in the mlpipeline DB makes sense, just to store the log artifact URIs for a given step ID. Considering all the discussion, I don't feel competent enough to suggest whether the workaround with the env var in config.ts is suitable for now... If I were to make this decision myself, I wouldn't do it.
I'm starting to lean in that direction as well, but what the heck do I know, I am just a ~plebe~ user.
Do y'all think it makes sense to submit a smaller, discrete PR that corrects the UI's incorrect guess for the time being, until we can implement a broader fix that adds the log URI to MLMD? We can aim to submit the PR tomorrow (at AWS Summit all day today) to give you a sense of the scope.
If this can be added to KF 1.9, then maybe it makes sense to unblock this with the idea you've proposed with config.ts. If it's not going to be part of the v1.9 release because of how short on time we are, maybe not so much.
I just want to highlight what I said in https://github.com/kubeflow/pipelines/issues/10036#issuecomment-2164166923 again.
This is very clearly an accidental regression, because KFP V1 stores the logs in an MLMD artifact (it just did not automatically forward the logs tab to the appropriate artifacts and required users to click the step and view it).
The only additional complexity (which is probably why this was overlooked) is that in KFP v2, each step is now more than one container, each with its own logs.
Assuming we store the logs for all steps as artifacts (even the non-executor steps), we should show them separately in the logs tab, probably with some kind of sub-tab for each container.
@chensun @zijianjoy @connor-mccarthy Do you have any objections to adding the driver and executor logs to the list of output artifacts for each component? I just want to make sure that removing that functionality was not a deliberate design choice before we re-implement it in v2. Thank you!
I think the executor outputs are being extracted from the IR, and currently the IR doesn't include information about the container logs. We need to decide if we want to include the container logs as executor outputs, which is quite complicated because (a) the path is dynamically inferred from keyFormat (for AWF) and not available client-side at IR compilation time, and (b) it is extremely AWF-specific in a way that could undermine workflow orchestration agnosticism in the IR.
POC for the interim solution: https://github.com/kubeflow/pipelines/pull/11010. This will make it possible for anyone to implement a solution in a way that accommodates their unique keyFormat syntax without having to modify the actual code.
I plan to join the community call tomorrow to hear from Google about the proposed target-state solution of storing logs as output artifacts.
https://github.com/kubeflow/pipelines/pull/11010 was just merged 🎉. It might make sense to keep this issue open to continue discussion about whether or not to / how to surface driver and executor logs as explicit outputs in the GUI.
Hey, I noticed the PR is already included in 2.3.0. I'm trying to run some tests, but I get the following error and I'm not able to get the logs from the pod step. I'm having two issues:
The ml-pipeline-ui Pod is exiting the main process with the following error:
```
/server/node_modules/node-fetch/lib/index.js:1491
reject(new FetchError(`request to ${request.url} failed, reason: ${err.message}`, 'system', err));
^
FetchError: request to http://metadata/computeMetadata/v1/instance/attributes/cluster-name failed, reason: getaddrinfo ENOTFOUND metadata
    at ClientRequest.<anonymous> (/server/node_modules/node-fetch/lib/index.js:1491:11)
    at ClientRequest.emit (node:events:517:28)
    at Socket.socketErrorListener (node:_http_client:501:9)
    at Socket.emit (node:events:517:28)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  type: 'system',
  errno: 'ENOTFOUND',
  code: 'ENOTFOUND'
}
```
In the short period of time when the ml-pipeline-ui is responsive before it exits with the above-mentioned error, I'm getting this error:
Could not get main container logs: Error: Unable to retrieve workflow status: [object Object].
Do you know what might be causing the issue?
FYI, I was testing with kubeflow/manifests 1.9.0 and only changed the container image tags for pipeline components to 2.3.0.
@kromanow94, someone in the (deprecated) Kubeflow slack workspace recently DMed me reporting what looks like a very similar error and a corresponding solution:
Could not get main container logs: Error: Unable to retrieve workflow status: [object Object].
That error is raised here. It is raised when the workflow corresponding to the run has been garbage collected.
When that fails, it should move on to trying to retrieve the logs directly from the artifact store. That sequencing (K8s API lookup -> WF manifest lookup -> artifact store lookup) is defined here. Note that the third step only executes if archiveLogs is true.
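The fallback chain described here can be sketched generically (hypothetical Python for illustration; the actual frontend code is TypeScript):

```python
def fetch_logs(sources):
    """Try each (name, callable) log source in order; return the first success."""
    errors = []
    for name, fetch in sources:
        try:
            return fetch()
        except Exception as exc:  # a real implementation would narrow this
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all log sources failed: " + "; ".join(errors))

def pod_lookup():
    raise RuntimeError("pod was garbage collected")

def workflow_lookup():
    raise RuntimeError("workflow CR was garbage collected")

def artifact_store_lookup():
    return "log contents from the artifact store"

# K8s API lookup -> WF manifest lookup -> artifact store lookup
logs = fetch_logs([
    ("k8s pod", pod_lookup),
    ("workflow CR", workflow_lookup),
    ("artifact store", artifact_store_lookup),
])
```

In the real handler, the third source would also be skipped entirely when archiveLogs is false, which is why the env var matters.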
Thanks, I solved the issue with the ml-pipeline-ui with the following workaround:

```shell
kubectl -n kubeflow expose deployment ml-pipeline --name metadata --dry-run=client -oyaml \
  | yq 'del(.metadata.labels)' -y \
  | kubectl apply -f -
```
It seems that I forgot about the env var ARGO_ARCHIVE_LOGS="true" as well.
Now, with those two issues resolved, there is another issue I'm running into. It seems the frontend uses the workflow pod name in place of the workflow name to get the logs.
The Argo Workflows and ml-pipeline-ui are set with the following key format:

```
artifacts/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}
```
Now I'm trying a scenario where I manually delete just one Workflow Pod and see if I can get the logs. I'm running that with curl, with the following command and error:

```shell
curl "https://kubeflow.example.com/pipeline/k8s/pod/logs?podname=addition-pipeline-pb72t-3709849268&runid=62911214-2e5c-46b4-ae26-d660809b8a5b&podnamespace=glo-k8s-admin&createdat=2024-09-10" \
  -H "Authorization: Bearer $(kubectl -n glo-k8s-admin create token default-editor)"
```

Could not get main container logs: S3Error: 503
Looking at the logs from the ml-pipeline-ui, I see the following lines:

```
GET /pipeline/k8s/pod/logs?podname=addition-pipeline-pb72t-3709849268&runid=62911214-2e5c-46b4-ae26-d660809b8a5b&podnamespace=glo-k8s-admin&createdat=2024-09-10
Getting logs for pod, addition-pipeline-pb72t-3709849268, from mlpipeline/artifacts/addition-pipeline-pb72t-3709849268/2024/09/10/addition-pipeline-pb72t-3709849268/main.log.
```
But, having a look at the storage in MinIO, the path differs in the workflow name segment:

```
mlpipeline/artifacts/addition-pipeline-pb72t/2024/09/10/addition-pipeline-pb72t-3709849268/main.log
```
I also ran the same test when both the Argo WF and its Pods were deleted, with the same result.
I also ran another scenario with one of the default pipelines. Using the run of [Tutorial] DSL - Control structures will create a set of Pods with the following names:
```
tutorial-control-flows-gbjpn-system-dag-driver-1941995042
tutorial-control-flows-gbjpn-system-dag-driver-2529298331
tutorial-control-flows-gbjpn-system-container-driver-3821465773
tutorial-control-flows-gbjpn-system-container-driver-467613593
tutorial-control-flows-gbjpn-system-container-impl-2018360503
tutorial-control-flows-gbjpn-system-dag-driver-3793006146
tutorial-control-flows-gbjpn-system-container-driver-1781539373
tutorial-control-flows-gbjpn-system-dag-driver-1653906037
tutorial-control-flows-gbjpn-system-dag-driver-477249553
tutorial-control-flows-gbjpn-system-dag-driver-3564565339
tutorial-control-flows-gbjpn-system-dag-driver-511401279
tutorial-control-flows-gbjpn-system-container-impl-1647980155
tutorial-control-flows-gbjpn-system-container-driver-133111245
tutorial-control-flows-gbjpn-system-container-impl-902191899
tutorial-control-flows-gbjpn-system-dag-driver-2964226956
tutorial-control-flows-gbjpn-system-dag-driver-820485803
tutorial-control-flows-gbjpn-system-container-driver-2311769198
tutorial-control-flows-gbjpn-system-container-impl-3092937256
```
With the pipeline above, I can confirm the mechanism works, and I was able to get to the logs through the artifacts when the Pod was no longer available on the cluster.
For my other test I used a pipeline created with the following script:
```python
from kfp import dsl
import kfp

client = kfp.Client()

experiment_name = "my-experiment"
experiment_namespace = "glo-k8s-admin"


@dsl.component
def add(a: float, b: float) -> float:
    """Calculates sum of two arguments"""
    return a + b


@dsl.pipeline(
    name="Addition pipeline",
    description="An example pipeline that performs addition calculations.",
)
def add_pipeline(
    a: float = 1.0,
    b: float = 7.0,
):
    first_add_task = add(a=a, b=4.0)
    second_add_task = add(a=first_add_task.output, b=b)


try:
    print("getting experiment...")
    experiment = client.get_experiment(
        experiment_name=experiment_name, namespace=experiment_namespace
    )
    print("got experiment!")
except Exception:
    print("creating experiment...")
    experiment = client.create_experiment(
        name=experiment_name, namespace=experiment_namespace
    )
    print("created experiment!")

client.create_run_from_pipeline_func(
    add_pipeline,
    arguments={"a": 7.0, "b": 8.0},
    experiment_id=experiment.experiment_id,
    enable_caching=False,
)
```
This creates Pods with the following names (the names differ from the example in the previous comment because it's a new run):
```
addition-pipeline-8fbx5-1329381822
addition-pipeline-8fbx5-2476489665
addition-pipeline-8fbx5-2909346888
addition-pipeline-8fbx5-579177391
addition-pipeline-8fbx5-822281223
```
If I understand correctly, this is because the workflow name is derived by reformatting the pod name:
https://github.com/kubeflow/pipelines/blob/2.3.0/frontend/server/workflow-helper.ts#L169
https://github.com/kubeflow/pipelines/blob/2.3.0/frontend/server/workflow-helper.ts#L214
Is this because of the difference in how KFP v1 and KFP v2 work?
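A Python port of the linked helper (for illustration only; the real code is the TypeScript in workflow-helper.ts) shows the mismatch: the v2 "-system-container-impl-<id>" suffix is stripped, but a bare numeric suffix is left intact, so the full pod name ends up being used as the workflow name in the key:

```python
import re

def workflow_name_from_pod_name(pod_name: str) -> str:
    # Strips only the v2 "-system-container-impl-<id>" style suffix.
    return re.sub(r"-system-container-impl-.*", "", pod_name)

# v2-style pod name: the suffix matches and is removed.
stripped = workflow_name_from_pod_name(
    "tutorial-control-flows-gbjpn-system-container-impl-2018360503")

# Numeric-suffix pod name: no match, so the pod name passes through unchanged.
unstripped = workflow_name_from_pod_name("addition-pipeline-pb72t-3709849268")
```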
v2 pod names include suffixes like the first example you shared. Was the Addition pipeline you linked authored/submitted with KFP v1? If so, the logs should be stored as proper output artifacts that you can access in the Inputs/Outputs tab when you click on the node. That being said, it shouldn't be too difficult to add support for this alternative log retrieval mechanism in v1 if there is demand for it. It's a long-standing issue in v1, v1 is in somewhat of a frozen state, and there's an alternative solution in place (the Inputs/Outputs tab), so this PR didn't fix the logs tab in v1.
Also, in case it's related, please make note of this issue: https://github.com/kubeflow/pipelines/issues/11019.
That being said, I'm continuing to do some extra validation on my end to see if I can repro the error.
Hey @droctothorpe , were you able to have a closer look here?
TBH, I'm not exactly sure if the pipeline I created using the script from https://github.com/kubeflow/pipelines/issues/10036#issuecomment-2340206338 is KFP v1 or KFP v2. I deployed it using kfp==2.8.0, so I assumed it's KFP v2. Do you know if I should change anything in the script to explicitly configure it for KFP v2?
When I ran that same pipeline, my pods had different names (i.e. they adhered to the v2 nomenclature):
Are you positive that you submitted the run with v2?
Here are the logs in the UI for that pipeline even after the pods and workflow CR have been GCed:
FetchError: request to http://metadata/computeMetadata/v1/instance/attributes/cluster-name failed, reason: getaddrinfo ENOTFOUND metadata
FYI, adding the following env var to the ml-pipeline-ui might be an easier fix for this error:

```yaml
- name: DISABLE_GKE_METADATA
  value: "true"
```
Shout out to @boarder7395 for sharing it.
@kromanow94 I think you might be having the same issue I did.
The two changes required for everything to work are:
What I missed was the second item, which resulted in the issues you had above.
Hey, I was able to get the main.log artifact after the Workflow and Pods were deleted with the latest version of kubeflow/manifests, v1.9.1, which uses Kubeflow Pipelines version 2.3.0.
I only had to set these two env variables on the ml-pipeline-ui Deployment, and everything works:

```yaml
- name: ARGO_ARCHIVE_LOGS
  value: "true"
- name: DISABLE_GKE_METADATA
  value: "true"
```
Though it's still a workaround in code, it works.
Are there any plans for a proper solution that registers the main.log as an artifact?
Feature Area
/area frontend /area backend
What feature would you like to see?
A feature similar to the v1 behavior. In the details page of each step, the log appears as one of the artifacts that users can view.
In v2, this main log artifact is no longer displayed. It would be great if we could add a similar section to show the logs, or use the log artifact as a source for the "logs" panel after the pod is deleted, instead of directly pulling from the (already deleted) pod.
What is the use case or pain point?
Because all the logs of completed pods are auto-deleted by Kubernetes after 24 hours, users can no longer access the logs of earlier pipeline runs.
Is there a workaround currently?
No workaround.
/cc @kromanow94
Love this idea? Give it a 👍.