logicbomb421 closed this issue 3 years ago
/cc @kevinbache to investigate
Hello @logicbomb421 and sorry for the late response.
I compiled the Python code you provided (`pipeline.py`) and I noticed the following:
```yaml
- container:
    ...
    volumeMounts:
    - mountPath: /secret/gcp-credentials
      name: gcp-credentials-user-gcp-sa
    - mountPath: '{{inputs.parameters.work-dir}}'
      name: '{{inputs.parameters.create-scratch-volume-name}}'
    ...
  name: process-batch
  ...
```
The problem is that `{{inputs.parameters.create-scratch-volume-name}}` gets resolved to the PVC name, but that is not what should be used here. Instead, it must be the volume name as referenced in the workflow spec (the `name` field of the `volumes` list).
The name of the PVC resource means nothing to the workflow; this works the same way it does for `StatefulSet`s. `volumes` is a list mapping cluster volume resources to keys (`name`s), and to use those resources we need to reference them through those keys.
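To illustrate the mapping, here is a minimal sketch using the standard Kubernetes Python client (all names here are made up): the mount references the volume's key, while the PVC resource name only appears inside the claim source.

```python
from kubernetes import client as k8s

# The volume entry in the workflow spec: its `name` is the key that other
# parts of the workflow use to refer to it.
volume = k8s.V1Volume(
    name='create-scratch-volume',  # key referenced by volumeMounts
    persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
        claim_name='scratch-pvc-abc123',  # hypothetical PVC resource name; irrelevant to mounts
    ),
)

# The mount must reference the key (V1Volume.name), never the PVC name.
mount = k8s.V1VolumeMount(
    name='create-scratch-volume',
    mount_path='/mnt/work',
)
```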
I notice that the `data-split` template references it the correct way:
```yaml
- container:
    ...
    volumeMounts:
    - mountPath: /secret/gcp-credentials
      name: gcp-credentials-user-gcp-sa
    - mountPath: '{{inputs.parameters.work-dir}}'
      name: create-scratch-volume
    ...
  name: data-split
  ...
```
So let's debug `pipeline.py`.
In the `looper()` function you seem to use `work_vol_name` as the volume name:

```python
p1.add_pvolumes({work_dir: k8s.V1Volume(name=work_vol_name)})
```
When you call that function you pass `workdirop.outputs['name']` to `work_vol_name`. That is an output parameter, the name of the created PVC, which, as mentioned earlier, shouldn't be used here.
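To make the distinction concrete, here is a minimal sketch (assuming kfp SDK v1.x; the `VolumeOp` arguments are placeholders, not the ones from `pipeline.py`):

```python
from kfp import dsl

@dsl.pipeline(name='name-vs-volume')
def example():
    # Hypothetical VolumeOp standing in for the one in pipeline.py.
    workdirop = dsl.VolumeOp(
        name='create-scratch-volume',
        resource_name='scratch',
        size='1Gi',
    )

    # A PipelineParam that resolves to the PVC's Kubernetes resource name at
    # runtime -- not a valid key inside the workflow's `volumes` list.
    pvc_name = workdirop.outputs['name']

    # The PipelineVolume object that the DSL knows how to wire into the
    # workflow spec (volumes entry + volumeMounts reference).
    vol = workdirop.volume
```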
You should mount `workdirop.volume`, the same way you do for `data-split`:
```python
split_op = data_split_op(
    batch_size=batch_size,
    work_dir=WORK_DIR,
    data_filename=load_op.output
).apply(gcp.use_gcp_secret('user-gcp-sa')).add_pvolumes({WORK_DIR: workdirop.volume}).after(load_op)
```
So I suggest passing `workdirop.volume` to the `looper()` function, too (and maybe naming the argument `work_vol` instead of `work_vol_name`).
Another way would be to pass `workdirop.name`, which is essentially the volume name used, but I wouldn't recommend it since that naming decision could change in the future.
Trying that, though, I get the following error:

```
ValueError: arguments to looper should be PipelineParams.
```

I believe this isn't the desired behavior. I get that a `dsl.Condition` probably needs to contain `PipelineParam`s, but I think that arguments to a graph component could be other types as well.
@Ark-kun, Hey Alexey, any thoughts on the last part?
@elikatsis thank you for the reply! I actually started building this by passing `workdirop.volume` to `looper`, but I ran into the same error you encountered. I switched to passing the names since I could be sure they would be `PipelineParam`s, given they were the output of another operation. Ideally, however, passing the actual k8s resource created to `looper` would be great.
> Trying that, though, I get the following error:
> `ValueError: arguments to looper should be PipelineParams.`
> I believe this isn't the desired behavior. I get that a `dsl.Condition` probably needs to contain `PipelineParam`s, but I think that arguments to a graph component could be other types as well.
>
> @Ark-kun, Hey Alexey, any thoughts on the last part?
I agree with you. I'll need to check the implementation of `@graph_component` some day. @gaoning777, do you remember why this limitation exists?
I am getting the same error (`invalid spec: templates.1000g-variant.tasks.for-loop-for-loop-7301aa19-1 failed to resolve {{inputs.parameters.ds-human-g1k-v37-name}}`) inside `dsl.ParallelFor` as well. By applying the workaround mentioned above, I was able to start a run. Here is the code segment with the workaround:
```python
with dsl.ParallelFor(readqueries()) as query:
    with dsl.ParallelFor(REGIONS) as region:
        # freebayes_task = \
        freebayes_op(region=region,
                     query=query,
                     # https://github.com/kubeflow/pipelines/issues/1891
                     # workaround as .volume passes in pvc name incorrectly in for loops
                     # input_vol=human_g1k_v37.volume.after(samtools_task),
                     input_vol=k8s_client.models.V1Volume(name=human_g1k_v37.name),
                     # output_vol=vcf_output.volume,
                     output_vol=k8s_client.models.V1Volume(name=vcf_output.name),
                     host=oneclient_provider_host,
                     token=oneclient_access_token,
                     # .after() added to work around issue 1891
                     insecure=oneclient_insecure).after(samtools_task, vcf_output, human_g1k_v37)
```
The problem does not occur outside of `dsl.ParallelFor`.
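For contrast, a minimal sketch of the non-looping case, where mounting `.volume` compiles to the correct reference (assuming kfp SDK v1.x; names and image are illustrative):

```python
from kfp import dsl

@dsl.pipeline(name='volume-outside-parallelfor')
def example():
    vop = dsl.VolumeOp(
        name='create-scratch-volume',
        resource_name='scratch',
        size='1Gi',
    )
    # Outside dsl.ParallelFor / graph_component, the PipelineVolume is wired
    # into the workflow spec as expected.
    dsl.ContainerOp(
        name='consumer',
        image='python:3.7',
        command=['ls', '/mnt/work'],
    ).add_pvolumes({'/mnt/work': vop.volume})
```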
/assign @Ark-kun
/assign @numerology
/unassign @gaoning777
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
```
ValueError: arguments to looper should be PipelineParams.
```

Was a fix implemented for this issue? I am also facing this error when passing a pvolume to a `@graph_component`.
/reopen
@Bobgy: Reopened this issue.
It seems to me that we would just need to include `PipelineVolume` in `dsl/_component.py` here, like:

```python
from ._pipeline_volume import PipelineVolume
...
if not isinstance(input, (PipelineParam, PipelineVolume)):
    raise ValueError('arguments to ' + func.__name__ +
                     ' should be PipelineParams or PipelineVolumes.')
```
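As a quick illustration of why the current guard rejects it (assuming kfp SDK v1.x, where `PipelineVolume` subclasses the Kubernetes `V1Volume` model rather than `PipelineParam`; the PVC name is hypothetical):

```python
from kfp.dsl import PipelineParam, PipelineVolume

# A PipelineVolume wrapping a hypothetical pre-existing PVC.
vol = PipelineVolume(pvc='scratch-pvc-abc123')

print(isinstance(vol, PipelineParam))   # False -> trips the current check
print(isinstance(vol, PipelineVolume))  # True  -> would pass the proposed check
```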
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
Small Backstory: I am attempting to design a pipeline that processes a large dataset in batches, controlled by the `batch_size` pipeline parameter. Once the data has been batched, I am attempting to use a `graph_component` to essentially loop over all batches and spawn processing nodes. Each of these processing nodes needs to have the working directory created by the initial `VolumeOp` mounted to it.

What happened: When attempting to mount the `VolumeOp`-based volume into the recursively spawned `ContainerOp`s, I receive the error:

```
This step is in Error state with this message: volume '<pvc_that_was_just_created_by_volumeop>' not found in workflow spec
```
What did you expect to happen: The `VolumeOp`-based volume mounts into all `ContainerOp` containers, regardless of whether they were spawned recursively or not.

What steps did you take: I have tried various ways to make this work, from simply calling `add_pvolumes` on the recursively spawned `ContainerOp`s, to manually creating the k8s resources and appending them (`.add_volume`, `.container.add_volume_mount`), all the way to manually editing the generated YAML based on what I was reading about looping in the Argo docs.

Anything else you would like to add: Here is a self-contained example that should reproduce this (I attempted to attach it to the issue but GitHub was choking on the upload):
- Instructions
- Dockerfile
- Oneliner for 500.json
- pipeline.py