Closed: @IanTayler closed this issue 1 year ago.
Hi @IanTayler, thank you for the report and apologies for the late reply.
You are right that the current implementation does not use volumes for value-passing here. In typical usage (including on cloud-based runners), users set their pipeline root to a storage bucket (say on GCS or S3). Subsequent writes to the intermediate files are then accessible by all containers running in that cloud project. A known limitation is that this scheme does not work as well for local execution, and we have not yet implemented the local volume-based approach.
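For concreteness, a minimal sketch of that cloud setup (the pipeline name and bucket are hypothetical): with the pipeline root on a bucket, every container reads and writes intermediate artifacts through shared storage, which is exactly what a plain local path cannot provide without volume mounts.

```python
from tfx.orchestration import pipeline

# With a bucket as the pipeline root, artifact URIs like
# gs://my-bucket/pipeline_root/<component>/<output>/<id> are readable and
# writable by every container in the project, so no volumes are needed.
p = pipeline.Pipeline(
    pipeline_name='my_pipeline',                   # hypothetical name
    pipeline_root='gs://my-bucket/pipeline_root',  # hypothetical bucket
    components=[],                                 # components go here
)
```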
Hi @charlesccychen, thank you very much for the response. Soledad here; I work on the same team as @IanTayler, and he's OOO today.
Is the local volume implementation on your development roadmap? If so, do you have an estimate of when it will be implemented?
Thanks in advance!
For future reference, I was actually able to work around this by using a `BeamDagRunner` and attaching a `platform_config_pb2.DockerPlatformConfig()` to the components, setting the `volumes` field to the local pipeline root (which should be an absolute path for it to work properly). A sketch follows below.
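A sketch of that workaround, for anyone landing here later. This assumes the IR-based `BeamDagRunner`, a `platform_config_pb2` module under `tfx.proto.orchestration`, a `with_platform_config` helper on components, and docker-py-style `host:container:mode` volume strings; the exact module path and proto field shapes may differ across TFX versions, so treat this as an illustration of the idea rather than a verified snippet.

```python
import os

from tfx.orchestration import pipeline
from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner
from tfx.proto.orchestration import platform_config_pb2  # assumed module path

# Must be an absolute path for the Docker volume mount to work properly.
PIPELINE_ROOT = os.path.abspath('/tmp/pipeline_root')

# Mount the pipeline root into the container at the same path, so the
# artifact URIs the orchestrator passes in resolve both on the host and
# inside the container. The 'host:container:mode' string format is an
# assumption borrowed from docker-py.
docker_config = platform_config_pb2.DockerPlatformConfig(
    volumes=[f'{PIPELINE_ROOT}:{PIPELINE_ROOT}:rw'],
)

# `producer` and `consumer` are container components like the ones in the
# standalone repro script further down this thread.
p = producer()
c = consumer(x=p.outputs['x'])
for component in (p, c):
    component.with_platform_config(docker_config)

BeamDagRunner().run(
    pipeline.Pipeline(
        pipeline_name='container_pipeline',
        pipeline_root=PIPELINE_ROOT,
        components=[p, c],
    )
)
```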
@IanTayler, could you please confirm whether this issue can be closed, since the workaround of using `BeamDagRunner` and setting the `volumes` field to the local pipeline root works for you? Thank you!
The workaround works, although it's a bit hacky. Whether that means this issue can be closed depends on whether the TFX team thinks local pipeline roots should be implemented for `LocalDagRunner` or not. On our side, we have a working alternative and would not mind too much either way.
Thank you for the response. We still don't have a good plan for improving `LocalDagRunner` integration with container component support. Let me close the bug for now.
System information

(Package list taken from `poetry export --without-hashes` output; not using `pip freeze` because it points to locally cached wheels when using `poetry`.)

Describe the current behavior
Even though output artifacts are correctly written inside the container, these files are never written to the pipeline root and are never passed to the subsequent components that need them.
After running the code I wrote below, I run `docker container ls --all` and can check that I had two container runs, but the second one got an empty value for output `x`, even though by doing `docker cp <first_container_hash>:/tmp/output/ first_out` I can check that my container is correctly writing the value inside the provided path in `/tmp/output`.

I have also tried changing the `InputValuePlaceholder` for the `String` to an `InputUriPlaceholder` (while changing `echo` to `cat`), and that makes the second container fail with a `No such file or directory` error message from `cat`.

Of course, the original pipeline I got this problem with was more complicated than the one shown below. In that one, I also saw this problem with `tfx.types.experimental.simple_artifacts.Metrics` artifacts. These are the only two types of artifacts I tested the `LocalDagRunner` with; I don't want to give the impression I saw it working for other artifact types. `docker inspect` shows there are no volumes in the containers created by the `LocalDagRunner`.

Describe the expected behavior
The file written inside the Docker container to the output location should be copied to the host `pipeline_root` for inspection and passed to all components that need it. Probably the most reasonable way to achieve this would be by using volumes.
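For illustration, a rough sketch of what volume-based launching could look like with the docker-py client (assuming that is what the Docker launcher goes through); the image and paths are made up:

```python
import docker

client = docker.from_env()

# If the runner mounted the host pipeline root into the container at the
# same path, anything the component writes to its output URI would land
# directly in the host pipeline_root.
client.containers.run(
    image='bash:4.4',
    command=['sh', '-c',
             'mkdir -p /tmp/pipeline_root/Producer/x/1 && '
             'echo hello > /tmp/pipeline_root/Producer/x/1/value'],
    volumes={'/tmp/pipeline_root': {'bind': '/tmp/pipeline_root', 'mode': 'rw'}},
    remove=True,
)
```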
Standalone code to reproduce the issue
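The original repro script was not captured here; the following is a minimal reconstruction consistent with the description above (the image, names, and paths are guesses), using TFX's experimental container component API:

```python
from tfx.dsl.component.experimental import container_component, placeholders
from tfx.orchestration import pipeline
from tfx.orchestration.local.local_dag_runner import LocalDagRunner
from tfx.types import standard_artifacts

producer = container_component.create_container_component(
    name='Producer',
    image='bash:4.4',
    outputs={'x': standard_artifacts.String},
    command=[
        'sh', '-c',
        # Write a value to the output URI provided by the orchestrator.
        # Under LocalDagRunner this path only exists inside the container,
        # which is exactly the bug being reported.
        'mkdir -p "$(dirname "$0")" && echo "hello" > "$0"',
        placeholders.OutputUriPlaceholder('x'),
    ],
)

consumer = container_component.create_container_component(
    name='Consumer',
    image='bash:4.4',
    inputs={'x': standard_artifacts.String},
    command=[
        'sh', '-c',
        # The value arrives empty here when run with LocalDagRunner.
        'echo "got: $0"',
        placeholders.InputValuePlaceholder('x'),
    ],
)


def make_pipeline(pipeline_root: str) -> pipeline.Pipeline:
    p = producer()
    c = consumer(x=p.outputs['x'])
    return pipeline.Pipeline(
        pipeline_name='container_repro',
        pipeline_root=pipeline_root,
        components=[p, c],
    )


if __name__ == '__main__':
    LocalDagRunner().run(make_pipeline('/tmp/pipeline_root'))
```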
Other info / logs
Containers after running:
Value of output inside container (run number 7):
Let me know if there's any other information I can provide or any other way I can help with this issue.