spotify / luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache License 2.0

`h5py` compatibility with `S3Target`? #3262

Closed EthanMarx closed 8 months ago

EthanMarx commented 10 months ago

I am wondering whether S3Target is compatible with h5py files.

I want to be able to do something like:

import luigi
import luigi.contrib.s3
import numpy as np

class Task(luigi.Task):

    @property
    def client(self):
        return luigi.contrib.s3.S3Client(endpoint_url="https://endpoint")

    def output(self):
        return luigi.contrib.s3.S3Target("s3://bucket/file.h5", client=self.client)

    def run(self):

        with self.output().open("w") as f:
            # desired behavior: f is an h5py.File-like object
            f.create_dataset("data", data=np.zeros(10))

I have been looking into the S3Target format argument, but there doesn't appear to be any h5 support. Is there another way to get behavior like this?

Thank you!

tgy commented 9 months ago

Hi @EthanMarx, check this out:

import luigi
import h5py
import luigi.contrib
import luigi.contrib.s3
import numpy as np
from luigi.format import BaseWrapper, WrappedFormat

class BytesFormat(WrappedFormat):
    input = "bytes"
    output = "bytes"
    wrapper_cls = BaseWrapper

Bytes = BytesFormat()

class Task(luigi.Task):
    @property
    def client(self):
        return luigi.contrib.s3.S3Client(endpoint_url="https://endpoint")

    def output(self):
        extra_args = {"ContentType": "application/octet-stream"}
        return luigi.contrib.s3.S3Target(
            "s3://bucket/file.h5",
            client=self.client,
            extra_args=extra_args,
            format=Bytes,
        )

    def run(self):
        with self.output().open("w") as f:
            with h5py.File(f, 'a') as h5file:
                h5file.create_dataset("data", data=np.zeros(10))

S3Target assumes the format is Text by default (not great, I know), so binary data such as HDF5 needs an explicit bytes format like the one above.
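To illustrate why the Text default gets in the way, here is a minimal sketch using only the standard library. The two streams are stand-ins and not Luigi's actual classes: it assumes a Text format hands back something that behaves like `io.TextIOWrapper`, and a bytes format something that behaves like a plain binary stream. `try_write` is a hypothetical helper for this example only.

```python
import io

# First 8 bytes of every HDF5 file (the format signature).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def try_write(stream, payload):
    """Return True if payload could be written to stream, False on TypeError."""
    try:
        stream.write(payload)
        return True
    except TypeError:
        return False


# Text-mode stream: stand-in (assumption) for what a Text format returns.
text_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
# Binary stream: stand-in (assumption) for what a bytes format returns.
binary_stream = io.BytesIO()

text_ok = try_write(text_stream, HDF5_SIGNATURE)      # text streams reject bytes
binary_ok = try_write(binary_stream, HDF5_SIGNATURE)  # binary streams accept bytes

print("text stream accepted bytes:", text_ok)
print("binary stream accepted bytes:", binary_ok)
```

A text-mode stream only accepts `str`, so a library that writes raw bytes, as h5py does, cannot write through it. That is presumably why the snippet above passes `format=Bytes` explicitly.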

EthanMarx commented 8 months ago

@tgy Exactly what I was looking for. I really appreciate the help. Closing.