pangeo-forge / pangeo-forge-runner

Run pangeo-forge recipes on Apache Beam
https://pangeo-forge-runner.readthedocs.io
Apache License 2.0

Add support for running local recipes during development #43

Open derekocallaghan opened 1 year ago

derekocallaghan commented 1 year ago

This is motivated by the discussion about potentially using pangeo-forge-runner as a CLI for validating recipes during development, prior to staged-recipes contribution.

Looking through the code, a temporary directory is created to contain the specified repo, which is then fetched by one of the content providers:

https://github.com/pangeo-forge/pangeo-forge-runner/blob/efd4e447dae551b150ef209041ec7b44a22c99b4/pangeo_forge_runner/commands/bake.py#L114

        # Create a temporary directory where we fetch the feedstock repo and perform all operations
        # FIXME: Support running this on an already existing repository, so users can run it
        # as they develop their feedstock
        with tempfile.TemporaryDirectory() as d:
            self.fetch(d)

The local content provider is first in the list of providers used during fetching:

https://github.com/pangeo-forge/pangeo-forge-runner/blob/1aa65854704e3f33038c130a19cd554fb1e86255/pangeo_forge_runner/commands/base.py#L81

    # Content providers from repo2docker are *solely* used to check out a repo
    # and get their contents locally, so we can work on them.
    content_providers = List(
        None,
        [
            contentproviders.Local,
            contentproviders.Zenodo,
            contentproviders.Figshare,
            contentproviders.Dataverse,
            contentproviders.Hydroshare,
            contentproviders.Swhid,
            contentproviders.Mercurial,
            contentproviders.Git,
        ],

...

        for ContentProvider in self.content_providers:
            cp = ContentProvider()
            spec = cp.detect(self.repo, ref=self.ref)
            if spec is not None:
                picked_content_provider = cp

...

        for log_line in picked_content_provider.fetch(
            spec, target_path, yield_output=True
        ):
            self.log.info(log_line, extra=dict(status="fetching"))

However, contentproviders.Local.fetch() expects output_dir == spec["path"]:

    def fetch(self, spec, output_dir, yield_output=False):
        # nothing to be done if your content is already in the output directory
        msg = f'Local content provider assumes {spec["path"]} == {output_dir}'
        assert output_dir == spec["path"], msg
        yield f'Using local repo {spec["path"]}.\n'

Because output_dir is always the temporary directory created by Bake, it never equals spec["path"], so the assertion fails and a local repo can't be used.
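To make the failure concrete, here is a minimal sketch that reproduces the mismatch without installing repo2docker. `LocalStandIn` is a simplified stand-in that copies the fetch() body quoted above; its detect() is an assumption about how a local provider matches a directory, and `try_fetch` is a hypothetical helper used only for this illustration.

```python
import os
import tempfile


class LocalStandIn:
    """Simplified stand-in mirroring the quoted Local.fetch() (not the real class)."""

    def detect(self, repo, ref=None):
        # Assumption: a local provider matches any existing directory
        if os.path.isdir(repo):
            return {"path": repo}
        return None

    def fetch(self, spec, output_dir, yield_output=False):
        # Same check as the excerpt above
        msg = f'Local content provider assumes {spec["path"]} == {output_dir}'
        assert output_dir == spec["path"], msg
        yield f'Using local repo {spec["path"]}.\n'


def try_fetch(repo, target_path):
    """Return the fetch log lines, or the assertion message if fetching fails."""
    cp = LocalStandIn()
    spec = cp.detect(repo)
    try:
        return list(cp.fetch(spec, target_path))
    except AssertionError as e:
        return [f"AssertionError: {e}"]


with tempfile.TemporaryDirectory() as repo, tempfile.TemporaryDirectory() as d:
    # Same pattern as bake: a local repo fetched into a fresh temp dir
    failed = try_fetch(repo, d)
    # Fetching "into" the repo itself is the only case that passes the check
    ok = try_fetch(repo, repo)
    print(failed[0])
    print(ok[0])
```

The first call fails because the two paths differ, which is the situation Bake creates for any local repo. The second call passes only because the target path is the repo itself.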

I was originally thinking that a quick check to see whether self.repo exists locally, prior to creating the temp dir, might work. However, related discussions suggest that fetching may be performed separately in the future, so I'm not sure what the best approach is, or whether this issue will still be relevant.
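For illustration, the quick check mentioned above could look roughly like the following sketch. `working_dir` is a hypothetical helper, not part of pangeo-forge-runner, and it assumes that any repo value that is not an existing directory should still go through the temp dir and the normal fetch step.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_dir(repo):
    """Yield (path, needs_fetch) for a repo spec. Hypothetical helper."""
    if os.path.isdir(repo):
        # Local checkout: work in place, nothing to fetch or clean up
        yield os.path.abspath(repo), False
    else:
        # Anything else (URL, DOI, ...): fall back to a temporary directory
        with tempfile.TemporaryDirectory() as d:
            yield d, True


# Usage: the caller would only invoke the fetch step when needs_fetch is True
with working_dir(".") as (path, needs_fetch):
    print(path, needs_fetch)
```

One consequence of this approach is that a local checkout is never copied or deleted, so any files a recipe run writes into the working directory would remain in the developer's repo.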

derekocallaghan commented 1 year ago

Looks like this will be fixed by #44. I'll try it out locally.