xdas-dev / xdas

Python framework for Distributed Acoustic Sensing (DAS).
https://xdas.readthedocs.io
GNU General Public License v3.0

Sequence composition with Atoms #1

Closed by martijnende 7 months ago

martijnende commented 8 months ago

Outline

This PR adds a framework for composing complex data processing pipelines by chaining elementary operations. The motivation for introducing composability is that experienced DAS analysts have already developed their preferred data analysis workflows and are unlikely to adopt new end-to-end workflows over which they have no control. So, instead of providing complex operations with no room for customisation, xdas.Sequence offers a framework for chaining together basic operations (xdas.Atom objects) in a user-specified order and with dedicated function arguments. This allows for enhanced optimisation at the level of individual atoms, as well as at the level of the entire pipeline, while users retain the same flexibility as when creating the pipeline themselves.

The new Sequence object aims to replace the old ProcessingChain.

Usage

In:

import xdas.signal as xp
from xdas import Atom, Sequence

sequence = Sequence(
    [
        Atom(xp.taper, dim="time"),
        Atom(xp.taper, dim="distance"),
    ]
)
print(sequence)
sequence(db)

Out:

Sequence:
  0: taper(..., dim=time)
  1: taper(..., dim=distance)

TODO

martijnende commented 8 months ago

Why not subclass Sequence from list instead of dict?

The main difference would be the loss of descriptive keywords, and with them the ability to perform operations by selecting keywords. Since these keywords do not necessarily depend on the position of a given atom in the sequence, you could define modifications of a sequence in a more reusable way. On the other hand, using indices rather than keywords makes the bookkeeping a lot simpler (no need for unique naming and duplicate checking). So we could reconsider this trade-off between "selectability" and code complexity.
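To make the trade-off concrete, here is a minimal sketch of the two options. The classes KeyedSequence and IndexedSequence and the toy functions are hypothetical stand-ins written for this comment, not the xdas implementation:

```python
class KeyedSequence(dict):
    """Hypothetical dict-based sequence: atoms are addressed by unique names."""

    def __call__(self, x):
        # dicts preserve insertion order, so atoms run in declaration order
        for atom in self.values():
            x = atom(x)
        return x


class IndexedSequence(list):
    """Hypothetical list-based sequence: atoms are addressed by position."""

    def __call__(self, x):
        for atom in self:
            x = atom(x)
        return x


def double(x):
    return 2 * x


def increment(x):
    return x + 1


keyed = KeyedSequence(double=double, increment=increment)
indexed = IndexedSequence([double, increment])

# Replace one step: by name (independent of its position) or by index
keyed["increment"] = lambda x: x + 10
indexed[1] = lambda x: x + 10

keyed_result = keyed(3)
indexed_result = indexed(3)
```

Both variants give the same result here. The keyed version needs unique names, while the indexed version breaks if an atom is inserted before the one being modified.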

The Keras sequences are not meant to be stored as recipes, which was one of the initial motivations for the xdas composability (from xdas.recipes import fk giving you a predefined sequence). Users might want to modify a pre-defined recipe to suit their needs, which is where the sequence manipulations come in. Defining an order only at declaration time prevents user modifications.

Would it make sense that Atom and StateAtom subclass partial?

What would we gain from subclassing partial?
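For reference, a minimal sketch of what subclassing functools.partial might provide: argument storage (func, args, keywords) and __call__ come for free, so only the printing would need to be written. PartialAtom and scale are hypothetical names used for illustration, not part of xdas:

```python
from functools import partial


class PartialAtom(partial):
    """Hypothetical Atom built on functools.partial (illustration only)."""

    def __repr__(self):
        # partial already stores the wrapped function and its keyword arguments
        kwargs = ", ".join(f"{key}={value}" for key, value in self.keywords.items())
        return f"{self.func.__name__}(..., {kwargs})"


def scale(data, factor=1.0):
    """Toy operation standing in for a signal processing function."""
    return [factor * value for value in data]


atom = PartialAtom(scale, factor=2.0)
scaled = atom([1.0, 2.0])  # the data is passed as the first positional argument
label = repr(atom)
```

One caveat: partial places its stored positional arguments before the ones given at call time, so this only works as intended when the atom's parameters are stored as keywords.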

Would it make sense to have nested Sequences?

Maybe this would make sense for output handling: one Sequence generates one output, so if you want intermediate outputs you'd need to define multiple sequences, each of which is placed in a higher-level sequence. If not for the output, it would make no difference whether sequences are nested or concatenated.

atrabattoni commented 8 months ago

Maybe this would make sense for output handling: one Sequence generates one output, so if you want intermediate outputs you'd need to define multiple sequences, each of which is placed in a higher-level sequence. If not for the output, it would make no difference whether sequences are nested or concatenated.

I was thinking more of a way of organizing the sequence into sub-parts. For example, if you apply an FK filter and then some other function, the atoms would be grouped rather than forming one very long list. In other words, an item of the sequence can itself be a sequence.

atrabattoni commented 8 months ago

The Keras sequences are not meant to be stored as recipes, which was one of the initial motivations for the xdas composability (from xdas.recipes import fk giving you a predefined sequence). Users might want to modify a pre-defined recipe to suit their needs, which is where the sequence manipulations come in. Defining an order only at declaration time prevents user modifications

Yeah, well, I suspect that modifying a sequence will require as many lines of code as redefining it. As a user I would inspect the recipe, then either use it as is or re-declare everything.

I still don't have a use case in mind for very long recipes. A user will probably need to change many of their parameters.

atrabattoni commented 8 months ago

I mean, I do not see what we need beyond what Keras offers. People copy and paste other people's models and change a few parameters.

atrabattoni commented 8 months ago

It's a matter of editing some code vs. programmatically modifying some objects. The first approach requires less work for us and looks more straightforward. Keras also provides a way to store weights (as we would save state). So it looks like a good source of inspiration to me. But I'm ready to hear your point of view.

martijnende commented 8 months ago

In terms of execution, nested Sequences and Atoms should have a similar behaviour (after implementing the __call__ method): db <- obj.__call__(db). The recursive execution is handled automatically. The only thing that would need to change is the management of the Sequence and sub-Sequence ordering, which pertains to the discussion of the Keras-style sequence.
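A minimal sketch of this recursion, assuming list-based containers. ToyAtom, ToySequence and the arithmetic functions are hypothetical stand-ins, and an integer plays the role of db:

```python
class ToyAtom:
    """Hypothetical stand-in for an Atom: a function plus stored keyword arguments."""

    def __init__(self, func, **kwargs):
        self.func = func
        self.kwargs = kwargs

    def __call__(self, db):
        return self.func(db, **self.kwargs)


class ToySequence(list):
    """Hypothetical stand-in for a Sequence: each item only needs __call__."""

    def __call__(self, db):
        for step in self:
            # step may be an atom or another sequence; both are called the same way
            db = step(db)
        return db


def add(db, value=0):
    return db + value


def multiply(db, factor=1):
    return db * factor


inner = ToySequence([ToyAtom(add, value=1), ToyAtom(multiply, factor=2)])
nested = ToySequence([inner, ToyAtom(add, value=10)])
flat = ToySequence([*inner, ToyAtom(add, value=10)])

nested_result = nested(3)
flat_result = flat(3)
```

Nested and concatenated sequences give the same result, so nesting only matters for organisation or for intermediate outputs.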

Very long sequences would almost certainly be exclusively user-defined (we could probably quantify this probability based on the increasing entropy of longer sequences). So manipulating the sequence would only be useful for the xdas recipes. Maybe the choice of sequence mutability would depend on the kind of recipes we plan to provide?

atrabattoni commented 8 months ago

Maybe the choice of sequence mutability would depend on the kind of recipes we plan to provide?

Yeah, let's discuss next week which use cases we want to cover and which kind of syntax we want. The implementation will naturally follow.