simonsobs / sotodlib

Simons Observatory: Time-Ordered Data processing library.
MIT License
15 stars 18 forks source link

Revisit subpackage organization #53

Closed tskisner closed 3 years ago

tskisner commented 4 years ago

Recent changes have shown us that we should revisit the big-picture organization of sotodlib. @mhasself has floated the idea of bringing the sotoddb repo into this package under the meta subpackage. This issue is for tracking the summary of offline discussion and thoughts about this before we take any action.

kmharrington commented 4 years ago

A couple thoughts: I've noticed it looks like there's going to be a lot of toast-related functions and a lot of Axis-manager related functions. Also, things "needed for simulations" and things "needed for analysis of simulations/real data" are pretty interspersed. I'd be inclined to separate those into top-level modules as well.

And then there's some random stuff I wrote awhile ago that's probably just deprecated already.

tskisner commented 4 years ago

Yeah... @mhasself and I started to think about this offline, but are going to first work on defining the AxisManager / TOAST interface a little better. I think that will help inform the best way of organizing the sub packages.

tskisner commented 4 years ago

Ok, after some more big-picture discussion offline, I think we are going to simplify the package structure and a proposed draft of the new organization is:

sotodlib.core : Things like data container classes, metadata containers, etc.
sotodlib.scripts : Entrypoints (*.py files which define a "main()" and which get
        installed by setup.py)
sotodlib.io : Code for doing I/O.  Specifically this is code for reading and 
        writing on-disk data (G3 and auxilliary data) and any metadata DB access.
sotodlib.utils : Code for helper operations and any data format conversions
        if needed.
sotodlib.* : "Analysis" tools that work with data can go in files at the top level of
        the package, at least until there are too many of them that it makes sense
        to create subdirectories.

Just a note that this plan involves more than just moving source files around. For example the functionality of the sotoddb package would get split into the "container" definition and the "I/O" (DB access). Also existing tools like the hardware layout simulation would get turned into a module at the top level that simulates the hardware and populates a "metadata container".

Because these changes involve several steps, I suggest we keep this issue as the "meta" issue discussing the big picture and then open smaller issues to track the intermediate steps. I'll start that process shortly.

tskisner commented 4 years ago

Just a note that although we are starting with the all the code that "does stuff" in the root of the package, we do still need to have some policies about interdependencies, which is what I think @kmharrington was getting at earlier in this thread. Any *.py file in the root of the package can safely import from the subpackages like:

from .core import AxisManager
from .utils import OtherUsefulClass

But we must avoid circular dependencies between these top-level *.py files. Hopefully documenting that policy combined with catching things during review of PRs will be sufficient.

tskisner commented 4 years ago

Revisiting this issue now that there are tod_ops and modules directories... I have a proposal at the bottom of this post for those in a hurry :-) I'll try to summarize multiple previous discussions in this issue. I think the conclusion was that there will be the following categories:

Right now, we seem to be pushing AxisManager operations into tod_ops and TOAST operations into modules. And then we have several other things at the top level of the package. Rather than split code by people who are working on things, I think it would make our software more cohesive if we could split by type of operations. Here is a concrete proposal for consideration:

Thoughts / feedback are welcome, especially from @mhasself, @kmharrington and @keskitalo

kmharrington commented 4 years ago

I wonder about mixing the TOAST operators with the AxisManager ones. At some level, it's nice to load a package and know that all the functions in that package work on your data model. Even if we keep subpackages "all TOAST" or "all AxisManagers" you'll have to know which ones use.

Where do we want the tree to be?

Versus

tskisner commented 4 years ago

So if I have some common math functions used by operators in both frameworks, where would that go? If we have the sim and ops packages at the top level, then it would clearly go in the ops directory close to both of the operators that use it. Will we ever have a toast operator that calls axismanager ops? If so, where would that go, since it uses both frameworks. Currently we will have that situation in at least one place, but that will be in the io subpackage, so not a great example.

If we must "silo" the code in this way, then my preference is ops.toast and ops.aman. Then at least the common math functions independent of framework can go in the ops package. Same with any operators that use both frameworks.

So far, most of the functionality in the current tod_ops package is pretty trivial, so likely I will just port those to toast operators directly. However, at some point there may be more complicated processing functions shared between the two data models and I want to make sure there is a place for them.

mhasself commented 4 years ago

I tend to agree with @kmharrington on this. People will be working with TOAST representation, or axisman representation. The functions compatible with a particular TOD representation should be close together. Functions that aren't tied specifically to a representation should be elsewhere (utils?) so that they are available to either framework.

I think we might spend next week's PWG telecon talking about this.

tskisner commented 4 years ago

Ok, that's fine- but @kmharrington put forth two options for organization. I like the option with a top-level "category" (sim and ops) so that at least we can ignore simulation stuff when we are working with data from disk.

tskisner commented 3 years ago

closed by #96