Revisit subpackage organization

tskisner commented 4 years ago

Recent changes have shown us that we should revisit the big-picture organization of sotodlib. @mhasself has floated the idea of bringing the sotoddb repo into this package under the meta subpackage. This issue is for tracking the summary of offline discussion and thoughts about this before we take any action.

kmharrington commented 4 years ago

A couple thoughts: I've noticed it looks like there's going to be a lot of toast-related functions and a lot of Axis-manager related functions. Also, things "needed for simulations" and things "needed for analysis of simulations/real data" are pretty interspersed. I'd be inclined to separate those into top-level modules as well.

And then there's some random stuff I wrote awhile ago that's probably just deprecated already.

tskisner commented 4 years ago

Yeah... @mhasself and I started to think about this offline, but are going to first work on defining the AxisManager / TOAST interface a little better. I think that will help inform the best way of organizing the sub packages.

tskisner commented 4 years ago

Ok, after some more big-picture discussion offline, I think we are going to simplify the package structure and a proposed draft of the new organization is:

sotodlib.core : Things like data container classes, metadata containers, etc.
sotodlib.scripts : Entrypoints (*.py files which define a "main()" and which get
        installed by setup.py)
sotodlib.io : Code for doing I/O.  Specifically this is code for reading and 
        writing on-disk data (G3 and auxilliary data) and any metadata DB access.
sotodlib.utils : Code for helper operations and any data format conversions
        if needed.
sotodlib.* : "Analysis" tools that work with data can go in files at the top level of
        the package, at least until there are too many of them that it makes sense
        to create subdirectories.

Just a note that this plan involves more than just moving source files around. For example the functionality of the sotoddb package would get split into the "container" definition and the "I/O" (DB access). Also existing tools like the hardware layout simulation would get turned into a module at the top level that simulates the hardware and populates a "metadata container".

Because these changes involve several steps, I suggest we keep this issue as the "meta" issue discussing the big picture and then open smaller issues to track the intermediate steps. I'll start that process shortly.

tskisner commented 4 years ago

Just a note that although we are starting with the all the code that "does stuff" in the root of the package, we do still need to have some policies about interdependencies, which is what I think @kmharrington was getting at earlier in this thread. Any *.py file in the root of the package can safely import from the subpackages like:

from .core import AxisManager
from .utils import OtherUsefulClass

But we must avoid circular dependencies between these top-level *.py files. Hopefully documenting that policy combined with catching things during review of PRs will be sufficient.

tskisner commented 4 years ago

Revisiting this issue now that there are tod_ops and modules directories... I have a proposal at the bottom of this post for those in a hurry :-) I'll try to summarize multiple previous discussions in this issue. I think the conclusion was that there will be the following categories:

(core) Classes and functions for in-memory representation of data and metadata. This includes the Context stuff, the AxisManager data container, and the Hardware class for synthetic instrument properties (i.e. before we have a database of real instrument properties). Note: since the underlying sqlite DBs are tightly coupled to these classes, they live here rather than in "io", even though they are technically touching disk. Note that in the case of TOAST, the Observation class is defined externally in the toast package, so we don't need to define it here (but this is where it would go if we did).
(io) Classes and functions for reading and writing data. This includes both AxisManager loading and also TOAST export / load operators.
(utils) Any auxiliary helper functions that do not logically go somewhere else. For example, when writing G3 frames from toast, there are some helper functions that live in io, close to the export code, which makes sense. Currently we have some toast pipeline helper functions in here, but most of that will be built in to the operators themselves in toast3.
(scripts) Entry points defined in setup.py. These are designed to be small stand-alone scripts that do specific things. Examples might be: script to query available local data products that can be loaded, scripts to manipulate synthetic instrument files, etc. These scripts should be focused on doing things directly relevant to the tools in sotodlib. Large workflows should go somewhere else (like the pipelines top level directory or in pwg-scripts, etc).
Other analysis tools. This includes both timestream processing tools and potentially tools that do things in the map domain. For timestream processing there will code working with both AxisManagers and toast observations.

Right now, we seem to be pushing AxisManager operations into tod_ops and TOAST operations into modules. And then we have several other things at the top level of the package. Rather than split code by people who are working on things, I think it would make our software more cohesive if we could split by type of operations. Here is a concrete proposal for consideration:

sim: Anything relating to simulations, including simulated hardware and any AxisManager or TOAST operations that simulate timestreams. If you just load (real or already-simulated) data off disk and do stuff with it, then you should not need to import this.
ops: Any AxisManager or TOAST operations for working with real or already-simulated data.
"top-level": Everyone seems to hate putting stuff in the top-level of the package, which is fine, but let's just clear it out and put code into sim or ops.
If we find that we want to put map-making related tools in sotodlib, we could make a new subpackage or just leave them in ops.

Thoughts / feedback are welcome, especially from @mhasself, @kmharrington and @keskitalo

kmharrington commented 4 years ago

I wonder about mixing the TOAST operators with the AxisManager ones. At some level, it's nice to load a package and know that all the functions in that package work on your data model. Even if we keep subpackages "all TOAST" or "all AxisManagers" you'll have to know which ones use.

Where do we want the tree to be?

ops
- ops.toast
- ops.aman
sim
- sim.toast
- sim.aman

Versus

toast
- toast.sim
- toast.ops
aman
- aman.sim
- aman.ops

tskisner commented 4 years ago

So if I have some common math functions used by operators in both frameworks, where would that go? If we have the sim and ops packages at the top level, then it would clearly go in the ops directory close to both of the operators that use it. Will we ever have a toast operator that calls axismanager ops? If so, where would that go, since it uses both frameworks. Currently we will have that situation in at least one place, but that will be in the io subpackage, so not a great example.

If we must "silo" the code in this way, then my preference is ops.toast and ops.aman. Then at least the common math functions independent of framework can go in the ops package. Same with any operators that use both frameworks.

So far, most of the functionality in the current tod_ops package is pretty trivial, so likely I will just port those to toast operators directly. However, at some point there may be more complicated processing functions shared between the two data models and I want to make sure there is a place for them.

mhasself commented 4 years ago

I tend to agree with @kmharrington on this. People will be working with TOAST representation, or axisman representation. The functions compatible with a particular TOD representation should be close together. Functions that aren't tied specifically to a representation should be elsewhere (utils?) so that they are available to either framework.

I think we might spend next week's PWG telecon talking about this.

tskisner commented 4 years ago

Ok, that's fine- but @kmharrington put forth two options for organization. I like the option with a top-level "category" (sim and ops) so that at least we can ignore simulation stuff when we are working with data from disk.

tskisner commented 3 years ago

closed by #96

simonsobs / sotodlib

Revisit subpackage organization #53