Closed tskisner closed 3 years ago
A couple of thoughts: I've noticed it looks like there's going to be a lot of TOAST-related functions and a lot of AxisManager-related functions. Also, things "needed for simulations" and things "needed for analysis of simulations/real data" are pretty interspersed. I'd be inclined to separate those into top-level modules as well.
And then there's some random stuff I wrote a while ago that's probably just deprecated already.
Yeah... @mhasself and I started to think about this offline, but are going to first work on defining the AxisManager / TOAST interface a little better. I think that will help inform the best way of organizing the sub packages.
Ok, after some more big-picture discussion offline, I think we are going to simplify the package structure. A proposed draft of the new organization is:
sotodlib.core : Things like data container classes, metadata containers, etc.
sotodlib.scripts : Entrypoints (*.py files which define a "main()" and which get installed by setup.py)
sotodlib.io : Code for doing I/O. Specifically this is code for reading and writing on-disk data (G3 and auxiliary data) and any metadata DB access.
sotodlib.utils : Code for helper operations and any data format conversions if needed.
sotodlib.* : "Analysis" tools that work with data can go in files at the top level of the package, at least until there are so many of them that it makes sense to create subdirectories.
Just a note that this plan involves more than just moving source files around. For example, the functionality of the sotoddb package would get split into the "container" definition and the "I/O" (DB access). Also, existing tools like the hardware layout simulation would get turned into a module at the top level that simulates the hardware and populates a "metadata container".
Because these changes involve several steps, I suggest we keep this issue as the "meta" issue discussing the big picture and then open smaller issues to track the intermediate steps. I'll start that process shortly.
Just a note that although we are starting with all the code that "does stuff" in the root of the package, we do still need to have some policies about interdependencies, which is what I think @kmharrington was getting at earlier in this thread. Any *.py file in the root of the package can safely import from the subpackages like:
from .core import AxisManager
from .utils import OtherUsefulClass
But we must avoid circular dependencies between these top-level *.py files. Hopefully documenting that policy combined with catching things during review of PRs will be sufficient.
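If review alone turns out not to be enough, the policy could also be checked mechanically. The following is only a rough sketch of that idea, not something that exists in sotodlib: it parses each top-level module with the standard `ast` module, records which sibling top-level modules it imports with a relative import, and reports one cycle if there is any. The module names (`filters`, `flags`, `mapmaking`) and the imported names `get_flags` and `apply` are made up for illustration, and the sources are given as strings rather than read from disk.

```python
import ast


def sibling_imports(source, siblings):
    """Return the sibling top-level modules that `source` imports relatively."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 1:
            if node.module is None:
                # Form: "from . import name"
                names = [alias.name for alias in node.names]
            else:
                # Form: "from .name import X" or "from .name.sub import X"
                names = [node.module.split(".")[0]]
            found.update(n for n in names if n in siblings)
    return found


def find_cycle(graph):
    """Depth-first search; return one import cycle as a list, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    state = {name: WHITE for name in graph}
    stack = []

    def visit(name):
        state[name] = GRAY
        stack.append(name)
        for dep in sorted(graph[name]):
            if state[dep] == GRAY:
                # Back edge: dep is already on the current path.
                return stack[stack.index(dep):] + [dep]
            if state[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        state[name] = BLACK
        return None

    for name in sorted(graph):
        if state[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


# Hypothetical top-level modules, given as source strings for illustration.
sources = {
    "filters": "from .core import AxisManager\nfrom .flags import get_flags\n",
    "flags": "from .utils import OtherUsefulClass\nfrom .filters import apply\n",
    "mapmaking": "from .core import AxisManager\n",
}

# Imports of subpackages such as .core and .utils are allowed, so only
# edges between the top-level modules themselves go into the graph.
graph = {name: sibling_imports(src, set(sources)) for name, src in sources.items()}
cycle = find_cycle(graph)
```

Here `filters` and `flags` import each other, so `cycle` names both of them, while `mapmaking` only imports from `core` and has no edges. A check like this could run in CI over the real files, but that is a suggestion rather than part of the plan above.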
Revisiting this issue now that there are tod_ops and modules directories... I have a proposal at the bottom of this post for those in a hurry :-) I'll try to summarize multiple previous discussions in this issue. I think the conclusion was that there will be the following categories:
core: Classes and functions for in-memory representation of data and metadata. This includes the Context stuff, the AxisManager data container, and the Hardware class for synthetic instrument properties (i.e. before we have a database of real instrument properties). Note: since the underlying sqlite DBs are tightly coupled to these classes, they live here rather than in "io", even though they are technically touching disk. Note that in the case of TOAST, the Observation class is defined externally in the toast package, so we don't need to define it here (but this is where it would go if we did).
io: Classes and functions for reading and writing data. This includes both AxisManager loading and also TOAST export / load operators.
utils: Any auxiliary helper functions that do not logically go somewhere else. For example, when writing G3 frames from toast, there are some helper functions that live in io, close to the export code, which makes sense. Currently we have some toast pipeline helper functions in here, but most of that will be built in to the operators themselves in toast3.
scripts: Entry points defined in setup.py. These are designed to be small stand-alone scripts that do specific things. Examples might be: script to query available local data products that can be loaded, scripts to manipulate synthetic instrument files, etc. These scripts should be focused on doing things directly relevant to the tools in sotodlib. Large workflows should go somewhere else (like the pipelines top level directory or in pwg-scripts, etc).
Right now, we seem to be pushing AxisManager operations into tod_ops and TOAST operations into modules. And then we have several other things at the top level of the package. Rather than split code by people who are working on things, I think it would make our software more cohesive if we could split by type of operations. Here is a concrete proposal for consideration:
sim: Anything relating to simulations, including simulated hardware and any AxisManager or TOAST operations that simulate timestreams. If you just load (real or already-simulated) data off disk and do stuff with it, then you should not need to import this.
ops: Any AxisManager or TOAST operations for working with real or already-simulated data.
sim or ops.
ops.
Thoughts / feedback are welcome, especially from @mhasself, @kmharrington and @keskitalo
I wonder about mixing the TOAST operators with the AxisManager ones. At some level, it's nice to load a package and know that all the functions in that package work on your data model. Even if we keep subpackages "all TOAST" or "all AxisManagers" you'll have to know which ones to use.
Where do we want the tree to be?
Versus
So if I have some common math functions used by operators in both frameworks, where would they go? If we have the sim and ops packages at the top level, then they would clearly go in the ops directory close to both of the operators that use them. Will we ever have a TOAST operator that calls AxisManager ops? If so, where would that go, since it uses both frameworks? Currently we will have that situation in at least one place, but that will be in the io subpackage, so not a great example.
If we must "silo" the code in this way, then my preference is ops.toast and ops.aman. Then at least the common math functions independent of framework can go in the ops package. Same with any operators that use both frameworks.
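To make the layout concrete, here is a minimal sketch of how one shared function and two thin framework wrappers could fit together. Everything in it is hypothetical: `remove_mean`, `aman_remove_mean` and `toast_remove_mean` are invented names, and `FakeAxisManager` and `FakeObservation` are simple stand-ins that do not reproduce the real AxisManager or TOAST Observation APIs. In a real package the three functions would sit in ops, ops.aman and ops.toast respectively; they are shown in one file here only so the example runs on its own.

```python
# Framework-independent math: would live directly in the shared ops package.
def remove_mean(samples):
    """Return a copy of `samples` with the mean subtracted."""
    mean = sum(samples) / len(samples)
    return [x - mean for x in samples]


# Stand-in for an AxisManager-style container (NOT the real sotodlib class).
class FakeAxisManager:
    def __init__(self, signal):
        self.signal = signal  # list of per-detector sample lists


# Stand-in for a TOAST-style observation (NOT the real toast class).
class FakeObservation:
    def __init__(self, detdata):
        self.detdata = detdata  # dict: detector name -> sample list


# Would live in the hypothetical ops.aman subpackage.
def aman_remove_mean(aman):
    """Apply the shared math to every detector in the container."""
    aman.signal = [remove_mean(det) for det in aman.signal]
    return aman


# Would live in the hypothetical ops.toast subpackage.
def toast_remove_mean(obs):
    """Apply the same shared math to every detector in the observation."""
    for det in obs.detdata:
        obs.detdata[det] = remove_mean(obs.detdata[det])
    return obs


aman = aman_remove_mean(FakeAxisManager([[1.0, 2.0, 3.0]]))
obs = toast_remove_mean(FakeObservation({"det0": [1.0, 2.0, 3.0]}))
```

Both wrappers only unpack their own data model and delegate the arithmetic, so the numerical code is written once and gives the same result in either framework.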
So far, most of the functionality in the current tod_ops package is pretty trivial, so likely I will just port those to toast operators directly. However, at some point there may be more complicated processing functions shared between the two data models and I want to make sure there is a place for them.
I tend to agree with @kmharrington on this. People will be working with TOAST representation, or axisman representation. The functions compatible with a particular TOD representation should be close together. Functions that aren't tied specifically to a representation should be elsewhere (utils?) so that they are available to either framework.
I think we might spend next week's PWG telecon talking about this.
Ok, that's fine, but @kmharrington put forth two options for organization. I like the option with a top-level "category" (sim and ops) so that at least we can ignore simulation stuff when we are working with data from disk.
closed by #96
Recent changes have shown us that we should revisit the big-picture organization of sotodlib. @mhasself has floated the idea of bringing the sotoddb repo into this package under the meta subpackage. This issue is for tracking the summary of offline discussion and thoughts about this before we take any action.