Open lewisfogden opened 4 months ago
ClearAndOptimize
method, so it might not be an instance of same class as the non-graph stuff, maybe related through some abstract base class at best. There are certain tradeoffs such that t-2
(allows control flow to be more dynamic, no pre-run) and graph
(longer range dependencies) can both be offered and both provide value.The special method means it can't be of the same class I believe. I propose a minimal interface be agreed upon for properties and methods, like this -
interface of the model
def RunModel(self, proj_len: int):
def Clear(self):
@property
def cache(self):
@property
def cache_agg(self):
@property
def df(self):
@property
def df_agg(self):
interface of the cached method
@property
def cache(self):
@property
def cache_agg(self):
@property
def df(self):
@property
def df_agg(self):
You have keys
values
sum
on your method are all calling the equivalent methods on the cache. Proposal to nuke it and make people use the cache.
It is probably more memory efficient like 2-10x and handles the long range dependencies, so maybe don't delete it, but maybe the t-2
is just so much simpler that it be recommended. And then the graph is relegated to some advanced usage section or something idk, and t-2
trick is featured more prominently.
cache_agg
and df_agg
even with the base model2/6. My understanding is that avoiding the init is not idiomatic Python. Need to understand why avoiding it is desirable other than saving a few lines of code. PyTorch for example makes you do the super() on nn.Module. My disagreement with your approach is I don't know it to be widely in use, and it breaks typing. would like evidence that your approach is a common practice in Python libraries. a. I will be incredibly hard to convert on this because your approach breaks typing.
Update: My previous thoughts under heading "certain disagreement", no longer disagreement on this issue. I will update my code to match the init behavior of the base Model
.
What you are doing seems to be similar to what https://docs.kidger.site/equinox/api/module/module/ does. And they take the same approach to type hinting you have described previously.
Defined default behaviour with option to override sounds good, if it mostly calls other methods (e.g. set up cache / run etc) then users can easily override.
I've been working on a model dependency viewer (demo: https://lewisfogden.github.io/heavylight/experiments/term_assurance_graph.html)
I found that this was easy to code when I defined the data/basis items as class instances (I made them both dataclasses), which also helps with type inference.
E.g.
class Data:
annual_premium: np.ndarray
class UserModel(Model):
data: Data
I could probably generate the graph as well, I use a stack to track dependencies between items. I'm trying to understand what you mean by the model dependencies being easier to generate when Data
is a dataclass, I hope the input data isn't related to graph generation?
I usually take any dataframes I use and put them in a class that makes all the columns into numpy arrays. Usually looks like
class Modelpoints(df: pd.DataFrame):
self.pols_if = df["pols_if"].to_numpy()
... more of that ...
When Data is a dataclass (or a normal class like yours) - I am using the dis
and inspect
packages to read in the model and convert it into a graph - this is for documentation / diagram type view rather than for running the model. Using dis, it's just a little easier to reason about the fields in a dataclass, as a dictionary is defined at run-time and isn't structural.
At the moment though, I'm using dictionaries, to get from dataframe I do something like:
data = {col:df[col].values for col in df.columns}
I now see you are tracking data dependencies and that is why the data format is relevant.
Let me see if I can enumerate the changes that need to be made to have more of a single API.
keys
, values
, sum
, items
on methods.def __init__(self, *, do_run = None, proj_len:int = 0, **kwargs,):
items
on methodscache
propertyModel
+ cache_agg
df_agg
properties
Changes I'm thinking. @MatthewCaseres would like your views too
__init__
(i.e. Model behaviour rather than LightModel behaviour).backend
in Model, covering:t-2
simple aggregator that appliesagg_func
on all methods takingt
as a parameter.graph
Memory Optimisedt-2
andgraph
both take theagg_func
parameter, which defaults tonp.sum
graph
which requires a pre-run to optimise, this samples 50 points (say) from the dataset, but could pass aoptimiser_data
optional parameter?backend
allows us to add different optimisers in future (potentially even a compiler)The caller might look like this: