Closed galenseilis closed 10 months ago
I'm really excited by skdag. I was working on a similar project when I realized that it solved all the problems I had or wanted to solve.
I am currently trying to build something on top of this which has access to the underlying DAG structure. I've encountered an ambiguity that I am hoping to get technical assistance with.
Suppose I begin with this example from the docs loaded in memory:
    from skdag import DAGBuilder
    from sklearn.compose import make_column_selector
    from sklearn.decomposition import PCA
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression

    dag = (
        DAGBuilder(infer_dataframe=True)
        .add_step("impute", SimpleImputer())
        .add_step("vitals", "passthrough", deps={"impute": ["age", "sex", "bmi", "bp"]})
        .add_step("blood", PCA(n_components=2, random_state=0), deps={"impute": make_column_selector("s[0-9]+")})
        .add_step("lr", LogisticRegression(random_state=0), deps=["blood", "vitals"])
        .make_dag()
    )
    dag.show()
I noticed that there are both dag.graph and dag.graph_, stored in memory at different addresses. They seem highly similar when inspecting the nodes and edges. Is one a reference to the other? Or is one a shallow copy of the other? Or is one a deep copy of the other? Or are they fundamentally different?
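For reference, the three possibilities can be told apart with identity checks. This is a minimal sketch in which plain dictionaries stand in for the two graph objects, so it says nothing about how skdag or networkx actually build or copy their graphs; the `relationship` helper and the toy data are hypothetical.

```python
import copy

# Plain dicts stand in for graph objects; this is NOT skdag or networkx code.
original = {"impute": {"deps": []}, "lr": {"deps": ["blood", "vitals"]}}

alias = original                # another name bound to the same object
shallow = copy.copy(original)   # new outer dict, inner dicts are shared
deep = copy.deepcopy(original)  # outer and inner dicts are all duplicated


def relationship(a, b):
    """Classify how b relates to a using identity checks only."""
    if a is b:
        return "same object"
    if any(a[key] is b[key] for key in a if key in b):
        return "shares inner data"
    return "independent"


print(relationship(original, alias))    # same object
print(relationship(original, shallow))  # shares inner data
print(relationship(original, deep))     # independent
```

Two objects at different addresses can still share their contents, which is why the address alone does not settle the question.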
I found what might be a clue in the source.
    graph_ : :class:`networkx.DiGraph`
        A read-only view of the workflow.
Also
    Only defined if all of the underlying root estimators in graph_ expose such an attribute when fit.
Really it's just an implementation detail to make sure the DAG object conforms with the sklearn API. In practice though, it's always good to use graph_ if you want to make use of the graph as the DAG itself does, and graph if you want to see the original inputs that were provided to instantiate the object. At the moment there's no real difference, but that could change in the future if DAG were ever to modify the inputs in any way when you use it.
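To illustrate the general idea, here is a rough sketch of the scikit-learn naming convention: constructor arguments are stored unchanged under their own name, while attributes the object derives for its own use carry a trailing underscore. The `ToyDAG` class is hypothetical and is not skdag's implementation; exposing `graph_` as a property that returns a read-only proxy is an assumption made only to mirror the "read-only view" wording in the docstring.

```python
from types import MappingProxyType


class ToyDAG:
    """Hypothetical stand-in; NOT skdag's DAG class."""

    def __init__(self, graph):
        # Constructor input is stored as-is, without validation or copying.
        self.graph = graph

    @property
    def graph_(self):
        # Assumption: a read-only wrapper around the same underlying data.
        # A new proxy object is created on every access.
        return MappingProxyType(self.graph)


toy = ToyDAG({"impute": [], "lr": ["blood", "vitals"]})

print(toy.graph_ is toy.graph)         # False: the proxy is a different object
print(dict(toy.graph_) == toy.graph)   # True: same contents

try:
    toy.graph_["extra"] = []           # writing through the view is rejected
except TypeError as err:
    print("read-only:", err)
```

Under this assumption the two attributes live at different addresses yet describe the same data, and only `graph` can be modified directly.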
To be honest though, I think the whole thing needs to be revamped and simplified so I doubt there will be any changes to this current API until then, and from that point things might work very differently.