Closed PSSF23 closed 1 year ago
Looks good to me. Thanks a lot. I will see if the technical part can work out this way.
How do you feel about the partial_fit method in classes.py
? Can it be simplified too?
How do you feel about the partial_fit method in
classes.py
? Can it be simplified too?
Not as familiar w/ the API for partial_fit, but it looks pretty simple, so I would say it should be okay.
FYI @PSSF23 I've migrated the entire set of changes into the fork
branch, which will serve as the main branch for any PRs into the sklearn fork. I've added documentation regarding changes into the README file. Lmk if you need any assistance finishing this out.
Thanks @adam2392 ! I was focusing on icml reviews and will have more time to optimize the code.
Started working on the root initialization and method modulation. The original false_roots
was a dict
object, should I cdef dict self.initial_roots
or just object
?
My current attempt begins by updating self.splitter
with the new X
and y
, meanwhile storing the leaves in self.initial_roots
for builders to use in build
.
Started working on the root initialization and method modulation. The original
false_roots
was adict
object, should Icdef dict self.initial_roots
or justobject
?
I don't think it matters. You only use false_roots
inside Gil code, so it's basically all Python.
My current attempt begins by updating
self.splitter
with the newX
andy
, meanwhile storing the leaves inself.initial_roots
for builders to use inbuild
.
Yeah that sounds right I think? Cuz the builder just needs to keep track of where you are in the tree. The splitter just splits the data that it's given and pumps out a new node. The builder then should know where to put the node and make sure the tree is connected well.
Perhaps start with writing the initialize_node_queue
function and also write a unit-test for it. Will it be cpdef, or cdef? You can write a cpdef
wrapper so you can call it in pytest if you plan on making it a cdef
function.
By the way, I would also start from submodulev2
branch instead, so you don't break anything by accident.
By the way, I would also start from
submodulev2
branch instead, so you don't break anything by accident.
Actually, I screwed up that branch currently cuz I was trying to incorporate the latest API changes in scikit-learn 1.4dev0 (i.e. monotonicity constraints in trees), but it didn't work. I would start a new branch off of v1.3
, which contains the working sklearn submodule we use in scikit-tree.
You can name it v1.4
, so it's slated to get added whenever we upgrade to use sklearn v1.4 in our submodule.
Updated version in #50
Reference Issues/PRs
https://github.com/neurodata/scikit-tree/issues/40
What does this implement/fix? Explain your changes.
DecisionTreeClassifier
to incrementally update the tree structure with new data batchesAny other comments?
@adam2392 This is a branch copy I made so feel free to modify it.