Open editorialbot opened 1 month ago
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.90  T=0.02 s (1758.7 files/s, 126099.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          15            280            632            648
XML                              1              0              0            467
Markdown                         7             99              0            283
reStructuredText                12             75             73            107
YAML                             3              9             21             86
TeX                              1              5              0             58
TOML                             1              4              0             21
-------------------------------------------------------------------------------
SUM:                            40            472            726           1670
-------------------------------------------------------------------------------
Commit count by author:
106 Tadej Lahovnik
1 karakatic
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1109/TSMCC.2011.2157494 is OK
- 10.1016/j.swevo.2021.101006 is OK
- 10.1007/978-3-540-46239-2_18 is OK
- 10.1007/BFb0029742 is OK
- 10.1145/3205651.3205774 is OK
- 10.1007/978-3-030-72699-7_48 is OK
MISSING DOIs
- None
INVALID DOIs
- None
Paper file info:
Wordcount for paper.md is 1384
The paper includes a Statement of need section
License info:
License found: MIT License (Valid open source OSI approved license)
Download article proof | View article proof on GitHub
@kellyrowland @lahovniktadej Here are my comments:
For the package:
iris data is good but not enough. I would be interested in applying this package to datasets with (1) a larger sample size and (2) different outcomes/labels. For example, I would be interested in the relationship between socioeconomic factors and the price of housing, using some housing-price datasets. The price of housing is a continuous outcome. Decision trees are designed for binary/categorical labels, but shouldn't there be modifications that allow this method to be applied to continuous outcomes?
For the paper:
Statement of need. Do you mean that you are the first group to develop software to incorporate evolutionary algorithms in genetics into decision trees?
random_seed may lead to different performance on different computing platforms, but my impression is that many methods can get 100% test accuracy on the iris data. Do you have any thoughts on this?
In summary, it looks interesting to me even though I am not in this field, so I would like to learn more about this package, which is also the motivation for all of my comments. Feel free to correct me if there are factual errors. Thanks!
thanks for the review @WeakCha - @lahovniktadej please let me know if you have any questions or concerns.
@FlyingPumba checking in on review status, thanks for getting started.
Hi @kellyrowland, thanks for the reminder. If it's ok, I'll look at it next week. I'm swamped with work right now.
@WeakCha Thanks for taking the time to review our paper and repository.
For the package:
pyproject.toml (located in the repository's root directory) and in the documentation. We have since updated the contribution guide with a list of dependencies and provided a link to the dependency list in README.md.
README.md with extended descriptions of the individual functions as well as the descriptions in the library documentation.
README.md has been updated. The "Community Guidelines" section contains contact information and a link to the contribution guide.
GATree can handle datasets with larger sample sizes and different outcomes/labels. However, it is limited to classification tasks at the moment. We plan to introduce support for regression tasks and custom operators after publication, but until then, our primary focus is on the classification task.
For the paper:
GATree still presents a valuable contribution, not yet filled by any other library.
GATree can also achieve 100% test accuracy on the iris data. We have updated the paper and included one of the seeds that produced such a result. To ensure that a perfect result on iris is achieved with almost all seeds, one needs to raise the population size or the generation limit (or both). For demonstration, however, these limits were left at a lower level to enable quick library testing.
GATree.plot() method. Figure 3 was created using an external tool based on the output of this method. Additionally, we have provided a simple usage example (examples/plot_decision_tree.py). We have also updated the iris example (examples/iris.py), which now contains the code for the graphs shown in Figure 2.
@lahovniktadej Thanks for your long response! @kellyrowland Here are my follow-up responses.
For the package:
README (for example, I would put a short description of "mutation", but remove technical details). Personally, I would not feed the audience too much, but I am open to suggestions.
4. I would still insist on adding at least one example that applies your algorithm to a larger dataset (N > 1,000, for example). It is fine for now not to apply your algorithm to a dataset with a different outcome (though this is still necessary in your future development), but I suspect that iris might not be good enough to reveal the capacity of your algorithm, as many algorithms can easily achieve a perfect result on the test set. It is fine not to beat the best results; comparable performance is fine, as your novelty is a new algorithm that does not currently exist.
For the paper:
Hi @lahovniktadej @WeakCha. Here are my comments:
For the package:
fit and predict methods).
For the paper:
Changes needed:
Minor comments:
@WeakCha Thanks for the follow-up response.
For the package:
README (removed the technical details) and kept the extended descriptions in the documentation.
iris dataset to include additional, more substantial datasets (examples/adult.py, examples/make_classification.py). The adult (also known as Census Income) dataset contains 48,842 instances, while the make_classification script generates a synthetic dataset with 1,500 instances. These additional examples will demonstrate the robustness and generalisability of our algorithm.
For the paper:
adult dataset, where we assess the performance of GATree across 100 independent runs to account for variability and ensure the robustness of our results. For each run, we will record both accuracy and F1 score. Once the experiment is complete, we will update the paper and include a table summarising these results.
@FlyingPumba Thanks for taking the time to review our paper and repository.
Changes needed:
adult dataset, where we assess the performance of GATree across 100 independent runs to account for variability and ensure the robustness of our results. For each run, we will record both accuracy and F1 score. Once the experiment is complete, we will update the paper and include a table summarising these results.
GATree's limitation to classification tasks in the paper and the package's README.
Minor comments:
README and the documentation to include Python 3.11 and 3.12.
README and the paper to state this explicitly.
Hi @lahovniktadej. Thanks for making the changes. The experiment that you are running sounds very good, since it will show the robustness of GATree when running on bigger problems. Please make sure that you also add a comparison against other approaches for building decision trees, e.g., compare GATree's performance on adult to scikit-learn's default implementation, gplearn, tinyGP, TensorGP, etc.
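The repeated-run protocol discussed in this thread (many independent runs, recording accuracy and F1 score per run, then summarising them in a table) could be sketched roughly as follows. This is only an illustration under stated assumptions: `run_once` is a hypothetical stand-in that generates synthetic labels for one seed, and nothing here is the actual API of GATree, scikit-learn, or gplearn. In a real comparison, `run_once` would train each method on the same split and predict on the same test set.

```python
import random
import statistics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """F1 score for the positive class of a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def run_once(seed, n=200, error_rate=0.15):
    """Hypothetical stand-in for 'train a model with this seed, then predict'.

    It generates synthetic labels and flips a fraction of them, purely so
    the summary code below has something to aggregate.
    """
    rng = random.Random(seed)
    y_true = [rng.randint(0, 1) for _ in range(n)]
    y_pred = [1 - t if rng.random() < error_rate else t for t in y_true]
    return y_true, y_pred


def summarise(n_runs=100):
    """Run n_runs seeded repetitions; return (mean, std) for each metric."""
    accs, f1s = [], []
    for seed in range(n_runs):
        y_true, y_pred = run_once(seed)
        accs.append(accuracy(y_true, y_pred))
        f1s.append(f1_binary(y_true, y_pred))
    return {
        "accuracy": (statistics.mean(accs), statistics.stdev(accs)),
        "f1": (statistics.mean(f1s), statistics.stdev(f1s)),
    }


if __name__ == "__main__":
    for metric, (mean, std) in summarise().items():
        print(f"{metric}: {mean:.3f} +/- {std:.3f}")
```

Seeding each run explicitly keeps the summary reproducible, and reporting the standard deviation alongside the mean makes run-to-run variability visible when methods are compared.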
@WeakCha @FlyingPumba We have updated the paper to include the results of our experiment and comparison against other approaches for building decision trees.
@FlyingPumba We appreciate your recommendations for comparison against other approaches, such as tinyGP and TensorGP. However, after careful consideration, we believe that a direct comparison between GATree and tinygp/TensorGP may not be appropriate for the following reasons:
GATree. Therefore, comparing tinygp with GATree would not provide meaningful insights into the performance of decision tree evolution using genetic algorithms.
GATree, specialised for evolutionary decision tree construction, less relevant and potentially misleading.
@kellyrowland @lahovniktadej Thanks for the follow-up! Could you please regenerate the paper PDF? I have trouble finding the paper... Thanks!
@editorialbot generate pdf
Download article proof | View article proof on GitHub
@lahovniktadej Thanks a lot!
For the paper:
If tinygp and TensorGP support decision tree training and prediction, they qualify as competing methods, even if their design purposes/principles are fundamentally different from genetic algorithms. In other words, it is reasonable not to compare your algorithm with tinygp and TensorGP if they do not provide interfaces for decision tree prediction. I am not familiar with these two, so I will leave this question to you and to the other reviewer, @FlyingPumba.
Hi @lahovniktadej, thanks for the update. I agree with @WeakCha on points 1 to 3. As for point 4, I think it's OK not to compare against tinygp and TensorGP if these libraries do not build decision trees. The new experiment comparing against scikit-learn and gplearn is already a great improvement. One question that I have about that new part of the paper: I see that the DecisionTreeClassifier of scikit-learn is limited to a max depth of 5. Are you also limiting the depth of the decision trees built by GATree or gplearn? If not, this could be a threat to the validity of the results shown.
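The depth question raised above could be checked mechanically before comparing results. Below is a minimal sketch, assuming a hypothetical nested `Node` representation of a binary decision tree; it is not the data structure used by GATree, gplearn, or scikit-learn. It computes each tree's depth (counting edges, so a lone leaf has depth 0) and reports whether every compared model respects the same cap, here 5 to match the setting mentioned in the comment.

```python
class Node:
    """Hypothetical binary tree node; a leaf has no children."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def depth(node):
    """Number of edges on the longest root-to-leaf path (a lone leaf is 0)."""
    if node is None:
        return -1
    return 1 + max(depth(node.left), depth(node.right))


def within_cap(trees, max_depth=5):
    """Map each model name to whether its tree respects the shared depth cap."""
    return {name: depth(tree) <= max_depth for name, tree in trees.items()}


def chain(n_edges):
    """Build a left-leaning example tree whose depth is exactly n_edges."""
    node = Node("leaf")
    for i in range(n_edges):
        node = Node(f"split_{i}", left=node, right=Node("leaf"))
    return node


if __name__ == "__main__":
    # Placeholder model names; real trees would come from the fitted models.
    trees = {"model_a": chain(3), "model_b": chain(7)}
    print(within_cap(trees))
```

If one method is capped and another is not, differences in accuracy may partly reflect model capacity rather than the search strategy, which is the validity concern raised in the review.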
Submitting author: @lahovniktadej (Tadej Lahovnik)
Repository: https://github.com/lahovniktadej/gatree
Branch with paper.md (empty if default branch):
Version: 0.1.4
Editor: @kellyrowland
Reviewers: @FlyingPumba, @WeakCha
Archive: Pending
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@FlyingPumba & @WeakCha, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @kellyrowland know.
Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.
Checklists
Checklist for @WeakCha
Checklist for @FlyingPumba