openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: GATree: Evolutionary decision tree classifier in Python #6748

Open editorialbot opened 1 month ago

editorialbot commented 1 month ago

Submitting author: @lahovniktadej (Tadej Lahovnik)
Repository: https://github.com/lahovniktadej/gatree
Branch with paper.md (empty if default branch):
Version: 0.1.4
Editor: @kellyrowland
Reviewers: @FlyingPumba, @WeakCha
Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/b3deeed0f75fedff6e284569bdd97ef8"><img src="https://joss.theoj.org/papers/b3deeed0f75fedff6e284569bdd97ef8/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/b3deeed0f75fedff6e284569bdd97ef8/status.svg)](https://joss.theoj.org/papers/b3deeed0f75fedff6e284569bdd97ef8)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@FlyingPumba & @WeakCha, your review will be checklist-based. Each of you will have a separate checklist that you should update when carrying out your review. First of all, you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @kellyrowland know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

šŸ“ Checklist for @WeakCha

šŸ“ Checklist for @FlyingPumba

editorialbot commented 1 month ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 month ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.02 s (1758.7 files/s, 126099.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          15            280            632            648
XML                              1              0              0            467
Markdown                         7             99              0            283
reStructuredText                12             75             73            107
YAML                             3              9             21             86
TeX                              1              5              0             58
TOML                             1              4              0             21
-------------------------------------------------------------------------------
SUM:                            40            472            726           1670
-------------------------------------------------------------------------------

Commit count by author:

   106  Tadej Lahovnik
     1  karakatic
editorialbot commented 1 month ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1109/TSMCC.2011.2157494 is OK
- 10.1016/j.swevo.2021.101006 is OK
- 10.1007/978-3-540-46239-2_18 is OK
- 10.1007/BFb0029742 is OK
- 10.1145/3205651.3205774 is OK
- 10.1007/978-3-030-72699-7_48 is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 1 month ago

Paper file info:

📄 Wordcount for paper.md is 1384

✅ The paper includes a Statement of need section

editorialbot commented 1 month ago

License info:

✅ License found: MIT License (Valid open source OSI approved license)

editorialbot commented 1 month ago

👉 📄 Download article proof · 📄 View article proof on GitHub 👈

WeakCha commented 1 month ago

Review checklist for @WeakCha

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

FlyingPumba commented 1 month ago

Review checklist for @FlyingPumba

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

WeakCha commented 1 month ago

@kellyrowland @lahovniktadej Here are my comments.

For the package:

  1. Although installation is successful, a list of dependencies is missing from the repo.
  2. The meanings of the functions (mutation, crossover, selection, etc.) are unclear to me, and there is no description in the documentation. Although this does not affect usage and need not appear in the paper, it would be helpful to explain briefly what they mean and provide simple examples in the documentation.
  3. Contact info for third parties who want to join the project is missing.
  4. The examples are too simple. The iris data is good but not enough. I would be interested in applying this package to datasets with (1) a larger sample size and (2) different outcomes/labels. For example, I would be interested in the relationship between socioeconomic factors and the price of housing using a housing-price dataset. The price of housing is a continuous outcome. Decision trees are designed for binary/categorical labels, but there should be modifications that allow the method to be applied to a continuous outcome?
  5. Comparisons with other methods are missing, but I think this is optional if the advantages of the method are clear.

For the paper:

  1. Just a confirmation about the advantages in your Statement of need: do you mean that you are the first group to develop software that incorporates evolutionary algorithms from genetics into decision trees?
  2. 96.67% is not the result I get for the first example shown in the paper (no customised loss); on my computer it is 93.33%. I agree that the same random_seed may lead to different performance on different computing platforms, but my impression is that many methods can easily reach 100% test accuracy on the iris data. Do you have any idea why?
  3. What is the meaning of average fitness/best fitness? Are these the average loss value and the minimum loss value?
  4. I love the visualisations, but I did not find visualisation functionality in your package. I would like you to add more visualisation functions so that users can better see what is going on in your algorithm.

In summary, this looks interesting to me even though I am not in this field, so I would like to learn more about this package, which is the motivation for all of my comments. Feel free to correct any factual errors. Thanks!

kellyrowland commented 1 month ago

Thanks for the review, @WeakCha. @lahovniktadej, please let me know if you have any questions or concerns.

@FlyingPumba 👋 checking in on review status, thanks for getting started.

FlyingPumba commented 1 month ago

Hi @kellyrowland, thanks for the reminder. If it's ok, I'll look at it next week. I'm swamped with work right now.

lahovniktadej commented 1 month ago

@WeakCha Thanks for taking the time to review our paper and repository.

For the package:

  1. The list of dependencies is available in pyproject.toml (located in the repository's root directory) and in the documentation. We have since updated the contribution guide with a list of dependencies and provided a link to the dependency list in README.md.
  2. Thank you for the excellent suggestion. We had taken an understanding of these functions' behaviour for granted, which is why the descriptions of how they work were rather vague. Following your suggestion, we have updated README.md with extended descriptions of the individual functions and extended the descriptions in the library documentation as well.
  3. README.md has been updated. The "Community Guidelines" section contains contact information and a link to the contribution guide.
  4. GATree can handle datasets with larger sample sizes and different outcomes/labels; however, it is limited to classification tasks at the moment. We plan to introduce support for regression tasks and custom operators after publication, but until then, our primary focus is on classification.

For the paper:

  1. Thank you for the question. As mentioned in the paper, this is not the first package to bring genetic algorithms for tree structures to Python, but it is the first to do so with DECISION trees for classification tasks. Existing Python libraries use genetic algorithms to induce tree PROGRAMS (sometimes referred to as genetic programming), which can, but, as the literature shows, usually do not, achieve comparable results. The difference is that in decision trees the nodes consist of rules (e.g., age > 25), whereas in tree programs the tree structure represents a formula/program (e.g., age + 7 * wage / 2 - height). Still, the state of the art in traditional machine learning on tabular data (as opposed to images, text, or other unstructured data) is achieved with decision trees (e.g., CART, Random Forest, Gradient Boosting, AdaBoost, Histogram Boosting). It is possible to customise existing libraries to build decision trees instead of tree programs, but this is neither easy nor optimised for that usage. Thus, we believe GATree presents a valuable contribution, filling a gap not yet addressed by any other library.
  2. Of course, GATree can also achieve 100% test accuracy on the iris data. We have updated the paper to include one of the seeds that produces such a result. To achieve the perfect result on iris with almost all seeds, one needs to raise the population size or the generation limit (or both). For the demonstration, however, these limits were left low to enable quick testing of the library.
  3. Fitness is an estimate of the quality of an individual decision tree, which determines whether that tree survives into the next generation. In the current implementation, it is calculated as a combination of the accuracy on the test set (preferring higher accuracy) and the tree size (preferring smaller, more generalisable trees). The average fitness is the average of the fitness values across the entire population; the best fitness is the single fitness value of the best individual in the population.
  4. Indeed, the visualisations are not part of our package. The structure of the final decision tree (the best individual among all populations) can be displayed using the GATree.plot() method; Figure 3 was created with an external tool based on the output of this method. Additionally, we have provided a simple usage example (examples/plot_decision_tree.py) and updated the iris example (examples/iris.py), which now contains the code for the graphs shown in Figure 2.
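The distinction between rule-based decision trees and tree programs drawn in point 1 can be sketched in a few lines of Python. These are hypothetical minimal structures for illustration only, not GATree's actual classes:

```python
# Decision tree: internal nodes hold RULES (feature comparisons),
# leaves hold class labels.
def classify(node, sample):
    """Walk a rule-based decision tree down to a class label."""
    while isinstance(node, dict):          # internal node
        feature, threshold = node["rule"]  # e.g. ("age", 25)
        node = node["left"] if sample[feature] <= threshold else node["right"]
    return node                            # leaf = class label

# Tree PROGRAM (genetic programming): the tree encodes a FORMULA,
# e.g. age + 7 * wage / 2 - height, which evaluates to a number.
def evaluate(expr, sample):
    """Recursively evaluate an arithmetic expression tree."""
    if isinstance(expr, str):              # variable leaf
        return sample[expr]
    if isinstance(expr, (int, float)):     # constant leaf
        return expr
    op, left, right = expr
    a, b = evaluate(left, sample), evaluate(right, sample)
    return {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]

sample = {"age": 30, "wage": 10, "height": 1.8}

# A one-rule decision tree: "age > 25" -> class "old", else "young".
tree = {"rule": ("age", 25), "left": "young", "right": "old"}
print(classify(tree, sample))   # -> old

# The tree program age + 7 * wage / 2 - height as a nested tuple.
program = ("-", ("+", "age", ("/", ("*", 7, "wage"), 2)), "height")
print(evaluate(program, sample))
```

The tree program yields a number (63.2 for this sample), which still needs to be mapped to a class, while the decision tree produces a label directly; this is the gap the authors describe when adapting GP libraries to classification.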
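The fitness calculation described in point 3 might look roughly like this. The formula and weights below are illustrative assumptions, not the exact computation GATree performs; fitness is treated here as a loss to minimise, consistent with the reviewer's "average/minimum loss" reading:

```python
def fitness(accuracy, tree_size, size_penalty=0.01):
    """Illustrative fitness: reward test-set accuracy, penalise large trees.
    Lower is better (a loss to minimise); size_penalty is a made-up weight."""
    return (1.0 - accuracy) + size_penalty * tree_size

# A toy population of three evolved trees (synthetic numbers).
population = [
    {"accuracy": 0.93, "size": 7},
    {"accuracy": 0.95, "size": 15},
    {"accuracy": 0.90, "size": 3},
]
scores = [fitness(ind["accuracy"], ind["size"]) for ind in population]

avg_fitness = sum(scores) / len(scores)  # "average fitness" of the population
best_fitness = min(scores)               # "best fitness": the single best individual

print(round(avg_fitness, 4), round(best_fitness, 4))
```

Note how a smaller tree with slightly lower accuracy can still win: the third individual beats the second despite being less accurate, which is the generalisation pressure the authors describe.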
WeakCha commented 1 month ago

@lahovniktadej Thanks for your detailed response! @kellyrowland Here are my follow-up responses.

For the package:

  1. This looks very nice and pretty and I really love it! It would be a little better to preserve the full tutorial in your documentation while putting only the most relevant info in the README (for example, I would keep a short description of "mutation" but remove the technical details). Personally, I would not feed the audience too much, but I am open to suggestions.
  4. I would still insist on adding at least one example applying your algorithm to a larger dataset (N > 1,000, for example). It is fine temporarily not to apply your algorithm to a dataset with a different outcome type (though that is still necessary in your future development), but I suspect that iris might not be good enough to reveal the capacity of your algorithm, as many algorithms can easily achieve a perfect result on its test set. It is fine not to beat the best results; comparable performance is fine, as your novelty is a new algorithm that does not currently exist.

For the paper:

  1. I agree with this, but could you please add a table/figure to your paper showing the performance of your algorithm with different sets of parameters? Besides, to avoid any coincidence I would repeat the experiment at least 100 times (you do not need to show this in your package) and report the mean and standard deviation of the metrics, but I am open to your thoughts.
  3. This is very clear, and it would be great to put these explanations in your paper.
FlyingPumba commented 1 month ago

Hi @lahovniktadej @WeakCha. Here are my comments:

For the package:

For the paper:

Changes needed:

Minor comments:

lahovniktadej commented 1 month ago

@WeakCha Thanks for the follow-up response.

For the package:

  1. This is a great suggestion. We have streamlined the descriptions in the README (removed the technical details) and kept the extended descriptions in the documentation.
  2. We have extended our examples beyond the iris dataset to include additional, more substantial datasets (examples/adult.py, examples/make_classification.py). The adult (also known as Census Income) dataset contains 48,842 instances, while the make_classification script generates a synthetic dataset with 1,500 instances. These additional examples demonstrate the robustness and generalisability of our algorithm.

For the paper:

  1. Thanks for the suggestion. We are currently running an experiment to evaluate the performance of our algorithm under different parameter settings. The experiment is being conducted on the adult dataset, where we assess the performance of GATree across 100 independent runs to account for variability and ensure the robustness of our results. For each run, we record both accuracy and F1 score. Once the experiment is complete, we will update the paper with a table summarising these results.
  2. We have updated the paper and included explanations of the values shown in Figure 2.
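The 100-run aggregation described in point 1 can be sketched as follows. The per-run scores here are synthetic stand-ins (not real GATree results); in practice each entry would come from one fit/evaluate cycle with its own seed:

```python
import random
import statistics

# Synthetic per-run scores standing in for 100 real GATree runs on adult.
random.seed(0)
runs = [
    {"accuracy": random.gauss(0.85, 0.01), "f1": random.gauss(0.80, 0.015)}
    for _ in range(100)
]

# Report mean and standard deviation per metric, as the reviewer suggested.
for metric in ("accuracy", "f1"):
    values = [run[metric] for run in runs]
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    print(f"{metric}: {mean:.4f} \u00b1 {std:.4f}")
```

Reporting mean ± standard deviation over independent seeded runs is the standard way to account for the stochasticity of evolutionary algorithms, which is exactly the coincidence-avoidance concern raised above.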
lahovniktadej commented 1 month ago

@FlyingPumba Thanks for taking the time to review our paper and repository.

Changes needed:

Minor comments:

FlyingPumba commented 1 month ago

Hi @lahovniktadej. Thanks for making the changes. The experiment you are running sounds very good, since it will show the robustness of GATree on bigger problems. Please make sure that you also add a comparison against other approaches for building decision trees, e.g., compare GATree's performance on adult to scikit-learn's default implementation, gplearn, tinyGP, TensorGP, etc.

lahovniktadej commented 1 week ago

@WeakCha @FlyingPumba We have updated the paper to include the results of our experiment and comparison against other approaches for building decision trees.

@FlyingPumba We appreciate your recommendations for comparison against other approaches, such as tinyGP and TensorGP. However, after careful consideration, we believe that a direct comparison between GATree and tinygp/TensorGP may not be appropriate for the following reasons:

WeakCha commented 1 week ago

@kellyrowland @lahovniktadej Thanks for the follow-up! Could you please regenerate the paper PDF? I'm having trouble finding the paper... Thanks!

lahovniktadej commented 1 week ago

@editorialbot generate pdf

editorialbot commented 1 week ago

👉 📄 Download article proof · 📄 View article proof on GitHub 👈

WeakCha commented 1 week ago

@lahovniktadej Thanks a lot!

For the paper:

  1. Could you please provide some description of the adult dataset (sample size, features, outcome type, etc.)?
  2. Could you please explain why the two comparison methods have a much smaller F1 score? It would be great if you could provide the code for running this experiment so that we can reproduce the results.
  3. For organisation purposes, it would be great to add section titles denoting which part is the illustration of the algorithm, which part is the experiments and comparison, and so on.
  4. In my opinion, as long as tinygp and TensorGP support decision tree training and prediction, they qualify as competing methods, even if their design purposes/principles are fundamentally different from genetic algorithms. In other words, it is reasonable not to compare your algorithm with tinygp and TensorGP if they do not provide interfaces for decision tree prediction. I am not familiar with these two, so I will leave this question to you and the other reviewer, @FlyingPumba.
FlyingPumba commented 1 week ago

Hi @lahovniktadej, thanks for the update. I agree with @WeakCha on points 1 to 3. As for point 4, I think it's OK not to compare against tinygp and TensorGP if these libraries do not build decision trees. The new experiment comparing against scikit-learn and gplearn is already a great improvement. One question about that new part of the paper: I see that scikit-learn's DecisionTreeClassifier is limited to a max depth of 5; are you also limiting the depth of the decision trees built by GATree and gplearn? If not, this could be a threat to the validity of the results shown.