openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: GATree: Evolutionary decision tree classifier in Python #6748

Open editorialbot opened 1 month ago

editorialbot commented 1 month ago

Submitting author: @lahovniktadej (Tadej Lahovnik)
Repository: https://github.com/lahovniktadej/gatree
Branch with paper.md (empty if default branch):
Version: 0.1.4
Editor: @kellyrowland
Reviewers: @FlyingPumba, @WeakCha
Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/b3deeed0f75fedff6e284569bdd97ef8"><img src="https://joss.theoj.org/papers/b3deeed0f75fedff6e284569bdd97ef8/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/b3deeed0f75fedff6e284569bdd97ef8/status.svg)](https://joss.theoj.org/papers/b3deeed0f75fedff6e284569bdd97ef8)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@FlyingPumba & @WeakCha, your review will be checklist-based. Each of you will have a separate checklist that you should update when carrying out your review. First of all, you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @kellyrowland know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

šŸ“ Checklist for @WeakCha

šŸ“ Checklist for @FlyingPumba

editorialbot commented 1 month ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 month ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.02 s (1758.7 files/s, 126099.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          15            280            632            648
XML                              1              0              0            467
Markdown                         7             99              0            283
reStructuredText                12             75             73            107
YAML                             3              9             21             86
TeX                              1              5              0             58
TOML                             1              4              0             21
-------------------------------------------------------------------------------
SUM:                            40            472            726           1670
-------------------------------------------------------------------------------

Commit count by author:

   106  Tadej Lahovnik
     1  karakatic
editorialbot commented 1 month ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1109/TSMCC.2011.2157494 is OK
- 10.1016/j.swevo.2021.101006 is OK
- 10.1007/978-3-540-46239-2_18 is OK
- 10.1007/BFb0029742 is OK
- 10.1145/3205651.3205774 is OK
- 10.1007/978-3-030-72699-7_48 is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 1 month ago

Paper file info:

📄 Wordcount for paper.md is 1384

✅ The paper includes a Statement of need section

editorialbot commented 1 month ago

License info:

✅ License found: MIT License (Valid open source OSI approved license)

editorialbot commented 1 month ago

👉 📄 Download article proof · 📄 View article proof on GitHub 👈

WeakCha commented 1 month ago

Review checklist for @WeakCha

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

FlyingPumba commented 1 month ago

Review checklist for @FlyingPumba

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

WeakCha commented 1 month ago

@kellyrowland @lahovniktadej Here are my comments.

For the package:

  1. Although installation is successful, a list of dependencies is missing from the repo.
  2. The meanings of the functions (mutation, crossover, selection, etc.) are unclear to me, and there is no description in the documentation. Although this does not affect usage and need not appear in the paper, it would be helpful to explain briefly what they mean and provide simple examples in the documentation.
  3. Contact info for third parties who want to join the project is missing.
  4. The examples are too simple. The iris data is good but not enough. I would be interested in applying this package to datasets with (1) a larger sample size and (2) different outcomes/labels. For example, I would be interested in the relationship between socioeconomic factors and the price of housing using a housing-price dataset. The price of housing is a continuous outcome. Decision trees are designed for binary/categorical labels, but there should be modifications that allow the method to be applied to a continuous outcome?
  5. Comparisons with other methods are missing, but I think this is optional if the advantages of the method are clear.

For the paper:

  1. Just a confirmation about the advantages in your Statement of need: do you mean that you are the first group to develop software that incorporates evolutionary algorithms from genetics into decision trees?
  2. 96.67% is not the result I get for the first example shown in the paper (no customised loss); on my computer it is 93.33%. I agree that the same random_seed may lead to different performance on different computing platforms, but my impression is that many methods can easily reach 100% test accuracy on the iris data. Do you have any idea why?
  3. What is the meaning of average fitness/best fitness? Are these the average loss value and the minimum loss value?
  4. I love the visualisations, but I did not find visualisation functionality in your package. I would like you to add more visualisation functions so that users can better see what is going on in your algorithm.

In summary, this looks interesting to me even though I am not in this field, so I would like to learn more about this package, which is the motivation for all of my comments. Feel free to correct any factual errors. Thanks!

kellyrowland commented 1 month ago

Thanks for the review, @WeakCha. @lahovniktadej, please let me know if you have any questions or concerns.

@FlyingPumba 👋 checking in on review status, thanks for getting started.

FlyingPumba commented 1 month ago

Hi @kellyrowland, thanks for the reminder. If it's ok, I'll look at it next week. I'm swamped with work right now.

lahovniktadej commented 1 month ago

@WeakCha Thanks for taking the time to review our paper and repository.

For the package:

  1. The list of dependencies is available in pyproject.toml (located in the repository's root directory) and in the documentation. We have since updated the contribution guide with a list of dependencies and provided a link to the dependency list in README.md.
  2. Thank you for the excellent suggestion. We had taken an understanding of these functions' behaviour for granted, which is why the descriptions of how they work were rather vague. Following your suggestion, we have updated README.md with extended descriptions of the individual functions and extended the descriptions in the library documentation as well.
  3. README.md has been updated. The "Community Guidelines" section contains contact information and a link to the contribution guide.
  4. GATree can handle datasets with larger sample sizes and different outcomes/labels; however, it is limited to classification tasks at the moment. We plan to introduce support for regression tasks and custom operators after publication, but until then, our primary focus is on classification.

For the paper:

  1. Thank you for the question. As mentioned in the paper, this is not the first package to bring genetic algorithms for tree structures to Python, but it is the first to do so with DECISION trees for classification tasks. Existing Python libraries use genetic algorithms to induce tree PROGRAMS (sometimes referred to as genetic programming), which can, but, as the literature shows, usually do not, achieve comparable results. The difference is that in decision trees the nodes consist of rules (e.g., age > 25), whereas in tree programs the tree structure represents a formula/program (e.g., age + 7 * wage / 2 - height). Still, the state of the art in traditional machine learning on tabular data (as opposed to images, text, or other unstructured data) is achieved with decision trees (e.g., CART, Random Forest, Gradient Boosting, AdaBoost, Histogram Boosting). It is possible to customise existing libraries to build decision trees instead of tree programs, but this is neither easy nor optimised for that usage. Thus, we believe GATree presents a valuable contribution, filling a gap not yet addressed by any other library.
  2. Of course, GATree can also achieve 100% test accuracy on the iris data. We have updated the paper to include one of the seeds that produces such a result. To achieve the perfect result on iris with almost all seeds, one needs to raise the population size or the generation limit (or both). For the demonstration, however, these limits were left low to enable quick testing of the library.
  3. Fitness is an estimate of the quality of an individual decision tree, which determines whether that tree survives into the next generation. In the current implementation, it is calculated as a combination of the accuracy on the test set (preferring higher accuracy) and the tree size (preferring smaller, more generalisable trees). The average fitness is the average of the fitness values across the entire population; the best fitness is the single fitness value of the best individual in the population.
  4. Indeed, the visualisations are not part of our package. The structure of the final decision tree (the best individual among all populations) can be displayed using the GATree.plot() method; Figure 3 was created with an external tool based on the output of this method. Additionally, we have provided a simple usage example (examples/plot_decision_tree.py) and updated the iris example (examples/iris.py), which now contains the code for the graphs shown in Figure 2.
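The distinction between rule-based decision trees and tree programs drawn in point 1 can be sketched in a few lines of Python. These are hypothetical minimal structures for illustration only, not GATree's actual classes:

```python
# Decision tree: internal nodes hold RULES (feature comparisons),
# leaves hold class labels.
def classify(node, sample):
    """Walk a rule-based decision tree down to a class label."""
    while isinstance(node, dict):          # internal node
        feature, threshold = node["rule"]  # e.g. ("age", 25)
        node = node["left"] if sample[feature] <= threshold else node["right"]
    return node                            # leaf = class label

# Tree PROGRAM (genetic programming): the tree encodes a FORMULA,
# e.g. age + 7 * wage / 2 - height, which evaluates to a number.
def evaluate(expr, sample):
    """Recursively evaluate an arithmetic expression tree."""
    if isinstance(expr, str):              # variable leaf
        return sample[expr]
    if isinstance(expr, (int, float)):     # constant leaf
        return expr
    op, left, right = expr
    a, b = evaluate(left, sample), evaluate(right, sample)
    return {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]

sample = {"age": 30, "wage": 10, "height": 1.8}

# A one-rule decision tree: "age > 25" -> class "old", else "young".
tree = {"rule": ("age", 25), "left": "young", "right": "old"}
print(classify(tree, sample))   # -> old

# The tree program age + 7 * wage / 2 - height as a nested tuple.
program = ("-", ("+", "age", ("/", ("*", 7, "wage"), 2)), "height")
print(evaluate(program, sample))
```

The tree program yields a number (63.2 for this sample), which still needs to be mapped to a class, while the decision tree produces a label directly; this is the gap the authors describe when adapting GP libraries to classification.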
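The fitness calculation described in point 3 might look roughly like this. The formula and weights below are illustrative assumptions, not the exact computation GATree performs; fitness is treated here as a loss to minimise, consistent with the reviewer's "average/minimum loss" reading:

```python
def fitness(accuracy, tree_size, size_penalty=0.01):
    """Illustrative fitness: reward test-set accuracy, penalise large trees.
    Lower is better (a loss to minimise); size_penalty is a made-up weight."""
    return (1.0 - accuracy) + size_penalty * tree_size

# A toy population of three evolved trees (synthetic numbers).
population = [
    {"accuracy": 0.93, "size": 7},
    {"accuracy": 0.95, "size": 15},
    {"accuracy": 0.90, "size": 3},
]
scores = [fitness(ind["accuracy"], ind["size"]) for ind in population]

avg_fitness = sum(scores) / len(scores)  # "average fitness" of the population
best_fitness = min(scores)               # "best fitness": the single best individual

print(round(avg_fitness, 4), round(best_fitness, 4))
```

Note how a smaller tree with slightly lower accuracy can still win: the third individual beats the second despite being less accurate, which is the generalisation pressure the authors describe.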
WeakCha commented 1 month ago

@lahovniktadej Thanks for your detailed response! @kellyrowland Here are my follow-up responses.

For the package:

  1. This looks very nice and pretty and I really love it! It would be a little better to preserve the full tutorial in your documentation while putting only the most relevant info in the README (for example, I would keep a short description of "mutation" but remove the technical details). Personally, I would not feed the audience too much, but I am open to suggestions.
  4. I would still insist on adding at least one example applying your algorithm to a larger dataset (N > 1,000, for example). It is fine temporarily not to apply your algorithm to a dataset with a different outcome type (though that is still necessary in your future development), but I suspect that iris might not be good enough to reveal the capacity of your algorithm, as many algorithms can easily achieve a perfect result on its test set. It is fine not to beat the best results; comparable performance is fine, as your novelty is a new algorithm that does not currently exist.

For the paper:

  1. I agree with this, but could you please add a table/figure to your paper showing the performance of your algorithm with different sets of parameters? Besides, to avoid any coincidence I would repeat the experiment at least 100 times (you do not need to show this in your package) and report the mean and standard deviation of the metrics, but I am open to your thoughts.
  3. This is very clear, and it would be great to put these explanations in your paper.
FlyingPumba commented 1 month ago

Hi @lahovniktadej @WeakCha. Here are my comments:

For the package:

For the paper:

Changes needed:

Minor comments:

lahovniktadej commented 1 month ago

@WeakCha Thanks for the follow-up response.

For the package:

  1. This is a great suggestion. We have streamlined the descriptions in the README (removed the technical details) and kept the extended descriptions in the documentation.
  2. We have extended our examples beyond the iris dataset to include additional, more substantial datasets (examples/adult.py, examples/make_classification.py). The adult (also known as Census Income) dataset contains 48,842 instances, while the make_classification script generates a synthetic dataset with 1,500 instances. These additional examples demonstrate the robustness and generalisability of our algorithm.

For the paper:

  1. Thanks for the suggestion. We are currently running an experiment to evaluate the performance of our algorithm under different parameter settings. The experiment is being conducted on the adult dataset, where we assess the performance of GATree across 100 independent runs to account for variability and ensure the robustness of our results. For each run, we record both accuracy and F1 score. Once the experiment is complete, we will update the paper with a table summarising these results.
  2. We have updated the paper and included explanations of the values shown in Figure 2.
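The 100-run aggregation described in point 1 can be sketched as follows. The per-run scores here are synthetic stand-ins (not real GATree results); in practice each entry would come from one fit/evaluate cycle with its own seed:

```python
import random
import statistics

# Synthetic per-run scores standing in for 100 real GATree runs on adult.
random.seed(0)
runs = [
    {"accuracy": random.gauss(0.85, 0.01), "f1": random.gauss(0.80, 0.015)}
    for _ in range(100)
]

# Report mean and standard deviation per metric, as the reviewer suggested.
for metric in ("accuracy", "f1"):
    values = [run[metric] for run in runs]
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    print(f"{metric}: {mean:.4f} \u00b1 {std:.4f}")
```

Reporting mean ± standard deviation over independent seeded runs is the standard way to account for the stochasticity of evolutionary algorithms, which is exactly the coincidence-avoidance concern raised above.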
lahovniktadej commented 1 month ago

@FlyingPumba Thanks for taking the time to review our paper and repository.

Changes needed:

Minor comments:

FlyingPumba commented 1 month ago

Hi @lahovniktadej. Thanks for making the changes. The experiment you are running sounds very good, since it will show the robustness of GATree on bigger problems. Please make sure that you also add a comparison against other approaches for building decision trees, e.g., compare GATree's performance on adult to scikit-learn's default implementation, gplearn, tinyGP, TensorGP, etc.

lahovniktadej commented 1 week ago

@WeakCha @FlyingPumba We have updated the paper to include the results of our experiment and comparison against other approaches for building decision trees.

@FlyingPumba We appreciate your recommendations for comparison against other approaches, such as tinyGP and TensorGP. However, after careful consideration, we believe that a direct comparison between GATree and tinygp/TensorGP may not be appropriate for the following reasons:

WeakCha commented 1 week ago

@kellyrowland @lahovniktadej Thanks for the follow-up! Could you please regenerate the paper PDF? I'm having trouble finding the paper... Thanks!

lahovniktadej commented 1 week ago

@editorialbot generate pdf

editorialbot commented 1 week ago

👉 📄 Download article proof · 📄 View article proof on GitHub 👈

WeakCha commented 1 week ago

@lahovniktadej Thanks a lot!

For the paper:

  1. Could you please provide some description of the adult dataset (sample size, features, outcome type, etc.)?
  2. Could you please explain why the two comparison methods have a much smaller F1 score? It would be great if you could provide the code for running this experiment so that we can reproduce the results.
  3. For organisation purposes, it would be great to add section titles denoting which part is the illustration of the algorithm, which part is the experiments and comparison, and so on.
  4. In my opinion, as long as tinygp and TensorGP support decision tree training and prediction, they qualify as competing methods, even if their design purposes/principles are fundamentally different from genetic algorithms. In other words, it is reasonable not to compare your algorithm with tinygp and TensorGP if they do not provide interfaces for decision tree prediction. I am not familiar with these two, so I will leave this question to you and the other reviewer, @FlyingPumba.
FlyingPumba commented 1 week ago

Hi @lahovniktadej, thanks for the update. I agree with @WeakCha on points 1 to 3. As for point 4, I think it's OK not to compare against tinygp and TensorGP if these libraries do not build decision trees. The new experiment comparing against scikit-learn and gplearn is already a great improvement. One question about that new part of the paper: I see that scikit-learn's DecisionTreeClassifier is limited to a max depth of 5; are you also limiting the depth of the decision trees built by GATree and gplearn? If not, this could be a threat to the validity of the results shown.