Open pa-nathaniel opened 1 year ago
I haven't experienced this barrier, so far, but you're raising a valid and important point, of course.
So far, I've been viewing the competitive algorithms in the FFTrees package as a nice add-on with more benefits than costs. Given the lack of a generally accepted gold standard and the availability of a vast range of possible classification strategies, it's crucial to compare the performance of FFTs to some alternative models. The current range links and contrasts our seemingly naive trees with fancier methods typically associated with buzz-words like "statistical modeling" and "machine learning". And while I suspect that many users appreciate the automatic availability of such performance benchmarks, it's highly undesirable when enabling these benchmarks prevents them from installing and using FFTrees.
Hence, perhaps the key questions and trade-offs here are:
With regard to 3.: Beyond their technical demands, another critical issue with highly sophisticated alternative benchmarks is that our default usage often fails to exploit their full capacity. This is unavoidable and to be expected, as we're not even trying to optimize the performance of those algorithms. But when someone then finds a superior solution (e.g., by using RLR instead of LR), enthusiasts of those alternative algorithms (or skeptics of FFTs) may construe our omission as a general argument against simpler strategies. Hence, removing non-optimized alternatives could also preempt accusations that our competition is not "fair" or "objective" (which may often be justified, not out of bias or malice, but simply because we're devoting more attention and effort to our favored model than to its alternatives).
I take your points. I suspect most people who want to compare the effectiveness of FFTrees to other algorithms should be using packages built for that purpose (such as `tidymodels` and `parsnip`) rather than using the (somewhat hacky) solutions we built into this package.
I think it would be wise to
I'll create a PR for this, but since it's a major change, I won't merge until I get a review from @hneth.
I am struggling to install FFTrees on a machine because `randomForest` fails to install (due to a dependency issue on an M2 Mac). It's really frustrating, and it feels like a shame to have all of the great FFTrees functionality gated on being able to use `randomForest` as a competitive algorithm.
This gets me wondering: what would the pros and cons of reducing dependencies be? Including non-essential dependencies is generally discouraged, and the more I think about it, `randomForest` and the other packages used as competitive algorithms are definitely not essential for seeing the benefits of FFTrees.
How about removing `randomForest`, and maybe `e1071` (for `svm()`), as dependencies and just using `rpart::cart()` and `lr` as competitive algorithms?
I feel like 99.9% of users won't miss it, and it could reduce the barrier to entry.
@hneth what do you think?
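For illustration, one common way to loosen this kind of dependency in an R package is to list the optional packages under `Suggests` rather than `Imports` in `DESCRIPTION` and to check for them at run time with `requireNamespace()`. The following is only a minimal sketch: `available_competitors()` is a hypothetical helper that is not part of FFTrees, the short labels (`cart`, `rf`, `svm`, `lr`) are illustrative, and it assumes the `lr` benchmark needs nothing beyond base R's `stats::glm()`.

```r
# Hypothetical helper (NOT part of FFTrees): report which competitive
# algorithms could run on this machine, given the installed packages.
available_competitors <- function(pkgs = c(cart = "rpart",
                                           rf   = "randomForest",
                                           svm  = "e1071")) {
  # requireNamespace() returns FALSE (instead of raising an error) when a
  # package is not installed, so a missing package only disables its benchmark.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

  # Assumption: "lr" relies only on stats::glm(), which ships with R,
  # so it is always listed.
  c("lr", names(pkgs)[installed])
}

comps <- available_competitors()
message("Competitors available: ", paste(comps, collapse = ", "))
```

With a guard like this, a failed `randomForest` installation would only skip that one benchmark rather than block installation of the whole package.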