LukeMathWalker commented 4 years ago

In terms of functionality, the mid-term end goal is to achieve an offering of ML algorithms and pre-processing routines comparable to what is currently available in Python's scikit-learn.

These algorithms can either be:

- re-implemented in Rust;
- re-exported from an existing Rust crate, if available on crates.io with a compatible interface.

In no particular order, focusing on the main gaps:

Clustering:
- [x] DBSCAN
- [x] Spectral clustering;
- [x] Hierarchical clustering;
- [x] OPTICS.
Preprocessing:
- [x] PCA
- [x] ICA
- [x] Normalisation
- [x] CountVectoriser
- [x] TFIDF
- [x] t-SNE
Supervised Learning:
- [x] Linear regression;
- [x] Ridge regression;
- [x] LASSO;
- [x] ElasticNet;
- [x] Support vector machines;
- [x] Nearest Neighbours;
- [ ] Gaussian processes; (integrating friedrich - tracking issue https://github.com/nestordemeure/friedrich/issues/1)
- [x] Decision trees;
- [ ] Random Forest
- [x] Naive Bayes
- [x] Logistic Regression
- [ ] Ensemble Learning
- [ ] Least Angle Regression
- [x] PLS

The collection is deliberately loose and non-exhaustive, and it will evolve over time. If there is an ML algorithm that you find yourself using often day to day, please feel free to contribute it :100:

bytesnake commented 3 years ago

Nearest neighbours merged in #120, thanks to @YuhanLiin

mrleu commented 3 years ago

hi all i'd like to help implement too. what's the best way to pick up a task?

bytesnake commented 3 years ago

> hi all i'd like to help implement too. what's the best way to pick up a task?

not difficult, just mention your interest here and I will add you to the list once you've submitted the initial draft :)

Clara322 commented 2 years ago

Is there any interest for linfa supporting model selection algorithms such as grid search or hyperparameter tuning?

xd009642 commented 2 years ago

@Clara322 I personally think that would be a good candidate for a new linfa crate. If you want to open an issue for it specifically, there can be some discussion on what the design will look like and the steps to implement it :+1:

vaijira commented 2 years ago

Could "causal inference", like the https://github.com/microsoft/dowhy library, be added, or would it be out of scope for linfa?

erkasc01 commented 2 years ago

Hi everyone, I've implemented a semi-supervised learning algorithm called dynamic label propagation in Rust. I'm getting accuracy scores of up to 98% on one of the datasets I've been using. I don't think this algorithm is very well known, but could it be added to the Linfa library?

bytesnake commented 2 years ago

> Could "causal inference", like the https://github.com/microsoft/dowhy library, be added, or would it be out of scope for linfa?

there are many interesting patterns which linfa can learn from, but we would first need to support graphical models

> Hi everyone, I've implemented a semi-supervised learning algorithm called dynamic label propagation in Rust. I'm getting accuracy scores of up to 98% on one of the datasets I've been using. I don't think this algorithm is very well known, but could it be added to the Linfa library?

cool, sure! Once you have a working prototype, submit a PR and I will review the integration. We have to see how to add support for incomplete datasets though

vaijira commented 2 years ago

@bytesnake I'm playing with it, creating graph and identification support. If one day I feel it is ready, I'll submit a PR. https://github.com/vaijira/linfa/tree/causal/algorithms/linfa-causal-inference

YuhanLiin commented 2 years ago

Infrastructure Goals

Aside from just adding new algorithms, there are also some infrastructure tasks that will significantly improve the ergonomics and performance of Linfa. They are listed here in descending order of importance, in my opinion:

- [ ] #220: Python bindings allow linfa to target the same userbase as scikit-learn, broadening the reach of the project. Having Python benchmarks also allows a fair performance comparison between linfa and scikit-learn.
- [ ] #228: This task potentially allows removing BLAS dependencies from linfa completely, significantly increasing code quality. It also has the side effect of increasing benchmark coverage.
- [ ] #103: More benchmarks allow us to make performance optimizations with confidence.
- [ ] #161: Allows users to have more visibility into the internals of longer-running algorithms, similar to other mainstream ML libraries.

oojo12 commented 1 year ago

Can I work on adding Linear Discriminant Analysis to linfa? Here is a link to the Sklearn analog

bernardo-sb commented 1 year ago

I've been working on some features like: Categorical Encoding, MAPE and random forest. How can I contribute?

YuhanLiin commented 1 year ago

> Can I work on adding Linear Discriminant Analysis to linfa? Here is a link to the Sklearn analog

Does LDA output the dimensionally-reduced data at all? If so it should go into linfa-reduction

YuhanLiin commented 1 year ago

> I've been working on some features like: Categorical Encoding, MAPE and random forest. How can I contribute?

Random forests are covered by this PR.

Categorical encoding would go into linfa-preprocessing. I'm pretty sure we don't have it but just check to make sure.

MAPE is a simple function that would go into linfa/src/metrics_regression.rs
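As a rough illustration of how small such a metric is, here is a minimal standalone sketch of MAPE over plain slices. The function name, the `Option` return type, and the choice to reject zero targets are assumptions made for this example; it is not linfa's actual metrics API, which operates on its own dataset and array types.

```rust
/// Mean absolute percentage error, returned as a fraction (0.1 means 10%).
/// Hypothetical helper for illustration, not part of linfa.
/// Returns `None` if the inputs are empty, differ in length,
/// or any true value is zero (the ratio would be undefined).
pub fn mean_absolute_percentage_error(y_true: &[f64], y_pred: &[f64]) -> Option<f64> {
    if y_true.is_empty() || y_true.len() != y_pred.len() {
        return None;
    }
    let mut sum = 0.0;
    for (&t, &p) in y_true.iter().zip(y_pred) {
        if t == 0.0 {
            return None;
        }
        // Absolute error relative to the true value.
        sum += ((t - p) / t).abs();
    }
    Some(sum / y_true.len() as f64)
}
```

For targets `[100.0, 200.0]` and predictions `[110.0, 180.0]`, both relative errors are 10%, so the function returns a value of about `0.1`.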

oojo12 commented 1 year ago

> > Can I work on adding Linear Discriminant Analysis to linfa? Here is a link to the Sklearn analog
>
> Does LDA output the dimensionally-reduced data at all? If so it should go into linfa-reduction

It can perform dimensionality-reduction (transform). It can also just be used to predict classes (predict). The parentheses hold the method analog in Sklearn. Is there a preference for which should be implemented? Also, I am still getting familiar with Rust so it may take a few weeks to get done.

YuhanLiin commented 1 year ago

Preferably implement both if possible.

oojo12 commented 1 year ago

Gotcha


LundinMachine commented 1 year ago

Are there plans to implement ridge regression in the linear sub-package? Looking for models to contribute.

YuhanLiin commented 1 year ago

Ridge regression should already be in linfa-elasticnet

LundinMachine commented 1 year ago

What about imputation, similar to scikit-learn's impute module?

YuhanLiin commented 1 year ago

We don't have that. That can go in linfa-preprocessing
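For reference, the simplest form of imputation replaces each missing value with the mean of its column. A minimal sketch, assuming missing values are encoded as `NaN` and the data is a rectangular slice of rows; the function name and the data layout are hypothetical and do not reflect how linfa-preprocessing represents datasets:

```rust
/// Replace every NaN entry with the mean of the observed values in its column.
/// Hypothetical helper for illustration; assumes all rows have the same length.
/// Columns without any observed value are left unchanged.
pub fn impute_column_means(rows: &mut [Vec<f64>]) {
    let n_cols = rows.first().map_or(0, |r| r.len());
    for col in 0..n_cols {
        // Sum and count of the non-missing entries in this column.
        let (sum, count) = rows
            .iter()
            .map(|r| r[col])
            .filter(|v| !v.is_nan())
            .fold((0.0, 0usize), |(s, c), v| (s + v, c + 1));
        if count == 0 {
            continue;
        }
        let mean = sum / count as f64;
        for row in rows.iter_mut() {
            if row[col].is_nan() {
                row[col] = mean;
            }
        }
    }
}
```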

HridayM25 commented 1 year ago

Hi! Can I take up Random Forests? Also can we look to implement xgboost and adaboost?

YuhanLiin commented 1 year ago

#229 implements bootstrap aggregation, which is a generalization of random forests, so you could work on that.

xgboost and adaboost seem to both be ensemble algorithms that are not necessarily tied to decision trees (correct me if I'm wrong), so we should probably put them in a new algorithm crate called linfa-ensemble or something. Bootstrap aggregation should probably go in there as well.
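To make the bootstrap aggregation idea concrete: each base model is trained on a sample of the data drawn with replacement, and the models' predictions are then combined, for classification typically by majority vote. The sketch below shows only those two building blocks. All names are hypothetical, the tiny `Lcg` generator is a stand-in for a real RNG crate, and none of this is taken from the PR mentioned above.

```rust
use std::collections::HashMap;

/// Tiny linear congruential generator, used here only so the sketch has
/// no external dependencies. A real implementation would use an RNG crate.
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self {
        Lcg(seed)
    }

    /// Pseudo-random index in `0..n`. `n` must be greater than zero.
    pub fn next_index(&mut self, n: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) % n as u64) as usize
    }
}

/// Draw `n` indices from `0..n` with replacement (one bootstrap sample).
pub fn bootstrap_indices(n: usize, rng: &mut Lcg) -> Vec<usize> {
    (0..n).map(|_| rng.next_index(n)).collect()
}

/// Combine the class labels predicted by several models for one sample.
/// Ties go to the smallest label; returns `None` for an empty slice.
pub fn majority_vote(votes: &[usize]) -> Option<usize> {
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &v in votes {
        *counts.entry(v).or_insert(0) += 1;
    }
    counts
        .into_iter()
        // Higher count wins; on equal counts the smaller label wins.
        .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)))
        .map(|(label, _)| label)
}
```

A random forest would additionally restrict each tree to a random subset of features at every split, which is why bagging is the more general building block.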

sebasv commented 1 year ago

I'd like to contribute quantile regression

MarekJReid commented 9 months ago

I'd love to take on Random Forest! I have previously implemented a simple version of it in Go, but I'd love to make it happen in Rust. This is my first open source contribution - let me know how I can make it happen :)

giorgiozoppi commented 7 months ago

I'd also like to help with this.

AndersonYin commented 7 months ago

I'm interested in least angle regression (LARS). It seems that PR #115 was trying to implement it, but it has been paused for 3 years, so I guess it's basically abandoned. I'm going to pick it up.

giorgiozoppi commented 7 months ago

I am interested in random forests.

zenconnor commented 7 months ago

@MarekJReid @giorgiozoppi did either of you take a chance at random forests?

giorgiozoppi commented 7 months ago

I'll look into it. We covered this at school this week. For the Python bindings, maturin is perfect. @zenconnor should I look inside scikit-learn? I was looking at the scikit-learn implementation; as soon as I can, I'll provide a class diagram of it.

duskmoon314 commented 4 months ago

Hello. I want to ask if you accept algorithms that fall outside the scope of scikit-learn. For example, scikit-multiflow provides machine learning algorithms for streaming data, which are not covered by scikit-learn.

I want to use Rust to achieve machine learning on streaming data. I haven't found crates that suit my needs, so I may implement several algorithms. I wonder if I could integrate them into Linfa in the future.

rust-ml / linfa

Roadmap #7
