rprops / Phenoflow_package

R package offering functionality for the advanced analysis of microbial flow cytometry data
GNU General Public License v2.0

Allow group_label in RandomF_FCS() #49

Open rprops opened 6 years ago

rprops commented 6 years ago

This would allow creating a separate random forest model for each "group", for example if you have three groups of strains and want to distinguish the strains from one another within each group. This should be straightforward to implement with apply(), returning the results in a list().
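A rough sketch of the split-by-group idea, written in dependency-free Python rather than R and not taken from the Phenoflow code base. The helper names (`fit_centroids`, `fit_per_group`), the toy records, and the nearest-centroid stand-in (used here instead of a real random forest) are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def fit_centroids(samples):
    """Stand-in for model training: one mean feature vector per strain label.

    This is NOT a random forest; it only marks where a per-group model
    would be fitted.
    """
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def fit_per_group(records):
    """Split (group, features, label) records by group and fit one model each."""
    grouped = defaultdict(list)
    for group, features, label in records:
        grouped[group].append((features, label))
    # One model per group, keyed by the group label (the "list()" of results).
    return {group: fit_centroids(samples) for group, samples in grouped.items()}


# Hypothetical toy data: two cocultures, two strains each, two features per cell.
records = [
    ("coculture_1", (1.0, 1.0), "strain_A"),
    ("coculture_1", (1.2, 0.8), "strain_A"),
    ("coculture_1", (5.0, 5.0), "strain_B"),
    ("coculture_2", (1.1, 1.1), "strain_C"),
    ("coculture_2", (9.0, 9.0), "strain_D"),
]

models = fit_per_group(records)
for group, model in models.items():
    print(group, sorted(model))
```

Each entry of `models` only knows about the strains in its own group, which is the behaviour a group_label argument would presumably provide.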

prubbens commented 6 years ago

Not sure if this is necessary, as Random Forests intrinsically are able to deal with more than two classes. You could just output the confusion matrix.
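For reference, a minimal sketch of what outputting a multi-class confusion matrix could look like. It is plain Python with made-up labels, not Phenoflow output, and the function name `confusion_matrix` is a hypothetical helper defined here.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Return (labels, matrix) with rows = true class, columns = predicted class."""
    counts = Counter(zip(true_labels, predicted_labels))
    labels = sorted(set(true_labels) | set(predicted_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Hypothetical per-cell labels for three strains.
true_labels = ["strain_A", "strain_A", "strain_B", "strain_C", "strain_C"]
predicted_labels = ["strain_A", "strain_C", "strain_B", "strain_C", "strain_A"]

labels, matrix = confusion_matrix(true_labels, predicted_labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Off-diagonal counts show which strains get confused with each other, regardless of how many classes the single classifier was trained on.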

These approaches are interesting for classifiers which are only able to deal with a binary output, such as Support Vector Machines.

rprops commented 6 years ago

In the case of training a single classifier won't you have a problem in the following scenario:

  1. Experiment: 3 cocultures of 2 strains

  2. Train classifier on all 6 strains and make predictions on each coculture.
    Problem: a percentage of the cells will be classified as the wrong strain when the FCM data of the strains overlap.

To me, it seems more fruitful to train a classifier based on the data presented in each individual experiment?
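A toy illustration of the scenario above, assuming made-up 2-D centroids and a nearest-centroid rule as a stand-in for the actual classifier (this is not a random forest and not Phenoflow code). It only shows how a pooled model can assign a cell to a strain that is not even present in the coculture.

```python
import math


def predict(centroids, cell):
    """Assign the cell to the strain whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], cell))


# Hypothetical centroids: strain_C overlaps heavily with strain_A.
pooled = {
    "strain_A": (1.0, 1.0),
    "strain_B": (5.0, 5.0),
    "strain_C": (1.1, 1.1),
    "strain_D": (9.0, 9.0),
}

# A model restricted to the strains actually present in coculture 1.
coculture_1 = {label: pooled[label] for label in ("strain_A", "strain_B")}

# A cell measured in coculture 1, so it can only be strain_A or strain_B.
cell = (1.08, 1.08)

pooled_call = predict(pooled, cell)        # may pick a strain absent from the coculture
grouped_call = predict(coculture_1, cell)  # limited to strains in this experiment

print("pooled model:   ", pooled_call)
print("per-group model:", grouped_call)
```

With these numbers the pooled model calls the cell strain_C, while the per-experiment model can only choose between the two strains that were actually cocultured.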

prubbens commented 6 years ago

I think this would indeed make more sense. However, isn't this already quite a specific experimental setup? If we create a general RandomForestClassifier object, I don't think a user would have to put in much effort to tailor it to this kind of experiment.

Maybe it's an idea to list some general synthetic experimental setups for which functions should be created?

FMKerckhof commented 6 years ago

For my setup this would be very useful. Typically the first step is a pairwise screening in 96-well format. Being able to pass a "group" variable would be very useful.

rprops commented 5 years ago

Basically, a contrast matrix is necessary; I will address this in the next dev window.