mixOmicsTeam / mixOmics

Development repository for the Bioconductor package 'mixOmics'
http://mixomics.org/

343 streamline perf #344

Closed evaham1 closed 1 week ago

evaham1 commented 1 week ago

perf.assess.mixo.plsda() and perf.assess.mixo.splsda()

Have created perf.assess.mixo.plsda() and an identical exported perf.assess.mixo.splsda(). These functions are essentially stripped-down versions of perf.mixo.plsda()/perf.mixo.splsda(): instead of looping across components 1 to ncomp, they only run the performance assessment for ncomp.
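
For context, a minimal sketch of the intended call pattern, assuming perf.assess() takes the same cross-validation arguments as perf() (the srbct data and model settings here are purely illustrative):

```r
library(mixOmics)
data(srbct)
X <- srbct$gene
Y <- srbct$class

model <- plsda(X, Y, ncomp = 3)

## perf() loops over components 1:ncomp and evaluates each one
res_perf <- perf(model, validation = "Mfold", folds = 5, nrepeat = 10)

## perf.assess() (this PR) evaluates only the final component, ncomp = 3;
## its signature is assumed to mirror perf()'s
res_assess <- perf.assess(model, validation = "Mfold", folds = 5, nrepeat = 10)
```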

Unit tests: created to ensure that the new perf.assess() functions give exactly the same results as perf() for the same data, context and ncomp. To make this possible, set.seed() had to be added inside the component for loop in the perf() function. The unit tests also check running in series and in parallel.
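
A sketch of the comparison the unit tests make, using testthat; the output slot names for perf.assess() are assumptions here, not taken from the PR:

```r
library(testthat)
library(mixOmics)

test_that("perf.assess() matches perf() at the final component", {
  data(srbct)
  model <- plsda(srbct$gene, srbct$class, ncomp = 2)

  ## set.seed() inside perf()'s component loop (added in this PR) keeps the
  ## fold splits reproducible, so both calls see identical CV partitions
  set.seed(42)
  res_perf <- perf(model, validation = "Mfold", folds = 3, nrepeat = 2)
  set.seed(42)
  res_assess <- perf.assess(model, validation = "Mfold", folds = 3, nrepeat = 2)

  ## compare overall error rates at comp 2; the perf.assess() slot names
  ## are assumed, not taken from the PR
  expect_equal(unname(res_assess$error.rate$overall),
               unname(res_perf$error.rate$overall["comp2", , drop = FALSE]))
})
```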

Plotting: no plots are made for perf.assess() if validation = 'loo', or if validation = 'Mfold' and nrep = 1. Otherwise a plot is made with just a single point on the x axis at ncomp.

[screenshot 2024-11-13: perf.assess() plot with a single point at ncomp on the x axis]

We need to decide whether this is an informative plot and, if so, whether to make it work without repeats, or whether this doesn't matter and we simply note that such plots can only be generated when repeated CV is done.

evaham1 commented 1 week ago

perf.assess.mixo.pls() and perf.assess.mixo.spls()

Have created perf.assess.mixo.pls() and an identical exported perf.assess.mixo.spls(). These functions are essentially stripped-down versions of perf.mixo.pls()/perf.mixo.spls(). Ideally, to improve runtime, I would have removed the looping over component values 1:ncomp; however, in testing this gave different results for the final error metrics. This is particularly the case for Q2, which is calculated using the RSS from component ncomp - 1, but testing suggested it also affects other error metrics, although the source of the component dependency is not clear (it could also be due to seed setting within the loop, something I played around with but couldn't fix in a non-loop manner). As the function is quite intricate, I decided to leave the loops as they are, despite the inflated runtime, and simply subset the results to keep only the error metrics relating to the exact ncomp.
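
A sketch of the "run the full loop, then subset" idea; the measures/Q2.total slot names follow recent perf() output for pls objects but should be treated as illustrative:

```r
library(mixOmics)
data(liver.toxicity)

pls_model <- pls(liver.toxicity$gene, liver.toxicity$clinic,
                 ncomp = 3, mode = "regression")
res <- perf(pls_model, validation = "Mfold", folds = 5, nrepeat = 5)

## Q2 at component h is computed from the RSS of component h - 1, so the
## internal loop still has to visit components 1:ncomp; only afterwards
## are the results cut down to the final component
ncomp <- pls_model$ncomp
q2_summary <- res$measures$Q2.total$summary  # illustrative slot names
q2_final <- q2_summary[q2_summary$comp == ncomp, ]
```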

Unit tests: created to ensure that the new perf.assess() functions give exactly the same results as perf() for the same data, context and ncomp. Added additional testing for the different modes for pls and spls, and also checked the feature-stability outputs for perf.spls() and perf.assess.spls().

Plotting: plotting of perf.assess() is similar to perf() for PLS objects, even when nrep = 1.

[screenshots 2024-11-15: perf.assess() plots for nrep = 10, nrep = 1, and validation = loo]

evaham1 commented 1 week ago

perf.assess.sgccda()

Have created perf.assess.sgccda(), built on perf.sgccda(). Ideally, to improve runtime, I would have removed the calculations over multiple components (done in this function via many lapply() calls), but due to the complexity and possible inter-dependency between components, I kept all of the existing code and just added lines at the end that filter the results to retain only the information for the component used in the input model, while preserving the structure of the result output.
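
A toy illustration of the end-of-function filtering, with a hypothetical filter_to_comp() helper and made-up data standing in for the real result structure:

```r
## results are computed for all components (here a toy nested list, one
## matrix per block with one column per component), then cut down to the
## model's component only, keeping the same list/matrix structure
filter_to_comp <- function(res, comp) {
  lapply(res, function(block) block[, comp, drop = FALSE])
}

toy_res <- list(
  block1 = matrix(runif(6), nrow = 3,
                  dimnames = list(NULL, paste0("comp", 1:2))),
  block2 = matrix(runif(6), nrow = 3,
                  dimnames = list(NULL, paste0("comp", 1:2)))
)
filter_to_comp(toy_res, comp = 2)  # same shape, only the comp2 column kept
```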

Unit tests: created to ensure that the new perf.assess.sgccda() gives exactly the same results as perf() for the same data, context and ncomp.

Plotting: plotting for block-plsda and block-splsda objects run with perf.assess() only works if nrep > 1 and validation = 'Mfold'.

[screenshot 2024-11-18: perf.assess() plot for a block (s)plsda model with nrep > 1 and Mfold validation]

evaham1 commented 1 week ago

perf.assess.mint.plsda() and perf.assess.mint.splsda()

Have created perf.assess.mint.(s)plsda(), built on perf.mint.plsda(). Fixed the for loops so that metrics are only calculated for the single component corresponding to ncomp, although for the AUC the extra data slots are still generated.
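
Schematically, the loop change amounts to the following (dummy_metric() is a stand-in, not the PR's code):

```r
## perf() visits every component; perf.assess() visits only the final one
ncomp <- 2
dummy_metric <- function(comp) comp * 0.1  # stand-in for a real error metric

## perf()-style: one result per component 1:ncomp
sapply(seq_len(ncomp), dummy_metric)

## perf.assess()-style: a single result at ncomp
dummy_metric(ncomp)
```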

Unit tests: created to ensure that the new perf.assess.mint.plsda() gives exactly the same results as perf() for the same data, context and ncomp.

Plotting: plotting does not work for perf.assess.mint.plsda(), likely due to the lack of repeats in the LOO CV method; this is consistent with which configurations allow plotting in the other perf.assess() functions.

evaham1 commented 1 week ago

After discussing with KA, it makes sense to remove all plotting functionality for these objects, as the plots are not informative when they are not comparing anything. Instead, we simply keep the performance metrics for the model in question, which can be used as a simple readout of the performance of the final model.
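
With the plot methods removed, usage would reduce to a plain readout along these lines (the error.rate slot is an assumption, taken to mirror perf()'s output):

```r
library(mixOmics)
data(srbct)

final_model <- plsda(srbct$gene, srbct$class, ncomp = 2)
final_perf  <- perf.assess(final_model, validation = "Mfold",
                           folds = 5, nrepeat = 10)
final_perf$error.rate$overall  # error rates at ncomp only
```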

Also did a couple more checks to make sure the performance-assessment metrics output by perf.assess() are identical to those output by perf().