Closed vspinu closed 2 years ago
@vspinu thanks for the contribution. give me some time to review the changes
@vspinu thanks again for the additions and the corrections (regarding the classes). I'll be glad if you add also tests and update the vignette. It will be also nice to have the print, plot and summary methods. I'd rather prefer to keep the old predict functions than adding deprecation messages. I think keeping the old functions is also necessary for the users that already use these functions in their workflows or packages. Once this PR is completed please mention your necessary personal information (vorname, lastname, e-mail) so that I can add you as a contributor to the package DESCRIPTION file. thanks again.
I'll be glad if you add also tests and update the vignette. It will be also nice to have the print, plot and summary methods.
Ok. I will spend some time on this.
I'd rather prefer to keep the old predict functions than adding deprecation messages.
Of course. I didn't mean to deprecate those.
This is Robo-lampros because the Human-lampros is lazy. This PR has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs.
I've added a message so that the PR is not closed from the github-actions-bot till the changes are applied (tests, etc.)
Sorry for stalling this. Things started piling on my weekends suddenly, will try to resume this week.
This is Robo-lampros because the Human-lampros is lazy. This PR has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs.
I have re-added the comment, added tests and adjusted the vignette to use predict()
method. Also added print()
methods. The objects are now printed succinctly as follows:
> gmm
GMM Cluster
Call: GMM(data = dat, gaussian_comps = 3)
Data cols: 42
Centroids: 3
> km
KMeans Cluster
Call: KMeans_rcpp(data = dat, clusters = clust, num_init = 5, max_iters = 100, initializer = "optimal_init")
Data cols: 42
Centroids: 2
BSS/SS: 0.2800772
SS: 1312201 = 944683.6 (WSS) + 367517.6 (BSS)
> cm
Medoids Cluster
Call: Clara_Medoids(data = dat, clusters = 2, samples = 5, sample_size = 0.2, swap_phase = TRUE, fuzzy = T)
Data cols: 42
Medoids: 2
Cluster stats:
2 3
index 278 34
number_obs 226 174
max_dissimilarity 91.879999 114.495518
average_dissimilarity 36.482054 67.212241
isolation 1.396992 1.740851
You can modify those as you see fit.
One problem is Kmeans_arma
which returns only centroids instead of the complete structure as returned by KMeans_rcpp
. So the returned object is not classed and I have added a FIXME comment there.
Regarding the UI, for consistency and simplicity, I would propose to actually have only 3 top level functions: KMeans
, Medoids
and GMM
instead of the current (KMeans_rcpp
, Kmeans_arma
, Cluster_Medoids
, Clara_Medoids
and GMM
). KMeans
would have a method argument "rcpp"
or "arma"
, Medoids()
would have method
argument "pam"
or "clara"
. WDYT?
First of all thanks for all your implementations. Let me see the changes in the next days.
Regarding the UI, I think it's a good idea but we have to add deprecation messages for the previous functions so that the users can get accustomed to the new functions.
This is Robo-lampros because the Human-lampros is lazy. This PR has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs.
I've added a message so that this PR is not closed.
@vspinu,
can you also update the NEWS.md file by adding your changes under version 1.2.6?
Done! I have also bumped the dev version.
This is Robo-lampros because the Human-lampros is lazy. This PR has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs.
thank you for all the changes, I'll need a few days to do the following:
@vspinu, specific tests fail. are the failures related to your changes? It seems that the functions of the failed tests require a matrix.
Yes. It has to do with the fact that interally C++ is using positional references instead of names. So each time one modifies the fields of the result internal code would break.
I have fixed that by addressing with names. I had to rename a few internal slots for that. This also makes the internal names consistent with the user level output. I don't think you will object that, but please have a look at the last commit.
thank you @vspinu I'll have a look today in the afternoon
thank you very much @vspinu for all the changes. Give me a few days to update the version on CRAN. I'll add also your details in the DESCRIPTION file, is it ok by you?
person("Vitalie", "Spinu", role = "ctb")
Let me know. thanks.
I'll add also your details in the DESCRIPTION file, is it ok by you?
Sure. Thanks! I still think my contribution is a bit too slim for a contributor tag, but ok. I will try to get some more stuff in as I go with using the package.
@vspinu,
I intended today to submit the new version of the ClusterR package to CRAN and I ran an R CMD build
after I cloned the Github repository. However, on Linux I receive the following error:
> system("R CMD build ClusterR")
* checking for file ‘ClusterR/DESCRIPTION’ ... OK
* preparing ‘ClusterR’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building ‘the_clusterR_package.Rmd’ using rmarkdown
Quitting from lines 189-198 (the_clusterR_package.Rmd)
Error: processing vignette 'the_clusterR_package.Rmd' failed with diagnostics:
no applicable method for 'predict' applied to an object of class "c('KMeansCluster', 'k-means clustering')"
--- failed re-building ‘the_clusterR_package.Rmd’
SUMMARY: processing the following file failed:
‘the_clusterR_package.Rmd’
Error: Vignette re-building failed.
Execution halted
Can you reproduce this error locally with the cloned repository? It seems that it's related to this pull request.
I explained in another PR on how I build and check an R package that includes Rcpp code. Therefore, if you want to reproduce the error can you follow these steps?
I am pretty sure everything is ok and it's some strange local issue. Those methods are registered in the NAMESPACE file. Make sure you don't introduce any diffs to NAMESPACE with your process.
R CMD build ClusterR
and checks do work ok for me. It also works on travis, right?
I have also triggered a winbuild (don't be surprised to see the email with a result).
thanks @vspinu, the winbuild looks ok (see the attached). the github action of this PR worked too. I'm wondering why I get this error. I'll have a look again and notify you.
you were right about the differences in the NAMESPACE file but I do not understand why do I get these diffs? I'm not experienced with S3 Methods. What I do is I make a clean build, check and install after I remove the .Rd files of the 'man' folder and create the documentation using devtools::document()
. Is there something wrong with this procedure?
For instance before the clean build, check I have
S3method(predict,GMMCluster)
S3method(predict,KMeansCluster)
S3method(predict,MedoidsCluster)
S3method(print,GMMCluster)
S3method(print,KMeansCluster)
S3method(print,MedoidsCluster)
in the NAMESPACE and after I use devtools::document()
I have
S3method(print,GMMCluster)
S3method(print,KMeansCluster)
S3method(print,MedoidsCluster)
do we have to use Collate in the DESCRIPTION file now that the ClusterR package makes use of S3 methods?
I also just use devtools::document
. Can it be an old version of devtools or roxygen2?
I use the latest versions of 'devtools' (2.4.2) and 'roxygen2' (7.1.2) on Linux with R version 4.1.1 (2021-08-10)
Which Operating System do you use? Is it possible that you can run the `devtools::document()' function on a Linux machine? It seems to me that it has to do with how we document the S3 methods (a similar SO issue)
When I run devtools::document()
I receive,
S3method(print,GMMCluster)
S3method(print,KMeansCluster)
S3method(print,MedoidsCluster)
export(print.GMMCluster)
export(predict.GMMCluster)
export(predict.KMeansCluster)
export(predict.MedoidsCluster)
whereas currently in this repo we have,
S3method(predict,GMMCluster)
S3method(predict,KMeansCluster)
S3method(predict,MedoidsCluster)
S3method(print,GMMCluster)
S3method(print,KMeansCluster)
S3method(print,MedoidsCluster)
The Run Rscript -e 'tic::script()'
that the Github Action of this repo uses just only 'builds' and 'checks' the package and does not create the documentation or updates the NAMESPACE.
I know that 'devtools::document' is not a requirement but if I'm not able to re-create the .Rd files then I might have issues in the future in case that I want to update any function of the package
When I run devtools::document() I receive,
But that's exactly what is needed. It fixes your original issue, right?
The export(print.GMMCluster)
is optional and is probably there because you have added #' @export print.GMMCluster
. I don't think it's necessary to export those functions, but it's surely up to you.
I am on linux.
But that's exactly what is needed. It fixes your original issue, right?
I'm still receiving the error,
> system("R CMD build ClusterR")
* checking for file ‘ClusterR/DESCRIPTION’ ... OK
* preparing ‘ClusterR’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building ‘the_clusterR_package.Rmd’ using rmarkdown
Quitting from lines 189-198 (the_clusterR_package.Rmd)
Error: processing vignette 'the_clusterR_package.Rmd' failed with diagnostics:
no applicable method for 'predict' applied to an object of class "c('KMeansCluster', 'k-means clustering')"
--- failed re-building ‘the_clusterR_package.Rmd’
SUMMARY: processing the following file failed:
‘the_clusterR_package.Rmd’
Error: Vignette re-building failed.
Execution halted
and my NAMESPACE file includes the following,
# Generated by roxygen2: do not edit by hand
S3method(print,GMMCluster)
S3method(print,KMeansCluster)
S3method(print,MedoidsCluster)
export(AP_affinity_propagation)
export(AP_preferenceRange)
export(Clara_Medoids)
export(Cluster_Medoids)
export(GMM)
export(KMeans_arma)
export(KMeans_rcpp)
export(MiniBatchKmeans)
export(Optimal_Clusters_GMM)
export(Optimal_Clusters_KMeans)
export(Optimal_Clusters_Medoids)
export(Silhouette_Dissimilarity_Plot)
export(center_scale)
export(distance_matrix)
export(external_validation)
export(plot_2d)
export(predict.GMMCluster)
export(predict.KMeansCluster)
export(predict.MedoidsCluster)
export(predict_GMM)
export(predict_KMeans)
export(predict_MBatchKMeans)
export(predict_Medoids)
import(gtools)
importFrom(Rcpp,evalCpp)
importFrom(ggplot2,aes)
importFrom(ggplot2,element_blank)
importFrom(ggplot2,geom_point)
importFrom(ggplot2,ggplot)
importFrom(ggplot2,scale_shape_manual)
importFrom(ggplot2,scale_size_manual)
importFrom(ggplot2,theme)
importFrom(gmp,as.bigz)
importFrom(gmp,asNumeric)
importFrom(gmp,chooseZ)
importFrom(grDevices,dev.cur)
importFrom(grDevices,dev.off)
importFrom(graphics,abline)
importFrom(graphics,axis)
importFrom(graphics,barplot)
importFrom(graphics,legend)
importFrom(graphics,lines)
importFrom(graphics,mtext)
importFrom(graphics,par)
importFrom(graphics,plot)
importFrom(graphics,text)
importFrom(graphics,title)
importFrom(stats,na.omit)
importFrom(utils,globalVariables)
importFrom(utils,setTxtProgressBar)
importFrom(utils,txtProgressBar)
useDynLib(ClusterR, .registration = TRUE)
I don't know why and you can't reproduce it although on linux
That's a bit crazy. CMD build
is not touching the namespace. So what is changing your namespace then? Didn't you say that devtools::document
generates the NAMESPACE with predict
exports in this comment?
I have a hunch. Could you add predict
to the list of imported functions from stats here.
Do you have stats package in your search path when you run devtools::document? For me stats
is always loaded by default.
devtools::document generates the NAMESPACE with predict exports in this comment?
In this comment I modified you function to make the R CMD build
work without success. Don't take this comment into account.
I have a hunch. Could you add predict to the list of imported functions from stats here.
I tried this as well but I receive the same error.
For me stats is always loaded by default.
Althouth the stats
package is a System Library
package I always deal with its functions as if it were a User Library
package. This because I receive NOTEs when I perform R CMD check --as-cran
Therefore, if functions of the stats
package exist in my code then I always include it in the imports
of the DESCRIPTION file.
Although I had the chance to go through the S3 and S4 classes in the past, I preferred in general the R6 classes due to its simplicity and familiarity with Python classes. One of the reasons that I'm now not in position to recognize the source of the problem that arises from the predict
error during the R CMD build
of the vignettes.
I submitted the updated version to CRAN (1.2.6). This was actually a problem with the R version I used. After updating to 4.1.1 the issue was resolved.
Perfectos! Happy to hear it solved by itself :)
I love this package because it's fills the bill for all clustering methods in one place. But the interface and UI is rather non-Rish.
As a starter I am adding
predict
methods for cluster objects. For this Icluster medoids silhouette
) are really non-standard and are cumbersome to work with when defining generics.If these changes are ok with you I will add tests, update vignette etc. Also, time permitting, will add print, plot and summary methods to make the UI of the package more user friendly for the seasoned R users.
With this change one can do:
instead of the current