Closed heavywatal closed 3 months ago
Sorry, my bad. I was doing wrong on initialization of Rcpp::IntegerVector
. Now it accepts NA_integer_
as expected.
With this modification, it becomes possible to do this:
set.seed(1L)
KMeans_arma(iris[,-5], 3L, seed = NA_integer_)
which generates exactly the same result as this:
KMeans_arma(iris[,-5], 3L, seed = 1L)
It is also possible to do the same thing in Rcpp level.
@heavywatal
Thanks for the pull request and the additional functionality. A few requests from my side:
OK, for now, I have added the minimal set of tests to validate the modification.
seed=
parameter?Do you really think it is necessary to repeat this for every function with seed= parameter?
We have to make sure the previous version of the ClusterR and the updated version using your adjustments in the code do not give different results. Therefore, we have to test the functions which have as an argument the "seed"
Sorry I have never heard of ASAN or UBSAN
If you use compiled code in an R package then the code is tested for "Undefined Behaviour". Your adjustment which is the following line
if (Rcpp::all(Rcpp::is_na(Rcpp::IntegerVector(seed)))) return;
Will take an NA (missing value) as input and this might lead to undefined behaviour (or might not - I don't know that, because I don't know the internals of the Rcpp package).
I'll tell you from personal experience that I spent many hours in the past to fix errors from CRAN related to ASAN, UBSAN and avoid the exclusion of the package from CRAN.
We have to make sure the previous version of the ClusterR and the updated version using your adjustments in the code do not give different results.
I appreciate the purpose, but partially disagree on the conclusion:
Therefore, we have to test the functions which have as an argument the "seed"
because it should have been logically proven already by the combination of the existing tests and my additional tests. Is there any ClusterR function that uses seed
in a different way from KMeans_arma()
? Now that KMeans_arma()
passes the tests, is it possible for KMeans_rcpp()
, for example, to fail in terms of set_seed()
modification? It seems to me that repetitive tests will just increase the code complexity uneconomically without much benefit.
But, OK, you are the author of this package, and I will add some tests for the other functions in the near future.
I'll tell you from personal experience that I spent many hours in the past to fix errors from CRAN related to ASAN, UBSAN and avoid the exclusion of the package from CRAN.
Thank you for the effort. You saved my day(s). Is it possible that we skip our local ASAN/UBSAN tests and give it a shot to let CRAN do it (hopefully automated)?
Thank you for the effort. You saved my day(s)
By "spent time" I meant to debug previous error cases related to ASAN, UBSAN and not related to your current additions in the .cpp files. This is something you have to test/check using the existing packages such as rhub or any packages that you think it can serve the purpose.
Is it possible that we skip our local ASAN/UBSAN tests and give it a shot to let CRAN do it (hopefully automated)?
I'm sorry I can not do that. If you are not currently in place to perform the test/check then feel free to close the PR and re-open or create a new one once you are ready. thanks
I'll close the PR for now. Feel free to re-open or open a new one.
This should address #52 but currently it does not work because
NA_integer_
is implemented as the maximum negative value, and a negative seed causes an error "negative length vectors are not allowed".base::set.seed()
accepts negative seeds without any problem,which indicates ClusterR is doing something wrong with negative seeds.