summerghw closed this issue 8 months ago.
The Annoy recall is definitely not the same. Did you set the seed so that the results are reproducible?
Yes, I set the seed:
my.seed <- 202106L
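A point worth checking here: the line my.seed <- 202106L only stores a number in a variable. R's random number generator is seeded only when that value is passed to set.seed(). The minimal base-R sketch below shows that re-seeding with the same value reproduces the same draws. The uwot call is left as a comment because it needs the package and the real data; X is a placeholder. Whether the full uwot run is reproducible may also depend on uwot's threading settings, so treat this as a first check, not a guarantee.

```r
my.seed <- 202106L

# Seed R's RNG, draw some numbers, then re-seed with the same value and draw again.
set.seed(my.seed)
first_draw <- runif(5)
set.seed(my.seed)
second_draw <- runif(5)

# Same seed, same sequence: the two draws are identical.
print(identical(first_draw, second_draw))

# Hypothetical usage with uwot (X is a placeholder for the actual input data):
# set.seed(my.seed)
# embedding <- uwot::umap(X, verbose = TRUE)
```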
No internet connection is required. No network communication of any kind should be happening.
I assume the various GitHub Actions for testing R packages make use of containers, so there shouldn't be a problem with using uwot with Docker. All I can think of with the information provided is the temporary storage uwot uses for the nearest neighbor index (as the log line

Writing NN index file to temp file /tmp/

indicates). Are you sure that both hosts are set up to provide this storage in the same way (e.g. the same permissions and the same amount of free space)? There could be failures here that I don't detect, in which case uwot won't give an appropriate error message. In my own experience with getting containers to read and write data to host storage (albeit unrelated to uwot or R), I had to be quite careful with user permissions and with matching user and group IDs between host and container. But that was a few years ago.

As an aside, the recall value you get (0.2) in the first case where things seem to be working might be a bit worrying: it means that 80% of the observations in your dataset fail to find themselves as their own nearest neighbor. Either the nearest neighbor search is failing (possibly due to too few trees or too low a search_k value) or you have a lot of duplicates. If you aren't expecting duplicates in your data, it's worth investigating that before proceeding.
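Both suggestions can be checked from inside the container on each host. The base-R sketch below is only a diagnostic aid: the helper names check_tempdir(), self_recall() and count_duplicate_rows() are hypothetical, and it assumes recall means the fraction of points whose first neighbor is the point itself, as described in the reply. To get a neighbor index matrix out of uwot, umap() can return its nearest neighbors with ret_nn = TRUE; check ?uwot::umap for the exact layout of the returned list.

```r
# Hypothetical helper: confirm R's session temp directory exists and survives a
# small write/read round trip (the NN index file in the logs was written under
# /tmp/Rtmp..., which matches the naming of R's tempdir()).
check_tempdir <- function(dir = tempdir()) {
  probe <- tempfile(pattern = "probe_", tmpdir = dir)
  ok <- suppressWarnings(file.create(probe))  # FALSE if the file cannot be created
  if (ok) {
    writeLines("probe", probe)
    ok <- identical(readLines(probe), "probe")
    unlink(probe)
  }
  # On Linux, free space can be inspected with: system2("df", c("-h", dir))
  list(dir = dir, exists = dir.exists(dir), writable = isTRUE(ok))
}

# Hypothetical helper: fraction of rows whose first neighbor is the row itself.
# nn_idx is an n x k matrix of 1-based neighbor indices (row i = neighbors of point i).
self_recall <- function(nn_idx) {
  mean(nn_idx[, 1] == seq_len(nrow(nn_idx)))
}

# Hypothetical helper: number of rows that exactly duplicate an earlier row.
# Exact duplicates can take the first-neighbor slot and lower self_recall().
count_duplicate_rows <- function(X) {
  sum(duplicated(X))
}

print(check_tempdir())
# Toy index: the third point's first neighbor is point 1, not itself.
print(self_recall(matrix(c(1, 2, 1, 2, 1, 3), ncol = 2)))
print(count_duplicate_rows(rbind(c(0, 0), c(1, 1), c(0, 0))))
```

If the temp directory is writable on both hosts and there are no duplicates, the remaining knobs mentioned above are the n_trees and search_k arguments of umap().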
@jlmelville Thank you for the reply.
Hey, I wonder whether the R package uwot needs an internet connection to function. I ran the same data in the same Docker container on two computers; on the one that was not connected to the internet, an error occurred:
08:54:07 Writing NN index file to temp file /tmp/RtmpO90Kgu/file556f6f9565
08:54:07 Searching Annoy index using 1 thread, search_k = 3000
08:54:11 Annoy recall = 0.2088%
08:54:12 Commencing smooth kNN distance calibration using 1 thread
08:54:12 14365 smooth knn distance failures
Error in x2set(Xsub, n_neighbors, metric, nn_method = nn_sub, n_trees, : Non-finite entries in the input matrix
The program runs without problems on the computer that is connected to the internet; the log is below:

08:53:01 Writing NN index file to temp file /tmp/Rtmpk0B9KK/file139c1144b046
08:53:01 Searching Annoy index using 1 thread, search_k = 3000
08:53:06 Annoy recall = 100%
08:53:07 Commencing smooth kNN distance calibration using 1 thread
08:53:09 Initializing from normalized Laplacian + noise
08:53:10 Commencing optimization for 200 epochs, with 583328 positive edges
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
08:53:18 Optimization finished