Closed KasperSkytte closed 5 years ago
Thank you for using RTK. Glad you find it useful.
Yeah, this would require some changes within the C++ code. I think this could be a minor change, but I am not sure how much the performance would increase.
I quickly tested it on a toy data set and saw no immediate speed improvement. I only removed the computation of the diversity from the rarefaction function. Also skipping the memory allocation for the diversity and removing the output functions might speed it up slightly, but I think the overhead is quite small.
Hi again!
Thank you for your prompt response and toy test. But would you be willing to test on some larger data? I am primarily working with some quite huge data, on the order of 100,000s of taxa and thousands of samples, so any speed increase anywhere is welcome. If it's not too much work for you, I'd appreciate a minimal rarefy-only function in C++, and perhaps also implement it in https://github.com/madsalbertsen/ampvis2 (mainly for making rarefaction curves); vegan's rrarefy function is just too slow. Else I'll gladly explore your C++ code more to isolate rarefaction only, and of course cite RTK.
Hey Kasper, this would barely change performance. Diversity calculations are done after the rarefactions are calculated, and they are roughly O(N), that is, a single pass over the vector. Rarefaction is what takes most of the time. Removing the diversity calculation from the C++ code is possible, but I expect it to cause a headache with the R integration etc., and given the minor impact on performance it was a very deliberate decision not to offer an option for whether it is output or not. best, Falk
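To illustrate the cost argument, here is a minimal sketch of the two steps for a single sample. It is not RTK's actual code: the names `rarefy_sample` and `richness` are made up for this example, the subsampling is a naive "expand the reads, shuffle, take the first `depth`" approach, and returning a sample unchanged when it has fewer reads than `depth` is an assumption made here. The point is only that the subsampling does the heavy work, while a measure such as richness is one pass over the resulting vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper, not part of RTK: rarefy one sample to `depth` reads
// by drawing reads without replacement.
std::vector<unsigned> rarefy_sample(const std::vector<unsigned>& counts,
                                    std::size_t depth,
                                    std::mt19937& rng) {
    // Expand the count vector into one entry per read, labelled by taxon index.
    // Memory and time scale with the total number of reads in the sample.
    std::vector<std::size_t> reads;
    for (std::size_t taxon = 0; taxon < counts.size(); ++taxon) {
        reads.insert(reads.end(), counts[taxon], taxon);
    }

    // Assumption for this sketch: samples that are too shallow stay unchanged.
    if (depth >= reads.size()) {
        return counts;
    }

    // Shuffle all reads and keep the first `depth` of them.
    std::shuffle(reads.begin(), reads.end(), rng);
    std::vector<unsigned> rarefied(counts.size(), 0);
    for (std::size_t i = 0; i < depth; ++i) {
        ++rarefied[reads[i]];
    }
    return rarefied;
}

// Hypothetical helper: richness is a single O(N) pass over the taxa,
// counting those with at least one read.
std::size_t richness(const std::vector<unsigned>& counts) {
    return static_cast<std::size_t>(
        std::count_if(counts.begin(), counts.end(),
                      [](unsigned c) { return c > 0; }));
}
```

Calling `rarefy_sample(counts, depth, rng)` with a seeded `std::mt19937` returns a vector of the same length whose entries sum to `depth`. Skipping `richness` afterwards saves only one scan over the taxa, which is small next to the shuffle over all reads.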
Hi Falk and Paul!
Thank you for your answers. I will stick to RTK as it is now then.
Hi!
Great implementation, it's very fast. But I only need to rarefy, not calculate diversity and richness measures. Is it possible to ONLY rarefy and return just the rarefied count matrix in R? I tried to fiddle a bit with the R code, but it seems I have to go into the C++ code to be able to perform rarefaction only, and I foresee I would spend a lot of time on that as I'm not a big expert in C++.
Thanks in advance