vegandevs / vegan

R package for community ecologists: popular ordination methods, ecological null models & diversity analysis
https://vegandevs.github.io/vegan/
GNU General Public License v2.0
444 stars 96 forks source link

Prepare vegan for the CRAN release 2.2-0 #47

Closed jarioksa closed 9 years ago

jarioksa commented 9 years ago

Here we list various minor issues that must be fixed before the release. The major problems have their own issue numbers.

gavinsimpson commented 9 years ago

Regarding the first point, and perhaps this is what you mean, I envisaged creating a 2.2 branch and tagging the code base with the 2.2 tag to indicate the release. Any improvements we need to make to 2.2 go on the 2.2 branch. All other development goes on in separate branches that are merged with master when we feel they are ready. If this is what you had in mind, consider me +1 on this point.

jarioksa commented 9 years ago

That was my vision, but I'm open to other views.

psolymos commented 9 years ago

Thanks Jari for the update.

I went back and checked what scoverage does. I think that was written when I implemented some of Anne Chao's work and we thought this might be used for other diversity metrics (it corrects the bias for unseen species based on singletons). We than decided not to use it in diversity calculations, so the function can safely be dropped.

Cheers,

Peter

Péter Sólymos 780-492-8534 | solymos@ualberta.ca | http://psolymos.github.com Alberta Biodiversity Monitoring Institute http://www.abmi.ca Boreal Avian Modelling Project http://www.borealbirds.ca

On Wed, Oct 1, 2014 at 1:37 AM, Jari Oksanen notifications@github.com wrote:

Here we list various minor issues that must be fixed before the release. The major problems have their own issue numbers.

  • Decide how we do it: in R-Forge we had a separate branches/2.0 branch for releases, and we merged from the main (development) branch into it. The argument was that we could be freer with development and maintain a stable release. The same argument would apply here as well, and branching is easier and cheaper in git. The idea was also that minor releases (2.0-1, 2.0-2 etc) could be made by gradual merging, but major releases would directly fork from the main (development) branch. We also edited the release branch to differ from the main branch: for instance, tests/ were removed.
    • Write NEWS
  • Decide what to do with scores.hclust (remove?)
  • scoverage needs better documentation (also on justification), or then it can be dropped. We had some discussion about this with @psolymos https://github.com/psolymos years ago, but I have to refresh my memory (also on the method).
  • permutest.betadisper(..., pairwise=TRUE) bug must be fixed. This is discussed in commit a4793c4 https://github.com/vegandevs/vegan/commit/a4793c444681b44b48e3297f835f45e78d7b255c. Fixed with 47f3482 https://github.com/vegandevs/vegan/commit/47f348283bff6c9d8939b2780b62fcde15e1ed84 .
  • remove u.eig and other .eig items from cca result object and test that other functions still work after this. This seems to be doable within *vegan, but there are 60 packages depending on, importing or suggesting vegan in CRAN (as of Sep 30, 2014). The only case where vegan functions still use *.eig are imaginary axes in capscale (i.e,, those associated with negative eigenvalues). These could be left as such.
  • update man/permutations.Rd to the new permute infrastructure (describes the old way).
  • some anova.cca* cases fail with missing data. Some of these could be fixed by listwise deletion of observations: do so or clearly decide to delay till later (minor) release.
  • ordiconsensus of @guiblanchet https://github.com/guiblanchet. This is a separate body of code and could be cleanly merged. What worries me is that there are more than 1000 lines of code and 14 new files. I would like to have a better grasp of the code before going to a merge. Moreover, it should be checked if this body of code should be veganified. Some functions look pretty long and seem to have pieces that could be independently usable, and then the vegan style is to allow user access. Further, it seems to introduce new community null models, but these should be made to fit in the vegan nullmodel framework. We do not have only nullmodel here, but also simulate.rda etc use that framework.

— Reply to this email directly or view it on GitHub https://github.com/vegandevs/vegan/issues/47.

jarioksa commented 9 years ago

I remember so much that I think the idea was interesting. However, we should either elaborate the issue or have a proper documentation hinting to the users what to do with this… It seems that this was implemented in 40a4924d together with clamtest upgrade, and man/clamtest.Rd says under argument coverage.limit that "Sample coverage is calculated separately". However, scoverage function is used nowhere in vegan (in particular, not in clamtest()).

guiblanchet commented 9 years ago

There are quite a bit of code in there and I agree that some might not be necessarily of interest to vegan users.

In all of the code I proposed, I think the most important one are the one that directly relate to consensus RDA. As such, RV which calculates the RV coefficient, coeffCompare which is used to compare association (dissimilarity) coefficients when used with in the RDA framework and consensusRDA which is the function that makes the consensus of RDAs. These are the most important functions with regard to consensus RDA.

RV is a small and quite simple function that I do not think needs to be veganified.

coeffCompare uses RV and spantree and I think it uses a "classique" vegan approach to get the right pieces out.

consensusRDA takes as input a list which contains results of either rda or capscale to make the consensus of RDAs. It also requires a matrix of explanatory variables for the consensus and sometimes of the matrix of response variables especially to make reprojections when using capscale. The way the matrices of response and explanatory variables have been named is not very typical of the vegan syntaxe and can be changed so that the user can more easily link the code with rda, c/a and cap scale.

As for the other functions, SADbin is a flexible function that constructs species-abundance distribution in such a way that many types of binning can be applied. If you think this is not interesting for a vegan user, it can be discarded. Similarly, simSADcomm is the function I used to simulate data in the consensus RDA paper. This one can also be discarded if you do not want to have any simulation functions in vegan.

I also included the beetle and the beetle.expl data which I used in the paper to illustrate consensus RDA. Again, this dataset can be discarded. All that would need to be done is to use one of the other dataset already in vegan to run the example in the help file.

I hope this clarify things a little.

jarioksa commented 9 years ago

Hi @guiblanchet , I have just had a first look at the ordiconsensus code base, but my view of the core functions is the same as you explained above. However, I haven't had time to have a look at them very closely. I noticed some tiny things that could be easily veganized (for instance, calculation of chord distances in examples where you can use decostand, etc). We need to have a look at your core functions and to see how to incorporate them.

A few words about the non-core functions: vegan more or less has SADbin -- it is called as.preston and implements two of your three methods of binning data into "octaves". Compare the following:

> library(ordiconsensus)
> data(BCI)
> SADbin(colSums(BCI), method="log")$bin
 [1] 19 13 14 18 30 34 31 26 18 13  7  2
> as.preston(colSums(BCI), tiesplit=FALSE)
 0  1  2  3  4  5  6  7  8  9 10 11 
19 13 14 18 30 34 31 26 18 13  7  2 
> SADbin(colSums(BCI), method="modhalflog")$bin
 [1]  9.5 16.0 18.0 19.0 30.0 35.0 31.0 26.5 18.0 13.0  7.0  2.0
> as.preston(colSums(BCI), tiesplit=TRUE)
   0    1    2    3    4    5    6    7    8    9   10   11 
 9.5 16.0 18.0 19.0 30.0 35.0 31.0 26.5 18.0 13.0  7.0  2.0 

The as.preston function only returns named non-zero octaves, and it does not implement SADbin(..., method = "modlog"), but otherwise the results are similar. If we want to have "modlog", it should be added to as.preston. SADbin also returns a matrix of species allocation to bins that is missing in as.preston. Such infromation could be added, but rather as a named vector of membership classes for species.

As to simulation: yes, we do not have community simulation in vegan. We do have nullmodel simulation, but it is always based on observed data matrix (see commsim and nullmodel) or results on observed data matrix (see simulate.rda, simulate.cca, simulate.capscale) -- all these in the unreleased to-be-2.2-0 version -- not yet in 2.0-10. I haven't had a closer look at simSADcomm so that I do not know how it is positioned here. We do have quantitative commsim models, thanks to @psolymos , that maintain species counts and marginal totals, but these are all based on observed communities and do not generate communities ex nihilo. That is, they still have species names and you can compare your observed species to those species, and the same for sites. If you have ex nihilo simulation models, they are a possible fit to coenocliner of @gavinsimpson although I think he's dealing with a different class of simulation models. I would not like open up the door in vegan for such ex nihilo models: a huge set of models could be considered, and they should be in a dedicated package.

guiblanchet commented 9 years ago

Hi @jarioksa !

Great ! The changes you propose to veganify the code for the core functions seems quite straight forward and something I should be able to do quite easily.

As for as.preston, I was not aware of this function... This would have saved me some trouble a few years back !! I think it might be interesting to include the "modlog" of SADbin because it is the one suggested by Gray et al. (2006). However, deciding on how you want to include this in as.preston requires a little bit of thinking because the users would have to decide among three choices. This is why I chose to give name to the different choices in my SADbin function. Another difference between SADbin and as.preston is that in SADbin the user can define the log base, whereas in as.preston only a log base of 2 is used. Frankly, I do not think it is necessary to include this generality, however, if you think it is of interest, it should be pretty straight forward to make the modification in as.preston.

As for simSADcomm, I agree with you that it would fit much better in a package dedicated to data simulations and does really fit particularly well in vegan. Lets not have it in vegan.