Closed antagomir closed 3 months ago
@FelixErnst - wondering if we should have a more clear default policy here - how low barrier making PRs to external packages instead or expanding mia
. Here we clearly have two options.
coda.base
seems to a single function package only, but it is using a Cpp implementation. The one fixed criteria for runMDS
is that it must use the same inputs as dist
. If that is the case, I would try to go the route of suggesting the function be added and optimized in these packages.
Maybe implement it yourself and do a benchmark to make sure, where the right place for it is.
The general idea for keeping the dependency tree small is that, if you are only importing one function an entry in suggest and a soft dependency is OK. If it is more than one function or a class, you have to import anyway. In this case I would keep the package in suggest, because of the single function aspect and the rare chance someone is using that function on their first encounter with mia
Ok I opened an issue to coda.base and to vegan to discuss this.
Given, that adding these features in the above mentioned package may take considerable time (or perhaps not?), it might be useful to have, meanwhile, our own implementation calculateAitchison
(with "standard" and "robust" options).
This can be implemented easily without extra dependencies, since we already have internal implementations .calc_clr
and .calc_rclr
in mia
. Data (counts
or relabundances
assay) has to be run transformCounts
(or transformSamples
) with the (r)clr and then just through Euclidean distance which is readily available through the standard stats::dist
and vegan::vegdist
functions.
We already have calculateUniFrac
and calculateJSD
, this would operate with the same logic.
It seems, however, that this could be added in vegan soon, see vegan issue #433. So anyone working on this should check that before further action.
Any news on this?
I now asked again in the vegan issue. Progress depends on that, at least partially.
I have now made a PR to vegan.
vegan has now merged the PR including clr
and rcrl
transformations, and the corresponding Aitchison distances.
I am suggesting a PR that will replace mia functions for clr, rclr, and other transformations that are readily available through vegan (at least hellinger).
The Aitchison distances are available through vegan through vegdist and can be readily used in runMDS
, runMDS
and other such functions via the FUN
argument. Some examples could be added.
Any news on this? How do you want to proceed?
They are just about to merge the PR which is already done and passed all checks. We added some modifications to method names per their request. I am following up the merge process in vegan, and as soon as they make the new CRAN release, I will find time to finalize these issues.
@FelixErnst - could you check the discussion on licenses at my vegan PR on decostand/vegdist, where we utilized earlier code from mia. For that code we would need to switch from Artistic2 to GPL2. OK to you?
That is OK for me. Do I need to comment there?
Let's see. @TuomasBorman can you confirm here as well that license switch is ok. You have contributed some commits to this part.
That is ok
That is OK for me. Do I need to comment there?
Let's see. Hopefully enough like this.
Ok guys, can you also confirm the license switch at this vegan issue #460 so we get fwd.. @TuomasBorman @FelixErnst
The feature is now in the vegan github main branch but it is currently unknown how fast there will be a CRAN release. We must wait for a while. Can ask then.
CRAN vegan pkg has now these things, i.e.
decostand
vegdist
-> Update mia accordingly; switch to using the vegan versions of the transformations and distances where appropriate.
-> Also remember to update manpages and unit tests
-> Consequently, update OMA as well
Also check the thread above and consider adding calculateAitchison
. However we can discuss if it is a good design choice to have separate wrapper for every distance measure. It might be useful for the most common ones but another option could be just to have generic distance wrapper, calling vegan distances and tree-based distances (e.g. Unifrac).
Should be ok now.
I propose to add a function
calculateAitchison
.This would be similar to
calculateUnifrac
but it will not require tree and it will return Aitchison distances.These are defined as Euclidean distances between clr-transformed values, by default calculated from the assay="counts" or assay="relabundances" (which should lead to same result, consider including a unit test for that..).
The user should be able to choose between standard (default; with clr) and robust (optional; with rclr) transformation as part of the distance calculation.
The advantage is to standardize this calculation, it has become a relatively common choice and fluent access is needed for comparison purposes.
The default names given for the distances would be "Aitchison distance" and "Robust Aitchison distance"
As an alternative to implementing
mia::calculateAitchison
function, one can use method "aitchison" through coda.base::dist function because in most cases a user can just call a specific external distance function for runMDS and other ordinations:d <- coda.base::dist(X, method = 'aitchison')
. The limitation with this is that the "Robust Aitchison" option is not available in standard distance functions as far as I know, this would require PR to vegan::vegdist or coda.base::dist or some other distance function, to add both Aitchison and Robust Aitchison. In principle, that would be perhaps even better solution than implementing our own versions of some rather general distances.