thibautjombart / adegenet

adegenet: a R package for the multivariate analysis of genetic markers

Inputs for mapping utility #181

Closed jrdupuis closed 7 years ago

jrdupuis commented 7 years ago

@thibautjombart

Here’s an example output that would be read into our mapping tool: webapp_input.csv. It would be created from the attached .rds and localities.csv files.

If the column headers stay the same, then the mapping tool automatically displays PC1 vs PC2, colors by the assigned population, etc. Many of the column headers should be the same for PCA/MDS (except for assigned group, posterior probs, etc. that are more specific to DAPC).

The only other thing to keep in mind is that localities.csv might contain several columns besides key, lat, and lon (“Population” in this dataset is an example), so hopefully the R commands will accommodate the extra columns.

Let me know if you have any other questions!

Cheers, Julian

Rosenberg2005_toThibaut.zip

thibautjombart commented 7 years ago

Okay thanks, will give it a go today. Is it assumed that there is always a 'population' / grouping factor for the individuals?

thibautjombart commented 7 years ago

First pass at it in 6bca108821618e489a05638d5bf3838eb5710b80. See the example in ?export_to_webapp. The first argument is an analysis (so far: DAPC, spca, or a regular dudi, which includes pca, mds, ca, and others). The second is info, a data.frame which contains at least key, lat, and lon. Individuals don't need to be in the same order as in the analysis, but a warning will be issued if some individuals in the analysis are missing from info.
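A minimal base-R sketch of the matching behaviour described above: individuals are looked up by key rather than by position, a warning is raised when an analysed individual is absent from info, and extra columns are carried along. The helper name `match_info` and the toy data are hypothetical; this is not the code in the commit.

```r
# Hypothetical helper, NOT adegenet's implementation: attach location info
# to principal components by matching on 'key' instead of row order.
match_info <- function(scores, info) {
  required <- c("key", "lat", "lon")
  missing_cols <- setdiff(required, names(info))
  if (length(missing_cols) > 0) {
    stop("info is missing column(s): ", paste(missing_cols, collapse = ", "))
  }

  keys <- rownames(scores)
  idx <- match(keys, info$key)  # position of each analysed individual in info
  if (anyNA(idx)) {
    warning(sum(is.na(idx)), " individual(s) in the analysis are missing from info")
  }

  pcs <- as.data.frame(scores)
  names(pcs) <- paste0("PC", seq_len(ncol(pcs)))

  # Extra columns of info (e.g. Population) are carried along unchanged
  extra <- info[idx, setdiff(names(info), "key"), drop = FALSE]
  out <- data.frame(key = keys, pcs, extra, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

# Toy scores for three individuals on two axes
scores <- matrix(c(0.5, -1.2, 0.7, 0.1, 0.3, -0.4), ncol = 2,
                 dimnames = list(c("ind1", "ind2", "ind3"), NULL))

# info is deliberately in a different order and has an extra column
info <- data.frame(key = c("ind3", "ind1", "ind2"),
                   lat = c(45.1, 44.9, 45.3),
                   lon = c(6.2, 6.0, 6.4),
                   Population = c("B", "A", "A"))

out <- match_info(scores, info)
```

Matching with `match()` keeps the row order of the analysis, so the principal components never need reshuffling.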

jrdupuis commented 7 years ago

@thibautjombart This looks great! I just tested it out with DAPC, PCA, and sPCA, and everything looks good.

Do you think there are any other analysis-specific data objects that we could include from any other multivariate analyses? I'm thinking about things like `assigned_grp <- x$assign` and `support <- apply(x$posterior, 1, max)` that are DAPC-specific. Are there any other aspects of sPCA, etc. that could be easily integrated into the mapping utility?
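To make the DAPC-specific quantities concrete, here is a small sketch on a made-up posterior matrix. The matrix stands in for `x$posterior`, and the assigned group is taken here as the column with the highest posterior probability, which is an assumption used for illustration rather than a reading of `x$assign` from a real dapc object.

```r
# Toy posterior membership probabilities (stand-in for x$posterior):
# rows are individuals, columns are groups
posterior <- matrix(c(0.90, 0.10,
                      0.30, 0.70,
                      0.55, 0.45),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(c("ind1", "ind2", "ind3"), c("A", "B")))

# Most probable group for each individual (assumption: argmax of the posterior)
assigned_grp <- colnames(posterior)[apply(posterior, 1, which.max)]

# Support: the posterior probability of that most probable group
support <- apply(posterior, 1, max)
```

With these toy values, ind3 is assigned to group A with a support of only 0.55, the kind of weakly supported assignment that is worth highlighting on a map.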

Plotting the actual eigenvectors would be great, but I think we'll hold off on that for now. Maybe in v2.0 or something...

I think we all like the name mvmapper as well, so maybe the function could also include that name: export_to_mvmapper ?

thibautjombart commented 7 years ago

Good! :) I tried adding relevant & specific stuff whenever useful. For DAPC, all we need really besides the principal components are the groups used in the analysis, the assigned groups, and their statistical support. For sPCA, there are lag principal components, which basically allow better visualisation of positive autocorrelation (the lag operator computes, for each individual, the average score of the neighbouring individuals).
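As a rough illustration of the lag operator mentioned above, the sketch below averages the scores of each individual's neighbours using a row-standardised weight matrix. The four individuals, their scores, and the neighbourhood are invented for the example; in a real sPCA the weights come from the connection network used in the analysis.

```r
# Toy scores on one principal component for four individuals
pc1 <- c(ind1 = 1.0, ind2 = 3.0, ind3 = 5.0, ind4 = -2.0)

# Hypothetical binary neighbourhood: ind1-ind2, ind2-ind3, ind3-ind4
neighbours <- matrix(c(0, 1, 0, 0,
                       1, 0, 1, 0,
                       0, 1, 0, 1,
                       0, 0, 1, 0),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(names(pc1), names(pc1)))

# Row-standardise so each row sums to 1; the matrix product then gives,
# for each individual, the mean score of its neighbours
weights <- neighbours / rowSums(neighbours)
lag_pc1 <- drop(weights %*% pc1)
```

Here ind2 has neighbours ind1 and ind3, so its lagged score is (1 + 5) / 2 = 3.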

We cover quite a bit of ground as it is, with methods for dapc, spca, and dudi, which itself covers most methods in ade4. But adding new methods for new classes of object will be very straightforward.

+1 to renaming to export_to_mvmapper

thibautjombart commented 7 years ago

Question: what URL and references shall we add to the doc of export_to_mvmapper?

thibautjombart commented 7 years ago

Renaming done at 80293d9. The doc is also more complete now:

export_to_mvmapper          package:adegenet           R Documentation

Export analysis for mvmapper visualisation

Description:

     ‘mvmapper’ is an interactive tool for visualising outputs of a
     multivariate analysis on a map from a web browser. The function
     ‘export_to_mvmapper’ is a generic with methods for several
     standard classes of analyses in ‘adegenet’ and ‘ade4’. Information
     on individual locations, as well as any other relevant data, is
     passed through the second argument ‘info’.

Usage:

     export_to_mvmapper(x, ...)

     ## Default S3 method:
     export_to_mvmapper(x, ...)

     ## S3 method for class 'dapc'
     export_to_mvmapper(x, info, ...)

     ## S3 method for class 'dudi'
     export_to_mvmapper(x, info, ...)

     ## S3 method for class 'spca'
     export_to_mvmapper(x, info, ...)

Arguments:

       x: The analysis to be exported. Can be a ‘dapc’, ‘spca’, or a
          ‘dudi’ object.

     ...: Further arguments to pass to other methods.

    info: A ‘data.frame’ with additional information containing at
          least the following columns: ‘key’ (unique individual
          identifier), ‘lat’ (latitude), and ‘lon’ (longitude). Other
          columns will be exported as well, but are optional.

Value:

     A ‘data.frame’ which can serve as input to ‘mvmapper’, containing
     at least the following columns:

        • ‘key’: unique individual identifiers

        • ‘PC1’: first principal component; further principal
          components are optional, but if provided will be numbered and
          follow ‘PC1’.

        • ‘lat’: latitude for each individual

        • ‘lon’: longitude for each individual

     In addition, specific information is added for some analyses:

        • ‘spca’: ‘Lag_PC’ columns contain the lag-vectors of the
          principal components; the lag operator computes, for each
          individual, the average score of neighbouring individuals; it
          is useful for clarifying patches and clines.

        • ‘dapc’: ‘grp’ is the group used in the analysis;
          ‘assigned_grp’ is the group assignment based on the
          discriminant functions; ‘support’ is the statistical support
          (i.e. assignment probability) for ‘assigned_grp’.

Author(s):

     Thibaut Jombart <email: thibautjombart@gmail.com>

Examples:

     data(sim2pop)

     dapc1 <- dapc(sim2pop, n.pca = 10, n.da = 1)

     info <- data.frame(key = indNames(sim2pop),
                        lat = other(sim2pop)$xy[,2],
                        lon = other(sim2pop)$xy[,1],
                        Population = pop(sim2pop))

     out <- export_to_mvmapper(dapc1, info)
     head(out)

     data(rupica)

     spca1 <- spca(rupica, type = 5, d1 = 0, d2 = 2300,
                   plot = FALSE, scannf = FALSE,
                   nfposi = 2, nfnega = 0)

     info <- data.frame(key = indNames(rupica),
                        lat = rupica$other$xy[,2],
                        lon = rupica$other$xy[,1])

     out <- export_to_mvmapper(spca1, info)
     head(out)

jrdupuis commented 7 years ago

@thibautjombart This looks excellent. As for references/URL, the genomeannotation github is Scott's lab github, so that should be the final GitHub url: https://github.com/genomeannotation/mvMapper

For references, do you mean other material we should cite in the doc, or a reference for the paper describing the method? If the latter, maybe we can update that once the manuscript is accepted.

thibautjombart commented 7 years ago

I added the URL to the doc. We can add further refs (paper etc) later as things unfold. If it is all good, let me know and I'll close this issue. Might want to open another one to add a tutorial illustrating briefly how to export an analysis and use it in mvmapper.
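As a starting point for such a tutorial, the export step might look like the sketch below. The toy data.frame stands in for the output of `export_to_mvmapper()`, the file name follows the webapp_input.csv example from the top of this thread, and dropping row names is an assumption about what the mapping tool expects.

```r
# Stand-in for the data.frame returned by export_to_mvmapper()
out <- data.frame(key = c("ind1", "ind2"),
                  PC1 = c(0.5, -1.2),
                  lat = c(44.9, 45.3),
                  lon = c(6.0, 6.4))

# Write to a temporary location; use any path you like in practice
csv_path <- file.path(tempdir(), "webapp_input.csv")
write.csv(out, csv_path, row.names = FALSE)  # assumption: no row-name column wanted

# Read the file back to confirm the column headers survived the round trip
check <- read.csv(csv_path)
```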

jrdupuis commented 7 years ago

Sounds great. I think you can go ahead and close the issue at this point. We just got the VM up and running for the web server, so hopefully it'll be a quick turnaround to get the whole thing operational. At that point we can flesh out the tutorials for using the web version and the Docker container. I'll be in touch!

thibautjombart commented 7 years ago

Awesome. Speak soon!