googlegenomics / readthedocs

Documentation for the Google Genomics cookbook.
http://googlegenomics.readthedocs.org
Apache License 2.0

Many small updates. #113

Closed deflaux closed 8 years ago

deflaux commented 8 years ago

Also removed several obsolete files. The following redirects need to be configured before this change can be merged:

mbookman commented 8 years ago

LGTM

cmclean commented 8 years ago

Not a blocking comment, but I'm curious whether /includes/dataflow_on_gce_setup.rst could also be updated to suggest Java 8 rather than Java 7. The forthcoming LD pipeline uses Java 8 for writing to BigTable. Can the other pipelines be run on 8?

deflaux commented 8 years ago

Filed https://github.com/googlegenomics/readthedocs/issues/115 for Java 8.

Saving https://github.com/googlegenomics/readthedocs/issues/112 for another PR.

I set up the redirects but they do not appear to be working at the moment. (see also https://github.com/rtfd/readthedocs.org/issues/1826)

pgrosu commented 8 years ago

Nicole, maybe an additional PR is needed for the other points as well, so they don't get lost in the mists of time :)

~p

deflaux commented 8 years ago

@pgrosu Thanks for pointing out the lack of scaling! I filed https://github.com/googlegenomics/spark-examples/issues/82

pgrosu commented 8 years ago

Nicole, sure thing and sorry to be picky, but the above text says:

The matrix is centered, scaled, and then the first two principal components are computed for each individual.

This suggests that you are scaling before you find PC1 and PC2, which is confusing. Usually you determine the eigenvectors with the two highest eigenvalues in this particular case, and then you scale the data by multiplying with them, not beforehand. Maybe my eyes are getting tired, but I'm not sure I see where you're actually performing that in the Dataflow code. I notice that you find maxEigenvalue and secondEigenvalue, but I'm not sure where you rescale the original data with the corresponding eigenvectors.
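To make the two meanings of "scaling" concrete, here is a minimal pure-Python sketch of the steps under discussion: centering, an optional per-column pre-scaling, finding the two leading eigenvectors, and projecting the data onto them. It is not the Dataflow or Spark pipeline code. Every function name and the sample matrix are hypothetical, and power iteration with deflation is swapped in here only to keep the example dependency-free.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center_columns(rows):
    """Subtract each column's mean (the 'centered' step)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def scale_columns(centered):
    """Divide each centered column by its sample standard deviation (pre-scaling)."""
    n = len(centered)
    stds = [math.sqrt(sum(x * x for x in col) / (n - 1)) for col in zip(*centered)]
    return [[x / s for x, s in zip(row, stds)] for row in centered]

def covariance(centered):
    """Sample covariance matrix of already-centered columns."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[dot(a, b) / (n - 1) for b in cols] for a in cols]

def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def top_eigenpair(matrix, iterations=500):
    """Power iteration; assumes a symmetric matrix with a distinct largest eigenvalue."""
    v = normalize([1.0 + i for i in range(len(matrix))])  # fixed starting vector
    for _ in range(iterations):
        v = normalize([dot(row, v) for row in matrix])
    eigenvalue = dot(v, [dot(row, v) for row in matrix])  # Rayleigh quotient
    return eigenvalue, v

def deflate(matrix, eigenvalue, v):
    """Remove the found component so the next power iteration finds the second one."""
    d = len(matrix)
    return [[matrix[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]

def first_two_components(rows, scale=False):
    data = center_columns(rows)
    if scale:
        data = scale_columns(data)  # pre-scaling happens BEFORE the eigenvectors are found
    cov = covariance(data)
    l1, v1 = top_eigenpair(cov)
    l2, v2 = top_eigenpair(deflate(cov, l1, v1))
    # Projection: multiply each (centered) row by the two leading eigenvectors.
    scores = [(dot(r, v1), dot(r, v2)) for r in data]
    return (l1, l2), (v1, v2), scores

# Hypothetical 4 x 3 matrix: one row per individual, one column per feature.
rows = [[13.0, 21.0, 30.5], [7.0, 21.0, 29.5], [13.0, 19.0, 29.5], [7.0, 19.0, 30.5]]
eigenvalues, eigenvectors, scores = first_two_components(rows)
```

Each entry of `scores` holds the PC1 and PC2 coordinates for one row. The eigenvalues alone (the analogue of maxEigenvalue and secondEigenvalue) are not enough to place an individual on the plot; the projection step is what produces the per-individual values.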

Thanks, ~p