googlegenomics / cloudml-examples

Examples of using CloudML with genomic data.
Apache License 2.0
18 stars, 11 forks

Where could we subset the SNP data? #6

Closed: ljstrnadiii closed this issue 6 years ago

ljstrnadiii commented 7 years ago

Thanks for building this model!

I was wondering whether it would be straightforward to subset the SNP data by specifying the loci we want to train on. Any idea where to start?

Also, are there any plans to actually implement the DietNet? So far the model structure seems to be simply: X -> hidden layer (p x #hidden) -> softmax (#hidden x #classes) -> cross-entropy loss

The README says that the model is similar to the DietNet, but the DietNet learns weight matrices much smaller than p (<< p). Are there any plans to build an encoder on the transposed data? Any plans to also implement the reconstruction error, as in the DietNet?
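
For a rough sense of the scale behind this question, the sketch below compares the number of free parameters in a first layer learned directly (p x #hidden) with a first layer whose weights are predicted from a low-dimensional embedding of each SNP, which is the general idea in the Diet Networks paper. All sizes (`p`, `n_hidden`, `embed_dim`) are hypothetical and not taken from this repository, and the auxiliary network is assumed to be a single linear layer.

```python
# Hypothetical sizes, chosen only for illustration.
p = 300_000        # number of SNP features (assumed)
n_hidden = 100     # hidden units in the first layer (assumed)
embed_dim = 500    # per-SNP embedding size fed to the auxiliary network (assumed)


def direct_first_layer_params(num_features, hidden_units):
    """Weights learned directly: one weight per (feature, hidden unit) pair."""
    return num_features * hidden_units


def predicted_first_layer_params(embedding_size, hidden_units):
    """Weights of a single linear auxiliary layer that maps each feature's
    embedding to that feature's row of the first-layer weight matrix."""
    return embedding_size * hidden_units


direct = direct_first_layer_params(p, n_hidden)
predicted = predicted_first_layer_params(embed_dim, n_hidden)
ratio = direct // predicted

print(f"direct first layer:    {direct:,} parameters")
print(f"predicted first layer: {predicted:,} parameters")
print(f"reduction factor:      {ratio}x")
```

Under these assumed sizes the directly learned layer has 30,000,000 weights while the auxiliary layer has 50,000, which is the kind of gap the "<< p" remark refers to.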

Thanks in advance!

deflaux commented 7 years ago

Hello @ljstrnadiii !

To subset the SNP data, you can write a new query similar to 1000_genomes_phase3_b37_snps_only.jinja and JOIN against a table holding the desired SNP locations.
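
As a minimal sketch of the JOIN described above, the example below uses an in-memory SQLite database purely to show the shape of the query. The actual pipeline queries BigQuery through the .jinja template, so the table names (`variants`, `desired_loci`), the column names (`reference_name`, `start_pos`, `sample_id`, `genotype`), and the rows are all hypothetical stand-ins rather than the real schema.

```python
import sqlite3

# In-memory database used only to illustrate the JOIN; not the real BigQuery schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE variants (
        reference_name TEXT,
        start_pos      INTEGER,
        sample_id      TEXT,
        genotype       INTEGER
    );
    CREATE TABLE desired_loci (
        reference_name TEXT,
        start_pos      INTEGER
    );
    """
)

# Hypothetical SNP calls: (chromosome, position, sample, alternate allele count).
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [
        ("1", 100, "s1", 0),
        ("1", 100, "s2", 1),
        ("1", 200, "s1", 2),
        ("2", 50, "s1", 1),
        ("2", 75, "s2", 0),
    ],
)

# The loci to keep for training.
conn.executemany(
    "INSERT INTO desired_loci VALUES (?, ?)",
    [("1", 100), ("2", 75)],
)

# Inner JOIN keeps only the variant rows whose locus appears in desired_loci.
subset = conn.execute(
    """
    SELECT v.reference_name, v.start_pos, v.sample_id, v.genotype
    FROM variants AS v
    JOIN desired_loci AS d
      ON v.reference_name = d.reference_name
     AND v.start_pos = d.start_pos
    ORDER BY v.reference_name, v.start_pos, v.sample_id
    """
).fetchall()

for row in subset:
    print(row)
```

Matching on both chromosome and position avoids keeping SNPs that share a coordinate on a different chromosome.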

As you noticed, this is not an implementation of the method described in the Diet Networks paper, but we thought it would be useful to reference that paper because it seems relevant.