kausmees / GenoCAE

Convolutional autoencoder for genotype data
BSD 3-Clause "New" or "Revised" License

Request: add a toy model setup #25

Closed richelbilderbeek closed 2 years ago

richelbilderbeek commented 2 years ago

Dear GenoCAE maintainers, hi @cnettel and @kausmees,

Thanks for GenoCAE and the experimental Pheno branch!

What I would enjoy is a toy Mx model (e.g. M0) and a toy px model (e.g. p0) that are the smallest neural networks possible while still respecting the dimensions of the input and output. In other words: models that 'just work', although their predictions will be bad.

I have tried modifying the /models/M1.json and /models/p2.json files (the latter only available on the Pheno branch), but I feel this will take you seconds to create.

I would enjoy this as it would speed up my GitHub Actions test suite: training alone now takes 150 seconds, whereas I am (usually) only interested in whether it creates some files, not in whether the output is useful (for useful output I would use the regular models).
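A check of this kind, one that only asks whether a run produced files, could look roughly like the sketch below. It is not GenoCAE code: the helper name `missing_outputs`, the glob patterns and the file name are all hypothetical and serve only as illustration.

```python
from pathlib import Path
import tempfile


def missing_outputs(out_dir, patterns):
    """Return the glob patterns that match no file directly under out_dir."""
    root = Path(out_dir)
    return [p for p in patterns if not any(root.glob(p))]


# Stand-in for a training run: the file name below is hypothetical
# and is not a documented GenoCAE output.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "weights_epoch_1.txt").write_text("placeholder")
    still_missing = missing_outputs(tmp, ["weights_*", "*.csv"])
    print(still_missing)
```

In a test suite, asserting that the returned list is empty would be enough to confirm that the expected files exist, without looking at their content.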

Would it be easy to add toy models Mx (e.g. models/M0.json) and toy model px (e.g. models/p0.json)?

If I underestimate how hard this is, just let me know, and I will try harder :-)

Thanks and cheers, Richel

kausmees commented 2 years ago

Hey

I have now added a model M0 on the master branch that is faster to train than the previous models. Training for 1 epoch with it takes ~10 s on my laptop, so hopefully it will speed things up a little. The model isn't really the 'smallest possible' one and could be scaled down further, but since the majority of the time is spent on loading and preprocessing the data, I don't think a smaller one would matter much for the total runtime at this point.
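One way to check where the time goes in a given setup is to time each phase separately. The sketch below is an assumption-laden illustration, not part of GenoCAE: the `timed` helper and the phase labels are hypothetical, and the `time.sleep` calls are placeholders for the real loading and training calls.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Placeholder phases: replace the sleeps with the real loading and
# training calls of the pipeline being measured.
with timed("load_and_preprocess"):
    time.sleep(0.02)
with timed("train_one_epoch"):
    time.sleep(0.01)

# Name of the phase that took the longest.
slowest = max(timings, key=timings.get)
print(slowest, timings)
```

If the loading phase dominates, shrinking the model further will change the total runtime very little.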

We are working on speeding up the data loading process; hopefully that feature will be added soon.

I can have a look at making a p0 model soon as well.

Best, K

richelbilderbeek commented 2 years ago

@kausmees that is great, thanks so much, I will try it out soon!

> but since the majority of the time is spent on loading and preprocessing the data

Would that be true for a simulated dataset of 3 individuals with 1 SNP and phenotype as well? Those are what I use for testing :-)

richelbilderbeek commented 2 years ago

@kausmees that new model is great! It brought a full GitHub Actions test run to 3 minutes!

Happily closing this Issue :+1: