kr-colab / popvae

genotype dimensionality reduction with a VAE

Keras DataGenerators #2

Open Lswhiteh opened 4 years ago

Lswhiteh commented 4 years ago

Hi there, I was looking through your codebase and paper, and after your mention of the data being too large to load into memory, I was wondering whether you had heard of or tried Keras DataGenerators. They stream data on the fly, and you can customize them quite a bit.

I have some custom generators built for some other genetics deep learning networks I'm working on. Would you mind if I submitted a pull request in the next few days, after testing whether a generator works? I might also functionalize the script while I'm at it, so it can be more modular for the actual generation process.
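
For illustration, here is a minimal sketch of the kind of generator meant here. It is not code from popvae or from any existing PR. The class name `GenotypeBatches` and the `genotypes` input (an array or `np.memmap` of shape samples x SNPs) are hypothetical. The class only follows the protocol of Keras's `Sequence` (`__len__`, `__getitem__`, `on_epoch_end`); to pass it to `fit` it would need to subclass `tf.keras.utils.Sequence`.

```python
import numpy as np


class GenotypeBatches:
    """Serve minibatches from a genotype matrix of shape (n_samples, n_snps).

    Hypothetical sketch: mirrors the Keras Sequence protocol without
    importing Keras, so it runs with NumPy alone.
    """

    def __init__(self, genotypes, batch_size=32, shuffle=True, seed=0):
        self.genotypes = genotypes  # ndarray or np.memmap (assumed input)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.order = np.arange(genotypes.shape[0])
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch, counting a final partial batch.
        return int(np.ceil(len(self.order) / self.batch_size))

    def __getitem__(self, idx):
        start = idx * self.batch_size
        # Sorted row indices keep reads from a memory-mapped file sequential.
        rows = np.sort(self.order[start:start + self.batch_size])
        x = np.asarray(self.genotypes[rows], dtype=np.float32)
        # An autoencoder reconstructs its input, so the target is the input.
        return x, x

    def on_epoch_end(self):
        # Reshuffle sample order between epochs.
        if self.shuffle:
            self.rng.shuffle(self.order)
```

Only one batch of rows is materialized at a time, but each batch still spans every SNP column, so the width of the tensor sent to the GPU is unchanged.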

andrewkern commented 4 years ago

Hey @Lswhiteh. Yeah, we have used DataGenerators for other projects and would definitely welcome PRs here!

In this particular project I think the problem with memory is not the batch size per se, but instead the size of the individual tensors being passed to the GPU. Is that correct @cjbattey?

cjbattey commented 4 years ago

Yeah, the memory issue I ran into (i.e. when running more than a million SNPs) was in loading the tensors for a single minibatch into GPU memory, so I'm not sure a generator would help there. Definitely open to any PRs testing it, though!
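
As a rough illustration of why streaming may not help here, the sketch below estimates the size of one dense input tensor. The helper name `batch_bytes` and the numbers passed to it are hypothetical, not measurements from popvae, and the estimate counts only the input batch; weights, activations, and gradients add to it.

```python
def batch_bytes(batch_size, n_snps, bytes_per_value=4):
    """Bytes needed for one dense (batch_size, n_snps) tensor.

    bytes_per_value=4 assumes float32 storage.
    """
    return batch_size * n_snps * bytes_per_value


# Hypothetical sizes: the result depends only on the batch shape,
# not on whether the batch came from memory or from a generator.
small = batch_bytes(batch_size=32, n_snps=100_000)
large = batch_bytes(batch_size=32, n_snps=2_000_000)
print(small, large)
```

The per-batch cost grows linearly with the number of SNPs, so a generator lowers host memory use without shrinking what a single minibatch needs on the GPU.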

Lswhiteh commented 4 years ago

Sounds good. I'm waiting for another model to finish training, so I might fiddle around with it in my free time. I'm also not sure if generators would help in that case, but it's worth a shot since it's already written.

I'll let you know how it goes!

andrewkern commented 4 years ago

right on. thanks!