Lswhiteh opened this issue 4 years ago.
Hey @Lswhiteh. Yeah, we've used DataGenerators for other projects and would definitely welcome PRs here!
In this particular project, I think the memory problem is not the batch size per se but the size of the individual tensors being passed to the GPU. Is that correct, @cjbattey?
Yeah, the memory issue I ran into (i.e., when running more than a million SNPs) was in loading the tensors for a single minibatch into GPU memory, so I'm not sure a generator would help there. Definitely open to any PRs testing it, though!
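A rough back-of-the-envelope sketch of why a generator may not help with this particular problem. It assumes (an assumption, not taken from the project) that each sample's genotypes enter the model as one dense float32 row of length `n_snps`; the helper name `minibatch_bytes` is hypothetical.

```python
def minibatch_bytes(batch_size, n_snps, bytes_per_value=4):
    """Size in bytes of one dense (batch_size, n_snps) input tensor.

    Assumes float32 values (4 bytes each) by default; this is an
    illustrative assumption about the input encoding.
    """
    return batch_size * n_snps * bytes_per_value


n_snps = 1_000_000  # roughly the scale mentioned in the thread
for batch_size in (8, 32, 128):
    # Convert bytes to MiB for readability.
    mib = minibatch_bytes(batch_size, n_snps) / 2**20
    print(f"batch_size={batch_size:>3}: {mib:,.0f} MiB per input minibatch")
```

A generator changes where the full dataset lives (streamed from disk rather than held in RAM), but every minibatch it yields still has this shape. Under these assumptions, the input tensor sent to the GPU at each step scales with `batch_size * n_snps` regardless of how the data is streamed.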
Sounds good. I'm waiting for another model to finish training, so I might fiddle around with it in my free time. I'm also not sure whether generators would help in that case, but it's worth a shot since it's already written.
I'll let you know how it goes!
right on. thanks!
Hi there, I was looking through your codebase and paper, and after your mention of data being too large to load into memory, I was wondering whether you had heard of or tried Keras DataGenerators. They stream data on the fly, and you can customize them quite a bit.
I have some custom generators built for other genetics deep learning networks I'm working on. Would you mind if I submitted a pull request in the next few days, after testing whether a generator works? I might also split the script into functions while I'm at it, so the data-generation step is more modular.
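For reference, a minimal sketch of the kind of generator being proposed, written in the style of a Keras `Sequence` (which implements `__len__`, `__getitem__`, and optionally `on_epoch_end`). It assumes genotypes are stored as an `(n_samples, n_snps)` array, possibly an `np.memmap` so that only the rows for the current batch are read from disk, and that targets are a per-sample array (e.g. coordinates). The class name `GenotypeBatchGenerator` and its parameters are hypothetical and not taken from either codebase.

```python
import math

import numpy as np

try:
    # In a real training script this is the Keras base class.
    from tensorflow.keras.utils import Sequence
except ImportError:  # fallback so the sketch runs without TensorFlow installed
    Sequence = object


class GenotypeBatchGenerator(Sequence):
    """Hypothetical generator yielding (genotypes, targets) minibatches.

    `genotypes` may be an np.memmap (e.g. np.load(path, mmap_mode="r")),
    in which case only the rows indexed for a batch are pulled into memory.
    """

    def __init__(self, genotypes, targets, batch_size=32, shuffle=True, seed=0):
        super().__init__()
        self.genotypes = genotypes
        self.targets = targets
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.indices = np.arange(len(genotypes))
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.indices) / self.batch_size)

    def __getitem__(self, i):
        # Select this batch's sample indices and load only those rows.
        batch_idx = self.indices[i * self.batch_size:(i + 1) * self.batch_size]
        x = np.asarray(self.genotypes[batch_idx], dtype=np.float32)
        y = np.asarray(self.targets[batch_idx], dtype=np.float32)
        return x, y

    def on_epoch_end(self):
        # Reshuffle sample order (in place) between epochs.
        if self.shuffle:
            self.rng.shuffle(self.indices)
```

In TensorFlow 2.x an object like this can be passed directly to `model.fit(gen, epochs=...)`. Note that it bounds how much of the dataset sits in host memory at once; each yielded minibatch is still a full `(batch_size, n_snps)` array.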