karkaroff / castor

PyTorch deep learning models by the Data Systems Group at the University of Waterloo
http://castor.ai/
Apache License 2.0

Optimize dataset preprocessing for CharCNN #12

Open achyudh opened 5 years ago

achyudh commented 5 years ago

CharCNN runs out of system memory on large datasets such as IMDB and Yelp, because Castor preprocesses the entire dataset at once and keeps the result in memory. Making character quantization part of the model, rather than a dataset preprocessing step, would mean that only one batch is quantized at a time.
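A minimal sketch of the proposed change follows. It is written in plain Python so it runs without PyTorch. It assumes the usual CharCNN quantization scheme: one-hot encoding over a fixed alphabet, with unknown characters and padding as all-zero vectors. `CharQuantizer`, `batches`, `CharCNNSketch`, and the alphabet are hypothetical names, not Castor's API. The sketch shows where quantization moves. Raw strings are batched lazily, and each batch is one-hot encoded only when the model's `forward` receives it, so peak memory scales with batch size rather than dataset size.

```python
from typing import Iterable, Iterator, List, Sequence

# Hypothetical alphabet for illustration only; a real CharCNN setup defines its own.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,!?'"


class CharQuantizer:
    """Hypothetical helper: one-hot encodes characters on demand, one batch at a time."""

    def __init__(self, alphabet: str = ALPHABET, max_len: int = 16) -> None:
        self.index = {ch: i for i, ch in enumerate(alphabet)}
        self.alphabet_size = len(alphabet)
        self.max_len = max_len

    def quantize(self, text: str) -> List[List[int]]:
        # One row per character position; text is lowercased, then truncated or padded to max_len.
        # Characters outside the alphabet, and padding positions, stay as all-zero rows.
        rows = [[0] * self.alphabet_size for _ in range(self.max_len)]
        for pos, ch in enumerate(text.lower()[: self.max_len]):
            idx = self.index.get(ch)
            if idx is not None:
                rows[pos][idx] = 1
        return rows

    def __call__(self, batch: Sequence[str]) -> List[List[List[int]]]:
        # Only this batch is materialized; the rest of the dataset remains raw strings.
        return [self.quantize(text) for text in batch]


def batches(texts: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    # Yield raw-text batches lazily instead of preprocessing the whole dataset up front.
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


class CharCNNSketch:
    """Hypothetical model wrapper: quantization happens inside forward()."""

    def __init__(self, quantizer: CharQuantizer) -> None:
        self.quantizer = quantizer

    def forward(self, raw_batch: Sequence[str]) -> List[List[List[int]]]:
        encoded = self.quantizer(raw_batch)
        # A real PyTorch model would turn `encoded` into a tensor and run its conv layers here.
        return encoded
```

In a training loop, the data loader would yield `batches(raw_texts, batch_size)` and pass each batch to `model.forward`. The quantized representation of any given text then exists only for the duration of one step.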