Open xljhtq opened 6 years ago
When I load a file with a lot of data, I run into a problem: free memory keeps shrinking, seemingly because of the sorting step in preprocessing. What should I do to optimize it?

I think one solution is to modify the `InstanceBatch` class in `SentenceMatchDataStream.py`. Right now, my code loads all data into memory and pads all variables beforehand (https://github.com/zhiguowang/BiMPM/blob/master/src/SentenceMatchDataStream.py#L165), and the padding step costs a lot of memory.

One way to fix this is to skip padding while loading the data and instead pad each batch right before it is used. This line (https://github.com/zhiguowang/BiMPM/blob/master/src/SentenceMatchTrainer.py#L92) may be a good place to insert your padding function.
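The per-batch padding idea can be sketched roughly as below. `pad_batch` is a hypothetical helper (not part of the BiMPM code): instead of padding every sentence to a corpus-wide maximum at load time, each batch is padded only to the length of its own longest sequence, right before it is fed to the model.

```python
import numpy as np

def pad_batch(sequences, pad_value=0):
    """Pad a list of variable-length integer sequences to this batch's
    own maximum length, rather than padding the whole corpus up front.
    Returns the padded matrix and the original sequence lengths."""
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    # Allocate only (batch_size, batch_max_len), filled with the pad value.
    padded = np.full((len(sequences), max_len), pad_value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        padded[i, :len(seq)] = seq
    return padded, np.array(lengths, dtype=np.int32)

# Example: three sentences of different lengths are padded only to the
# longest sentence in this batch (length 4), not a global maximum.
batch = [[3, 1, 4], [1, 5], [9, 2, 6, 5]]
padded, lengths = pad_batch(batch)
print(padded.shape)       # (3, 4)
print(lengths.tolist())   # [3, 2, 4]
```

Because batches are already grouped by similar length after the sorting step, padding per batch wastes far less memory than padding everything to the global maximum length at load time.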