torch / torch7

http://torch.ch

access time #903

Open xmb-cipher opened 7 years ago

xmb-cipher commented 7 years ago

I have a big binary file (almost 2 GB) containing float32 values. I load it with: t = torch.FloatTensor(torch.FloatStorage(filename))

My program keeps accessing this big tensor for 1 to 2 hours. I observed that access is very slow for the first 10 to 20 minutes.

Can anyone explain why and provide some advice?

Thanks

apaszke commented 7 years ago

Because memory mapping a file doesn't actually load it into memory; each page is read from disk only when you first touch it. It probably takes 10-20 minutes for your access pattern to touch most of the pages, and only then will they remain cached.

xmb-cipher commented 7 years ago

@apaszke Thanks for your quick response. Any suggestions on how to rewrite this so that the file is completely loaded into memory at the beginning?

apaszke commented 7 years ago

TH doesn't support MAP_POPULATE, so I guess your best bet is to read a couple of bytes at every 4KB stride (the page size) across the whole file when you start your script. This should touch all the pages in the mapped region (provided you have sufficient memory, so that they don't get swapped out).
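
A minimal sketch of that suggestion, for illustration only. It assumes a 4 KB page size and 4-byte floats, so it reads one element out of every 1024. The helper names touchPages and loadMapped are hypothetical and not part of Torch. The checksum exists only so that the reads are visibly used.

```lua
require 'torch'

-- Assumptions: 4 KB pages (check your system) and 4-byte float32 elements.
local PAGE_BYTES = 4096
local FLOAT_BYTES = 4

-- Hypothetical helper: read one element per page so the OS faults every
-- page of the mapping in. Returns the sum of the elements it read.
local function touchPages(storage, elemsPerPage)
   local checksum = 0
   for i = 1, storage:size(), elemsPerPage do
      checksum = checksum + storage[i]
   end
   return checksum
end

-- Hypothetical helper: map the file as in the original post, touch all
-- pages up front, then wrap the storage in a tensor.
local function loadMapped(filename)
   local storage = torch.FloatStorage(filename)
   local checksum = touchPages(storage, PAGE_BYTES / FLOAT_BYTES)
   return torch.FloatTensor(storage), checksum
end
```

Calling loadMapped(filename) once at startup moves the disk reads to the beginning of the script instead of spreading them over the first 10-20 minutes. As noted above, the pages only stay resident if the machine has enough free memory to hold the whole file.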