matthias-k / pysaliency

Python Framework for Saliency Modeling and Evaluation
MIT License

CAT 2000 Dataset/Memory Use Issues #9

Open Hamcastle opened 5 years ago

Hamcastle commented 5 years ago

Hi Matthias,

Nice work -- this package's great.

I'm having an issue with the code for the CAT 2000 training dataset and with the package's memory footprint.

I've created some saliency maps for the CAT 2000 training data outside pysaliency itself.

I can load the stimuli and fixations for the set from a local copy using `pysaliency.get_cat2000_train()`.

The saliency maps are organized into the same folder structure as the source data, with each saliency map contained in its category-specific folder.

If I try to load them from the folder containing these category-specific sub-folders using `pysaliency.SaliencyMapModelFromDirectory`, I get the error shown in the attached screenshot.

It looks like this comes from the fact that the function doesn't search directories recursively. I can get around this by using `pysaliency.SaliencyMapModelFromFiles` with a list of paths to each saliency map, as sketched below.
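A minimal sketch of that workaround (the saliency-map directory name and `.png` extension are placeholders, and I'm assuming the sorted file list lines up with the stimulus order):

```python
import glob
import pysaliency

# Load the CAT 2000 training stimuli and fixations from a local copy.
stimuli, fixations = pysaliency.get_cat2000_train()

# SaliencyMapModelFromDirectory doesn't recurse into the category
# sub-folders, so collect the files with a recursive glob instead.
# Directory name and extension are placeholders; sorting is meant to
# match the order of `stimuli`.
filenames = sorted(glob.glob('cat2000_saliency_maps/**/*.png', recursive=True))

model = pysaliency.SaliencyMapModelFromFiles(stimuli, filenames)
```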

The trouble is that this seems to use quite a bit of memory.

My computer has 32 GB of RAM and a 32 GB swap partition. Calculating the Judd AUC score using `SaliencyMapModelFromDirectory` for the MIT 1003 dataset consumes about 8.1 GB. I also note that even after the score is calculated, the additional memory is not released until I restart the IPython kernel.
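For reference, the evaluation call looks roughly like this (a sketch; the saliency-map directory name is a placeholder, and I'm assuming uniform nonfixations correspond to the Judd variant of the `AUC` method):

```python
import pysaliency

stimuli, fixations = pysaliency.get_mit1003(location='datasets')

# Directory of precomputed MIT 1003 saliency maps (name is a placeholder).
model = pysaliency.SaliencyMapModelFromDirectory(stimuli, 'mit1003_saliency_maps')

# Uniform nonfixations, i.e. the Judd variant of AUC.
print(model.AUC(stimuli, fixations, nonfixations='uniform'))
```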

If I try to calculate the same score for the CAT 2000 training set using `SaliencyMapModelFromFiles`, it fills both the RAM and the swap partition completely, causing the IPython kernel to die.

Could you recommend a slightly more memory-efficient way to work with this dataset? Do you have a sense of what might otherwise be responsible for the memory use?

In case you think this might be a system/python environment specific issue, here are (I think all of) the relevant specs:

OS: Ubuntu 16.04 LTS

Python environment:

[attached screenshot: from_directory_error]

Thanks again!

matthias-k commented 5 years ago

Hi Dylan,

thanks for reporting this bug to me! In October, in https://github.com/matthias-k/pysaliency/commit/4d014c4c0c8352d3e730d89e3492dd6d6c11e211, I implemented nested directories for HDF5 models, but apparently I forgot to do the same for directory-based models. I'll fix this over the next few days. By the way, pysaliency has changed quite a bit since September 2018, so it might be worth updating :).

Regarding your memory issue: pysaliency uses a caching mechanism to keep saliency maps in memory and avoid recomputing them all the time. I admit that for file-based models the cache is usually unnecessary, and I should change the default in those cases. You can always disable the caching by passing `caching=False` as a keyword argument to the model constructor, as in `SaliencyMapModelFromFiles(stimuli, filenames, caching=False)`.
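Concretely, reusing the hypothetical `stimuli` and `filenames` from your sketch above:

```python
import pysaliency

# Disable the in-memory cache so each saliency map is read from disk
# on demand instead of being kept around after first use.
model = pysaliency.SaliencyMapModelFromFiles(stimuli, filenames, caching=False)

# The same keyword works for the other precomputed-model constructors,
# e.g. a directory-based model (directory name is a placeholder):
model = pysaliency.SaliencyMapModelFromDirectory(
    stimuli, 'cat2000_saliency_maps', caching=False)
```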

Hamcastle commented 5 years ago

Matthias,

Thanks for the very speedy reply. Setting `caching=False` solved the issue. Will do on the update! Feel free to mark this closed, unless you want to wait for whatever changes you end up making :P