matthias-k / pysaliency

Python Framework for Saliency Modeling and Evaluation
MIT License

CAT 2000 Dataset/Memory Use Issues #9

Open Hamcastle opened 5 years ago

Hamcastle commented 5 years ago

Hi Matthias,

Nice work -- this package's great.

I'm having an issue with the code for the CAT 2000 training dataset and with the package's memory footprint.

I've created some saliency maps for the CAT 2000 training data outside pysaliency itself.

I can load the stimuli and fixations for the set from a local copy using `pysaliency.get_cat2000_train()`.

The saliency maps are organized into the same folder structure as the source data, with each saliency map contained in its category-specific folder.

If I try to load them from the folder containing these category-specific sub-folders using `pysaliency.SaliencyMapModelFromDirectory`, I get the error shown in the attached screenshot.

It looks like this comes from the fact that the function doesn't search directories recursively. I can get around this by using `pysaliency.SaliencyMapModelFromFiles` with a list of paths to each saliency map, as sketched below.
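A minimal sketch of that workaround (the saliency-map directory name and `.png` extension are placeholders, and I'm assuming the sorted file list lines up with the stimulus order):

```python
import glob
import pysaliency

# Load the CAT 2000 training stimuli and fixations from a local copy.
stimuli, fixations = pysaliency.get_cat2000_train()

# SaliencyMapModelFromDirectory doesn't recurse into the category
# sub-folders, so collect the files with a recursive glob instead.
# Directory name and extension are placeholders; sorting is meant to
# match the order of `stimuli`.
filenames = sorted(glob.glob('cat2000_saliency_maps/**/*.png', recursive=True))

model = pysaliency.SaliencyMapModelFromFiles(stimuli, filenames)
```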

The trouble is that this seems to use quite a bit of memory.

My computer has 32 GB of RAM and a 32 GB swap partition. Calculating the Judd AUC score using `SaliencyMapModelFromDirectory` for the MIT 1003 dataset consumes about 8.1 GB. I also note that even after the score is calculated, the additional memory is not released until I restart the IPython kernel.
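For reference, the evaluation call looks roughly like this (a sketch; the saliency-map directory name is a placeholder, and I'm assuming uniform nonfixations correspond to the Judd variant of the `AUC` method):

```python
import pysaliency

stimuli, fixations = pysaliency.get_mit1003(location='datasets')

# Directory of precomputed MIT 1003 saliency maps (name is a placeholder).
model = pysaliency.SaliencyMapModelFromDirectory(stimuli, 'mit1003_saliency_maps')

# Uniform nonfixations, i.e. the Judd variant of AUC.
print(model.AUC(stimuli, fixations, nonfixations='uniform'))
```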

If I try to calculate the same score for the CAT 2000 training set using `SaliencyMapModelFromFiles`, it fills both the RAM and the swap partition completely, causing the IPython kernel to die.

Could you recommend a slightly more memory-efficient way to work with this dataset? Do you have a sense of what might otherwise be responsible for the memory use?

In case you think this might be a system/python environment specific issue, here are (I think all of) the relevant specs:

OS: Ubuntu 16.04 LTS

Python environment:

[attached screenshot: from_directory_error]

Thanks again!

matthias-k commented 5 years ago

Hi Dylan,

thanks for reporting this bug to me! In October, in https://github.com/matthias-k/pysaliency/commit/4d014c4c0c8352d3e730d89e3492dd6d6c11e211, I implemented nested directories for HDF5 models, but apparently I forgot to do the same for directory-based models. I'll fix this over the next few days. By the way, pysaliency has changed quite a bit since September 2018, so it might be worth updating :).

Regarding your memory issue: pysaliency uses a caching mechanism to keep saliency maps in memory and avoid recomputing them all the time. I admit that for file-based models the cache is usually unnecessary, and I should change the default in those cases. You can always disable the caching by passing `caching=False` as a keyword argument to the model constructor, as in `SaliencyMapModelFromFiles(stimuli, filenames, caching=False)`.
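Concretely, reusing the hypothetical `stimuli` and `filenames` from your sketch above:

```python
import pysaliency

# Disable the in-memory cache so each saliency map is read from disk
# on demand instead of being kept around after first use.
model = pysaliency.SaliencyMapModelFromFiles(stimuli, filenames, caching=False)

# The same keyword works for the other precomputed-model constructors,
# e.g. a directory-based model (directory name is a placeholder):
model = pysaliency.SaliencyMapModelFromDirectory(
    stimuli, 'cat2000_saliency_maps', caching=False)
```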

Hamcastle commented 5 years ago

Matthias,

Thanks for the very speedy reply. Setting `caching=False` solved the issue. Will do on the update! Feel free to mark this closed, unless you want to wait for whatever changes you end up making :P