rigoudyg / climaf

CliMAF - a Climate Model Analysis Framework - doc at : http://climaf.readthedocs.org/

Is there something wrong with clist() or my cache ? #224

Closed jypeter closed 1 year ago

jypeter commented 3 years ago

I wanted to know a bit more about the cache, searched the documentation, and found the cache-related functions on the "Functions for managing/viewing results" page.

I have an existing cache directory with some content, but the clist function returns nothing. Is there something wrong with what I'm doing, or with clist?

There is also a potential inconsistency in what is returned: clist() returns a list, but clist(age='+1') returns None.

Checking the content of the cache by hand

[jypmce@ciclad-ng jypmce]$ du -sh /data/jypmce/climafcache
4.6G    /data/jypmce/climafcache

[jypmce@ciclad-ng jypmce]$ find /data/jypmce/climafcache -type f | wc -l
832

[jypmce@ciclad-ng jypmce]$ find /data/jypmce/climafcache -type f -mtime -1 | wc -l
272
[jypmce@ciclad-ng jypmce]$ find /data/jypmce/climafcache -type f -mtime +1 | wc -l
308
[jypmce@ciclad-ng jypmce]$ find /data/jypmce/climafcache -type f -mtime 1 | wc -l
252
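
The three `find -mtime` counts above split the files by age in whole days (`-1`: less than one day, `1`: between one and two days, `+1`: two days or more), which is why 272 + 308 + 252 = 832. Below is a minimal Python sketch of the same bucketing, using only the standard library. The function name `count_by_age` is hypothetical, and the rule (age in 24-hour periods, fractional part discarded) is meant to mirror how GNU find interprets `-mtime`, not to replace it.

```python
import os
import time


def count_by_age(root, now=None):
    """Count regular files under root, bucketed like find -mtime -1 / 1 / +1."""
    now = time.time() if now is None else now
    counts = {"-1": 0, "1": 0, "+1": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like "find -type f": skip symlinks and anything that is not a regular file
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            # Age in whole 24-hour periods, fractional part discarded
            age_days = int((now - os.path.getmtime(path)) // 86400)
            if age_days < 1:
                counts["-1"] += 1
            elif age_days == 1:
                counts["1"] += 1
            else:
                counts["+1"] += 1
    return counts
```

Calling `count_by_age('/data/jypmce/climafcache')` from the same Python session as `clist(age=...)` would give numbers to compare against, assuming the `age` argument follows the same convention as `find -mtime`.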

Using clist to check the cache

[jypmce@ciclad-ng ~]$ module load climaf/2.0.0-python3.6
Loading climaf/2.0.0-python3.6
  Loading requirement: nco/4.5.2 ncl/6.3.0 cdo/1.9 netcdf4/4.3.3.1-gfortran
[jypmce@ciclad-ng ~]$ python
Python 3.6.11 | packaged by conda-forge | (default, Aug  5 2020, 20:09:42)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from climaf.api import *
python => 3.6.11 | packaged by conda-forge | (default, Aug  5 2020, 20:09:42)
[GCC 7.5.0]
---
Required softwares to run CliMAF => you are using the following versions/installations:
ncl 6.6.2 => /modfs/modtools/miniconda3/envs/analyse_3.6_test/bin/ncl
cdo 1.9.6 => /opt/nco/1.9/bin/cdo
nco (ncks) 4.5.2 => /opt/nco-4.5.2/bin/ncks
ncdump such => /prodigfs/ipslfs/dods/jservon/miniconda/envs/cesmep_env/bin/ncdump
Check stamping requirements
nco (ncatted) found -> /opt/nco-4.5.2/bin/ncatted
convert found -> /usr/bin/convert
pdftk found -> /usr/bin/pdftk
exiv2 found -> /ciclad-home/jservon/Evaluation/CliMAF/climaf_installs/climaf_V2.0.0/bin/exiv2
---
CliMAF version = 2.0.0
CliMAF install => /ciclad-home/jservon/Evaluation/CliMAF/climaf_installs/climaf_V2.0.0
Cache directory set to : /data/jypmce/climafcache (use $CLIMAF_CACHE if set)
Cache directory for remote data set to : /data/jypmce/climafcache/remote_data (use $CLIMAF_REMOTE_CACHE if set)
Available macros read from ~/.climaf.macros are : []
>>> clist()
Content of CliMAF cache
[]
>>> clist(age='+1')
>>> clist(age='-1')
>>> clist(age='+1', count=True)

>>> type(clist())
Content of CliMAF cache
<class 'list'>

>>> type(clist(age='+1'))
<class 'NoneType'>

>>> cwc()
Content of CliMAF cache
[]
>>> cwc(age='+1')
>>> cwc(age='-1')
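
In the session above, `clist()` returns a list while `clist(age='+1')` returns `None`, so calling `len()` on the result works in one case and raises `TypeError` in the other. Until the return values are made uniform, a caller could normalize them. The helper below is a hypothetical workaround, not part of CliMAF, and it assumes only what the transcript shows: the result is either `None` or a list.

```python
def as_list(result):
    """Return result as a list, mapping None to an empty list.

    Assumption: result is either None or an iterable such as a list.
    """
    if result is None:
        return []
    return list(result)
```

With it, `len(as_list(clist(age='+1')))` would give 0 instead of failing when nothing is returned.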

Playing with craz

I called craz() after running the tests above. Called with no arguments, it just lists the content of the cache directory. That is not in the craz documentation, but why not. The size of my cache directory remains unchanged (4.6G), even though I have checked my scripts and I was not using cprotect.

If I use craz(force=True), the cache is cleaned.

>>> craz()
00  13  24  35  46  57  68  79  8a  9c  ad  bf  climaf_mcdoemk7aoe7  df  f1
01  14  25  36  47  58  69  7a  8b  9d  ae  c0  climaf_mcdoxzhlte3v  e0  f2
02  15  26  37  48  59  6a  7b  8c  9e  af  c1  d0                   e1  f3
03  16  27  38  49  5a  6b  7c  8d  9f  b0  c2  d1                   e2  f4
04  17  28  39  4a  5b  6c  7d  8e  a0  b1  c3  d2                   e3  f5
05  18  29  3a  4b  5c  6d  7e  8f  a1  b2  c4  d3                   e4  f6
06  19  2a  3b  4c  5d  6e  7f  90  a2  b3  c5  d4                   e5  f8
08  1a  2b  3c  4d  5e  6f  80  91  a3  b5  c6  d5                   e6  f9
09  1b  2c  3d  4e  5f  70  81  92  a4  b6  c7  d6                   e8  fa
0a  1c  2d  3e  4f  60  71  82  93  a5  b7  c8  d7                   e9  fb
0b  1d  2e  3f  50  61  72  83  94  a6  b8  c9  d8                   ea  fc
0c  1e  2f  40  51  62  73  84  95  a7  b9  ca  d9                   eb  fd
0d  1f  30  41  52  63  74  85  96  a8  ba  cb  da                   ec  fe
0e  20  31  42  53  64  75  86  98  a9  bb  cc  db                   ed  ff
0f  21  32  43  54  65  76  87  99  aa  bc  cd  dc                   ee  index
10  22  33  44  55  66  77  88  9a  ab  bd  ce  dd                   ef
11  23  34  45  56  67  78  89  9b  ac  be  cf  de                   f0

>>> craz(force=True)
>>> craz()
>>>
jypeter commented 3 years ago

OK, I have run my script again, after calling craz(force=True) above, and I'm now getting results when calling clist! I guess something was wrong in my cache.

>>> res_m1 = clist(age='-1')
Filtered objects = cache content
>>> type(res_m1)
<class 'list'>
>>> len(res_m1)
624

I have just read the csync documentation. My test above had not created any new content in the file (I just imported CliMAF), so I had nothing to sync. But could csync(update=True) have updated/corrected my cache? It's too late to test that now.

Note: it may be possible to improve the results of the cache-related functions, as described in #226

jypeter commented 3 years ago

I'm completely lost!

I got the results above (624 crs in the cache) when staying in the script (I executed python -i script.py to stay in interactive mode after script execution) and playing with the cache

I ran the same script with different parameters. My script does not use any cache-controlling functions, yet I got a smaller cache content when I expected a bigger one. The cache should keep growing unless you reset it, right?

>>> len(clist())
Content of CliMAF cache
336
>>> csync()
>>> len(clist())
Content of CliMAF cache
336

I then exited my script and started a clean python where I just imported CliMAF. Now if I call clist, I get nothing, just like at the beginning of this issue! Does clist() give you valid information only about the current script? That would be counter-intuitive, as a cache is supposed to be persistent by definition.

[jypmce@ciclad-ng PMIP4]$ python
Python 3.6.11 | packaged by conda-forge | (default, Aug  5 2020, 20:09:42)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from climaf.api import *
python => 3.6.11 | packaged by conda-forge | (default, Aug  5 2020, 20:09:42)
[GCC 7.5.0]
---
Required softwares to run CliMAF => you are using the following versions/installations:
ncl 6.6.2 => /modfs/modtools/miniconda3/envs/analyse_3.6_test/bin/ncl
cdo 1.9.6 => /opt/nco/1.9/bin/cdo
nco (ncks) 4.5.2 => /opt/nco-4.5.2/bin/ncks
ncdump such => /prodigfs/ipslfs/dods/jservon/miniconda/envs/cesmep_env/bin/ncdump
Check stamping requirements
nco (ncatted) found -> /opt/nco-4.5.2/bin/ncatted
convert found -> /usr/bin/convert
pdftk found -> /usr/bin/pdftk
exiv2 found -> /ciclad-home/jservon/Evaluation/CliMAF/climaf_installs/climaf_V2.0.0/bin/exiv2
---
CliMAF version = 2.0.0
CliMAF install => /ciclad-home/jservon/Evaluation/CliMAF/climaf_installs/climaf_V2.0.0
Cache directory set to : /data/jypmce/climafcache (use $CLIMAF_CACHE if set)
Cache directory for remote data set to : /data/jypmce/climafcache/remote_data (use $CLIMAF_REMOTE_CACHE if set)
Available macros read from ~/.climaf.macros are : []

>>> len(clist())
Content of CliMAF cache
0
>>> len(clist())
Content of CliMAF cache
0
>>> csync()
>>> len(clist())
Content of CliMAF cache
0
>>>

There is still some content in my cache directory

[jypmce@ciclad-ng latest]$ du -sh /data/jypmce/climafcache
1.8G    /data/jypmce/climafcache

[jypmce@ciclad-ng latest]$ find /data/jypmce/climafcache -type f | wc -l
961
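
To compare what is on disk with what `clist()` reports without leaving the Python session, the two shell checks above can be reproduced with the standard library. This is only a sketch: `disk_summary` is a hypothetical helper, and it sums apparent file sizes, so its total can differ from `du`, which reports allocated disk blocks.

```python
import os


def disk_summary(root):
    """Return (file_count, total_bytes) for regular files under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like "find -type f": ignore symlinks and non-regular files
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            file_count += 1
            total_bytes += os.path.getsize(path)
    return file_count, total_bytes
```

For example, `disk_summary('/data/jypmce/climafcache')` could be printed next to `len(clist())`. The two numbers need not match exactly, since the directory also holds files such as `index` that may not correspond to a cache entry, but a non-empty directory alongside an empty listing is the symptom described here.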
jypeter commented 3 years ago

Now if I run the same script again, it runs much faster because it obviously uses cached results (success!), but clist returns an empty list, which seems strange since the cache has been successfully used. I think that maybe this is the time to use csync, which is documented as updating both the in-memory and the on-file cache index:

>>> clist()
Content of CliMAF cache
[]
>>> csync()
>>> clist()
Content of CliMAF cache
[]
>>> csync(update=True)
>>> len(clist())
Content of CliMAF cache
1
>>> clist()
Content of CliMAF cache
['failed']

The content of the cache on disk seems to remain unchanged

[jypmce@ciclad-ng latest]$ du -sh /data/jypmce/climafcache
1.8G    /data/jypmce/climafcache
[jypmce@ciclad-ng latest]$ find /data/jypmce/climafcache -type f | wc -l
961
rigoudyg commented 2 years ago

Sorry for the late reply, @jypeter. Commit 19207f4 should solve your problem. It is available in branch V3.0, which prepares the next version. If you have the tools to check it, you can test it and tell me if something is still wrong.