mereacre opened this issue 5 years ago
My ulimit -a:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127357
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 127357
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I'm loading the ndarrays as memmap data, which keep an open file descriptor until the ndarray goes out of scope. If you try to write to the ndarray, it will automatically make an in-memory copy of the ndarray and write there, to avoid changing the data on disk. Note that my open-files limit above is 1024, which is what the examples below run into.
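A minimal sketch of that copy-on-write behaviour, assuming the store opens the memmaps with numpy's mode="c" (copy-on-write):

```python
import numpy as np
import tempfile as tmp

f = tmp.NamedTemporaryFile()
# Create a tiny on-disk array, then reopen it copy-on-write ("c").
np.memmap(f.name, dtype="uint8", mode="w+", shape=(4,)).flush()
mm = np.memmap(f.name, dtype="uint8", mode="c", shape=(4,))
mm[0] = 255  # writes go to an in-memory copy, not to the file
print(np.memmap(f.name, dtype="uint8", mode="r", shape=(4,))[0])  # prints 0
```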
In this case, the array isn't assigned to anything, so each memmap is garbage-collected immediately and there is no issue:
$ pipenv run python3 -c 'import numpy as np; import tempfile as tmp; [x if np.memmap(tmp.TemporaryFile(), shape=(1,1)) else x for x in range(10000)]'
However, in this list comprehension the memmaps are kept in the resulting list, so this causes an error:
$ pipenv run python3 -c 'import numpy as np; import tempfile as tmp; [np.memmap(tmp.TemporaryFile(), shape=(1,1)) for x in range(10000)]'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "<string>", line 1, in <listcomp>
File "/home/nqminds/.local/share/virtualenvs/Documents-8nqfD-wS/lib/python3.6/site-packages/numpy/core/memmap.py", line 264, in __new__
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OSError: [Errno 24] Too many open files
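One possible workaround in user code (a sketch, not the store's API): copy the mapped data into an ordinary array and drop the memmap before opening the next file, so descriptors are released as you go:

```python
import numpy as np
import tempfile as tmp

def load_into_memory(n=10000):
    arrays = []
    for _ in range(n):
        with tmp.TemporaryFile() as f:
            mm = np.memmap(f, dtype="uint8", mode="w+", shape=(1, 1))
            arrays.append(np.array(mm))  # in-memory copy, not file-backed
            del mm  # releases the mmap (and its duplicated descriptor)
    return arrays
```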
I would like to add three things.

On point 2 we should think about streams, as I'm planning to do in the nodejs version.
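On the Python side, the equivalent of a stream could be a generator that keeps at most one memmap open at a time (a sketch with hypothetical names, not existing API):

```python
import numpy as np

def stream_ndarrays(paths, shape, dtype="uint16"):
    # Hypothetical streaming read: yield one in-memory array at a time,
    # so only a single file descriptor is open at any moment.
    for path in paths:
        mm = np.memmap(path, dtype=dtype, mode="r", shape=shape)
        yield np.array(mm)  # copy, so the mapping can be released
        del mm
```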
Too many open files error

Error value: see the OSError traceback above.
Example code: see the pipenv one-liners above.
Example database:
/tmp/nqm-dream-store/1549363951308608.sqlite
Schema:
{"dataSchema": {"timestamp": {"__tdxType": ["number", "Integer"]}, "data": {"__tdxType": ["ndarray"]}}, "uniqueIndex": [{"asc": "timestamp"}]}
Rows: 9459
Example row:
15493643012880000|{"t": "H", "s": [288, 382], "v": "f", "p": "/tmp/nqm-dream-store/1549363951308608.d/AAABaL1PDeo=b8rx8i7t.dat"}
Ndarray file size: 288 * 382 * 2 = 220,032 bytes (shape 288x382 with a 2-byte dtype, matching "t": "H", i.e. uint16)
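For reference, the example row's metadata maps back onto a numpy array roughly like this (a sketch; the field meanings, "t" as the numpy dtype character, "s" as the shape, "p" as the backing file, are guessed from the example row above):

```python
import json
import numpy as np

meta = json.loads(
    '{"t": "H", "s": [288, 382], "v": "f", '
    '"p": "/tmp/nqm-dream-store/1549363951308608.d/AAABaL1PDeo=b8rx8i7t.dat"}')

# 'H' is numpy's uint16 (2 bytes), consistent with the 288*382*2 file size.
arr = np.memmap(meta["p"], dtype=np.dtype(meta["t"]), mode="r",
                shape=tuple(meta["s"]))
```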
I think we need to close the files after reading the data into memory, or put a limit on how many files can be opened at the same time. So, for instance,

db.getData(filter={}, projection={}, options={})

will return only the first 1000 rows. Then it is up to the user to retrieve the next 1000 by using the proper filter.
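A sketch of what that paging could look like on a sqlite3-backed store (the "timestamp" column comes from the schema above; the table name, get_data, and the limit parameter are hypothetical, not the store's current API):

```python
import sqlite3

PAGE_SIZE = 1000  # hypothetical default limit

def get_data(db_path, table, after_timestamp=None, limit=PAGE_SIZE):
    # Keyset pagination on the unique "timestamp" index from the schema:
    # each call returns up to `limit` rows, and the caller passes the last
    # timestamp it saw to fetch the next page.
    con = sqlite3.connect(db_path)
    try:
        if after_timestamp is None:
            cur = con.execute(
                f"SELECT timestamp, data FROM {table} "
                "ORDER BY timestamp LIMIT ?", (limit,))
        else:
            cur = con.execute(
                f"SELECT timestamp, data FROM {table} "
                "WHERE timestamp > ? ORDER BY timestamp LIMIT ?",
                (after_timestamp, limit))
        return cur.fetchall()
    finally:
        con.close()
```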