nqm-iot-database-py
https://nqminds.github.io/nqm-iot-database-py/

ERROR: Too many open files #19

mereacre opened this issue 5 years ago. Status: Open.

mereacre commented 5 years ago

Too many open files error

Error value

File "/home/alexandru/tmp/flask_db_test.py", line 13, in hello
    first_row = db.getData(filter={}, projection = {}, options = {})
  File "/home/alexandru/.local/lib/python3.6/site-packages/nqm/iotdatabase/database.py", line 440, in getData
    schema, row, data_dir) for row in projected_data]
  File "/home/alexandru/.local/lib/python3.6/site-packages/nqm/iotdatabase/database.py", line 440, in <listcomp>
    schema, row, data_dir) for row in projected_data]
  File "/home/alexandru/.local/lib/python3.6/site-packages/nqm/iotdatabase/_sqliteschemaconverter.py", line 248, in convertRowToTdx
    for col, val in row.items() if col in schema}
  File "/home/alexandru/.local/lib/python3.6/site-packages/nqm/iotdatabase/_sqliteschemaconverter.py", line 248, in <dictcomp>
    for col, val in row.items() if col in schema}
  File "/home/alexandru/.local/lib/python3.6/site-packages/nqm/iotdatabase/_sqliteschemaconverter.py", line 283, in convertToTdx
    return converter[fixed_type](value) # type: ignore
  File "/home/alexandru/.local/lib/python3.6/site-packages/nqm/iotdatabase/_sqliteschemaconverter.py", line 280, in <lambda>
    data_dir),
  File "/home/alexandru/.local/lib/python3.6/site-packages/nqm/iotdatabase/ndarray/fileio.py", line 50, in getNDArray
    return storage_class.get(metadata,relative_loc)
  File "/home/alexandru/.local/lib/python3.6/site-packages/nqm/iotdatabase/ndarray/storageformats.py", line 96, in get
    order=order)
  File "/home/alexandru/.local/lib/python3.6/site-packages/numpy/core/memmap.py", line 264, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OSError: [Errno 24] Too many open files

Example code

from nqm.iotdatabase.database import Database
from flask import Flask
app = Flask(__name__)

app.config["suppress_callback_exceptions"]=False

@app.route("/")
def hello():
    # Open the database
    db = Database("/tmp/nqm-dream-store/1549363951308608.sqlite", "file", "w+")

    # Get data
    first_row = db.getData(filter={}, projection = {}, options = {})
    return str(first_row)

Example database

/tmp/nqm-dream-store/1549363951308608.sqlite

Schema: {"dataSchema": {"timestamp": {"__tdxType": ["number", "Integer"]}, "data": {"__tdxType": ["ndarray"]}}, "uniqueIndex": [{"asc": "timestamp"}]}

Rows: 9459

Example row: 15493643012880000|{"t": "H", "s": [288, 382], "v": "f", "p": "/tmp/nqm-dream-store/1549363951308608.d/AAABaL1PDeo=b8rx8i7t.dat"}

Ndarray file size: 288*382*2 (bytes)

I think we need to close the files after reading the data into memory, or put a limit on how many files can be open at the same time. For instance, db.getData(filter={}, projection={}, options={}) would return only the first 1000 rows; it would then be up to the user to retrieve the next 1000 with an appropriate filter.
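
A rough sketch of how that paging could look from the caller's side. The MongoDB-style $gt filter on the unique timestamp index, the "limit" option, and the list-of-row-dicts return value are all assumptions here, not the confirmed API:

from nqm.iotdatabase.database import Database

db = Database("/tmp/nqm-dream-store/1549363951308608.sqlite", "file", "w+")

PAGE_SIZE = 1000
last_timestamp = None
while True:
    # only ask for rows newer than the last one already processed (assumed filter syntax)
    query = {} if last_timestamp is None else {"timestamp": {"$gt": last_timestamp}}
    rows = db.getData(filter=query, projection={}, options={"limit": PAGE_SIZE})  # "limit" is hypothetical
    if not rows:
        break
    for row in rows:
        pass  # process row["timestamp"] and row["data"] here
    last_timestamp = rows[-1]["timestamp"]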

mereacre commented 5 years ago

My ulimit -a


core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127357
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 127357
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
aloisklink commented 5 years ago

I'm loading the ndarrays as memmap data, which keep a file descriptor open until the ndarray goes out of scope. If you try to write to the ndarray, though, it will automatically make an in-memory copy of the ndarray and write to that, to avoid changing the data on disk.
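
A minimal numpy sketch of that copy-on-write behaviour, assuming the memmaps are opened in mode="c" (which matches the description above); the path and dtype are only illustrative:

import numpy as np

# write a small file-backed array to disk (illustrative path and dtype)
on_disk = np.memmap("/tmp/cow_example.dat", dtype=np.float32, mode="w+", shape=(4,))
on_disk[:] = [1, 2, 3, 4]
on_disk.flush()
del on_disk  # release the write-mode memmap and its file descriptor

# copy-on-write memmap: keeps a file descriptor open for as long as `arr` is alive
arr = np.memmap("/tmp/cow_example.dat", dtype=np.float32, mode="c", shape=(4,))
arr[0] = 99.0  # the write goes to an in-memory copy, not to the file

print(arr[0])  # 99.0
print(np.memmap("/tmp/cow_example.dat", dtype=np.float32, mode="r", shape=(4,))[0])  # 1.0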

In the following case, the array isn't assigned to a variable, so there is no issue:

$ pipenv run python3 -c 'import numpy as np; import tempfile as tmp; [x if np.memmap(tmp.TemporaryFile(), shape=(1,1)) else x for x in range(10000)]'

However, in this list comprehension, the memmaps are stored, so this causes an error:

$ pipenv run python3 -c 'import numpy as np; import tempfile as tmp; [np.memmap(tmp.TemporaryFile(), shape=(1,1)) for x in range(10000)]'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 1, in <listcomp>
  File "/home/nqminds/.local/share/virtualenvs/Documents-8nqfD-wS/lib/python3.6/site-packages/numpy/core/memmap.py", line 264, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OSError: [Errno 24] Too many open files

I would like to add three things:

  1. A default getData limit of 1000 rows.
  2. An option to load the arrays into RAM when getData() is called. This wouldn't be recommended in general, since it can use a lot of I/O and a lot of RAM (a quick demonstration follows this list).
  3. A custom error message that explains how to fix the file-descriptor error (i.e. get less data, let your arrays go out of scope once you no longer need them, or use option 2).
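
For comparison, a variant of the failing one-liner above that immediately copies each memmap into RAM with np.array() and then discards the memmap (roughly what option 2 would automate) runs without hitting the descriptor limit:

$ pipenv run python3 -c 'import numpy as np; import tempfile as tmp; [np.array(np.memmap(tmp.TemporaryFile(), shape=(1,1))) for x in range(10000)]'
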
mereacre commented 5 years ago

On point 2 we should think about streams, as I'm planning to do in the Node.js version.
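
On the Python side, one stream-like shape for this could be a generator that pages through the table and yields one row at a time, copying each ndarray into RAM before handing it out. Everything below is a sketch: the helper itself, the "limit" option, the $gt filter, and the assumed row layout are not part of the current API.

import numpy as np

def stream_rows(db, page_size=1000):
    """Hypothetical streaming helper: yield rows one page at a time,
    copying each memmap-backed ndarray into RAM so its file descriptor
    can be released before the next page is opened."""
    last_timestamp = None
    while True:
        query = {} if last_timestamp is None else {"timestamp": {"$gt": last_timestamp}}
        rows = db.getData(filter=query, projection={}, options={"limit": page_size})
        if not rows:
            return
        for row in rows:
            if isinstance(row.get("data"), np.memmap):
                row["data"] = np.array(row["data"])  # full in-memory copy, releases the memmap
            yield row
        last_timestamp = rows[-1]["timestamp"]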