mbr / simplekv

A simple key-value store for binary data.
http://simplekv.readthedocs.io
MIT License
152 stars 50 forks source link

Ignore directory entries in FilesystemStore #32

Closed fmarczin closed 7 years ago

fmarczin commented 9 years ago

Currently, if the root of a FilesystemStore contains directories, the .keys()-method also returns those entries, which lead to errors in .put()/.get() etc.

This patch removes directory entries from .keys() and ._has_key()

mbr commented 9 years ago

What on earth are you putting inside your Filesystem Stores? =) I'll have to check this out and maybe write a few tests first. Thanks for pointing it out.

fmarczin commented 8 years ago

I added a test for the case that is handled differently (directory entries in storage). The tests pass on py27, the failures on py33 seem unrelated to this PR.

mbr commented 8 years ago

Thanks for the patch. I can merge this, but there's one thing I'm still unsure about: Is there a valid use case for not giving a storage exclusive access to a directory?

fmarczin commented 8 years ago

This change is more intended as something to make simplekv more robust and consistent when it encounters weird/invalid data. I can't come up with a proper use-case where this would happen on purpose 😄

mbr commented 8 years ago

The issue is that is comes at a significant cost -- a syscall for every file inside the directory when listing keys.

Assuming you have a directory with 500,000 keys, os.listdir can still be fast, but calling stat on each of them creates some unnecessary overhead.

On Wed, Oct 19, 2016 at 11:15 AM, Felix Marczinowski < notifications@github.com> wrote:

This change is more intended as something to make simplekv more robust and consistent when it encounters weird/invalid data. I can't come up with a proper use-case where this would happen on purpose 😄

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/mbr/simplekv/pull/32#issuecomment-254758679, or mute the thread https://github.com/notifications/unsubscribe-auth/AAGv8TXQdmwKWUA49XoKT7d5ysQR3v4Qks5q1d-YgaJpZM4F_ElE .

fmarczin commented 8 years ago

Ah, that is true of course. I hadn't considered this before.

I did some research, and sadly, the directory bit is actually available in the POSIX calls that happen during os.listdir(), but they aren't used. ("d_type: This field contains a value indicating the file type, making it possible to avoid the expense of calling lstat(2) if further actions depend on the type of the file.": http://man7.org/linux/man-pages/man3/readdir.3.html) https://github.com/python/cpython/blob/f25c93de2cdef8d96a91fafbdcf77b9bb8d8492a/Modules/posixmodule.c#L3539

So, yeah. This adds a call to stat/lstat for each entry. I can completely understand why you wouldn't want to merge this. Close this if you want.