larryhastings / gilectomy

Gilectomy branch of CPython. Use "gilectomy" branch in git. Read the important, short README below!

complete dummy in lock-free - but this looks helpful? #18

Closed den-run-ai closed 8 years ago

den-run-ai commented 8 years ago

http://www.liblfds.org/

A portable, license-free, lock-free data structure library written in C89.

den-run-ai commented 8 years ago

and here is a bounty for anyone who successfully merges anything lock-free:

https://www.bountysource.com/issues/34861756

den-run-ai commented 8 years ago

this looks like an even more promising set of lock-free data structures (not sure if it's C89):

https://github.com/concurrencykit/ck

larryhastings commented 8 years ago

liblfds:

First reaction: Windows is a pretty important platform for us, even Win32. But if it's a great library maybe we can contribute to it if it needs help. (They aren't sure about their Win32 support.)

Second reaction: we can't use a linked-list implementation for lists, and if "Hash (add-only)" means what I think it means we can't use that either. And the other data structures are of secondary importance. Maybe they'll be interesting later down the road, but not right now.

concurrencykit: they seem to enforce requirements that are incompatible with us, e.g. on their hash table the key is a region of memory up to 64k long, and it sounds like they own it.

Maybe we can learn things by reading their source code / algorithms / approaches, but I don't think we can use either of these libraries.

Since I want to try and keep the issues area clean, I'm closing this. I think it's fine to bring these things up, but maybe we can use the wiki for that?

den-run-ai commented 8 years ago

Wiki sounds good! Thanks for the quick analysis, which makes sense to me.


JustAMan commented 8 years ago

@larryhastings Regarding the "hash (add-only)" stuff - is your concern related to the "cannot shrink in size" part? If so, from what's written in dictobject.c I conclude that current Python dictionaries cannot shrink either, so this should not be a problem.

I'm more worried about the need to initialize structs on each CPU core separately.

All in all, this does not sound worth integrating to me, either.

den-run-ai commented 8 years ago

related https://github.com/larryhastings/gilectomy/issues/29

liblfds commented 8 years ago

Hi.

I am the author of liblfds.

I'm puzzled by a sentence in one of the earlier comments - "we can't use a linked-list implementation for lists". I don't understand - on the face of it, that seems to make no sense? "we can't use a linked list for lists"?

Regarding Win32, the platform will be supported on an ongoing basis, in part because of existing users, in part because of future possible users, and in part because of the very high value of the platform in providing a very different platform upon which to compile (and so to find bugs and mistakes). What is in question however is the provision of Visual Studio solution files. They are problematic in terms of workload and so challenging to support.

Finally, regarding a concern from JustAMan (clearly a Faith No More fan) - "I'm more worried about the need to initialize structs on each CPU core separately". This is, as far as I know, absolutely and unequivocally unavoidable. Each thread MUST issue a load memory barrier so that it is guaranteed to see the initialization work performed by the initializing thread on the data structure state. The same work is in fact being done when normal locks are used (for it is equally necessary) - each lock, when it is taken, issues a load barrier.

However, as with all things, there may be steps that can be taken to address or ameliorate the problem.
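For illustration, here is a minimal C11 sketch (not the liblfds API - all names here are invented) of why each thread must perform that load/acquire step before first touching state initialized by another thread:

```c
/* Minimal C11 sketch (hypothetical names, not the liblfds API): each
 * thread must perform an acquire load/barrier before its first access to
 * a structure initialized by a different thread, or it is not guaranteed
 * to see that initialization. */
#include <stdatomic.h>
#include <stdbool.h>

struct shared_state {
    int table[64];                     /* plain data filled in by one thread */
};

static struct shared_state state;
static atomic_bool state_ready;

/* Runs on the initializing thread. */
void init_thread(void)
{
    for (int i = 0; i < 64; i++)
        state.table[i] = i;
    /* Release store: publishes the writes above to other threads. */
    atomic_store_explicit(&state_ready, true, memory_order_release);
}

/* Runs once on every other thread before it first uses `state`.  This is
 * the per-thread step under discussion: the acquire load pairs with the
 * release store above. */
bool reader_thread_attach(void)
{
    if (!atomic_load_explicit(&state_ready, memory_order_acquire))
        return false;                  /* initialization not visible yet */
    /* From here on, reads of state.table are guaranteed to see the init. */
    return true;
}
```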

In general, there is a very strong need for an O(log n) or O(1) data structure in the library which supports full add and delete - there are so many use cases which require this and cannot be met without it.

JustAMan commented 8 years ago

@liblfds

What Python names a "list" is actually an "array", as it occupies a contiguous chunk of memory and offers O(1) access time to any member by index. So we cannot use a linked list for a Python list :)
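To make that concrete, here is a simplified sketch of the idea (the real definition is PyListObject in CPython's source; this is just the shape of it, not the actual CPython code):

```c
/* Simplified sketch of what a Python "list" is under the hood: a
 * resizable, contiguous array of pointers to objects - not a linked list.
 * (The real thing is PyListObject in CPython's Objects/ directory.) */
#include <stddef.h>

typedef struct py_object PyObjectLike;     /* stand-in for PyObject */

typedef struct {
    ptrdiff_t      ob_size;    /* number of elements currently in use */
    PyObjectLike **ob_item;    /* contiguous array of object pointers */
    ptrdiff_t      allocated;  /* slots allocated (>= ob_size)        */
} ListLike;

/* O(1) indexing: plain pointer arithmetic into the contiguous array. */
static inline PyObjectLike *list_get(const ListLike *l, ptrdiff_t i)
{
    return l->ob_item[i];
}
```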

Thanks for explaining that macro, I was misunderstanding it. It makes full sense now. Maybe adding that explanation to the macro's documentation would help future readers: the macro is not about initializing any particular struct, but about issuing memory barriers so that the calling thread is guaranteed to see everyone's initialized structs (all the structs... I was thinking I'd have to call the macro for each struct I want to work with; if it's a full fence, then it's probably enough to call it once and work with as many structs as I want, right?).

As for Win32 support - are we talking about the same thing? I think @larryhastings was worried about 32-bit version, as your site states that 64-bit version is tested while 32-bit "should work". And I don't think that generating solutions for every possible combination is a problem - we can (I think) integrate the call to make into the Python solution as a pre-build command or something. Also could it be made possible to build liblfds on Windows with nmake instead of gnumake so that having Visual Studio installed would be enough? nmake is native to VS as far as I know.

P.S. My nickname was invented by me, and the first time I heard about "Faith No More" was from you today :) So it's not related to that.

larryhastings commented 8 years ago

Howdy liblfds. I appreciate your willingness to help. But AFAICT the data structures provided by your library aren't applicable to CPython + gilectomy right now. JustAMan's summary about the list is correct. Python's term for the data structure is "list", so naturally that's what I call it, but it really is an array.

liblfds commented 8 years ago

JustAMan wrote:

What Python names a "list" is actually an "array", as it occupies a contiguous chunk of memory and offers O(1) access time to any member by index. So we cannot use a linked list for a Python list :)

Ah, Python lists - all is clear.

But, as I recall, Python lists can grow. Is the list reallocated each time, or is there a higher level data structure holding each of the allocations which form the list?

Thanks for explaining that macro, I was misunderstanding it. It makes full sense now. Maybe adding that explanation to the macro's documentation would help future readers: the macro is not about initializing any particular struct, but about issuing memory barriers so that the calling thread is guaranteed to see everyone's initialized structs (all the structs... I was thinking I'd have to call the macro for each struct I want to work with; if it's a full fence, then it's probably enough to call it once and work with as many structs as I want, right?).

Right - exactly.

One of the problems of being too close to your own code for too long - in documentation you always try to explain things to the outsider, but you can progressively lose the ability to put yourself in the mind of that outsider.

As for Win32 support - are we talking about the same thing? I think @larryhastings was worried about 32-bit version, as your site states that 64-bit version is tested while 32-bit "should work".

Ah - well spotted. I was indeed thinking Larry was concerned about Windows support, as opposed to 32 bit support.

For all platforms, including Windows, both 32 and 64 bit platforms will be supported on an ongoing basis - the problem however is that I currently lack a 32 bit Windows platform to compile upon :-/ I have a Windows XP installer disk, but it's in storage, in a distant country!

And I don't think that generating solutions for every possible combination is a problem - we can (I think) integrate the call to make into the Python solution as a pre-build command or something.

Good. Solution files for me are just... OMG death. Something like ten thousand settings, each of which has to be set by hand using a mouse and a GUI.

Also could it be made possible to build liblfds on Windows with nmake instead of gnumake so that having Visual Studio installed would be enough? nmake is native to VS as far as I know.

nmake is, I believe, also natively provided.

I'll have a look.

P.S. My nickname was invented by me, and the first time I heard about "Faith No More" was from you today :) So it's not related to that.

grin

https://www.youtube.com/watch?v=7g4L47kEcS0

Comment - "this song is on my funeral playlist" =-)

liblfds commented 8 years ago

Howdy liblfds.

Howdy, Larry.

I appreciate your willingness to help. But AFAICT the data structures provided by your library aren't applicable to CPython + gilectomy right now.

Library development is ongoing.

What is it that you would find useful?

larryhastings commented 8 years ago

Python lists are effectively realloc'd when they run out of space. Same with hash tables.

I don't know of anything specifically that we need WRT concurrent-friendly data structures. So far we're managing with fine-grained locking. We have much bigger performance worries than the data structures right now.

liblfds commented 8 years ago

Python lists are effectively realloc'd when they run out of space.

What happens if the user attempts to insert an element between two existing elements? The right-hand side of the array is copied up one element, to make room?

I always had the impression Python lists were really something much more under the hood - a tree or a hash - and the list-like interface was something presented to the user.

Have you come across counted trees? (Maybe you already use them?) The key for an element is its position in the tree, and inserting a new element automatically increases the position of the following elements without having to visit them.

So you would imagine a tree with, say, 100 elements in it, 0 to 99, and then the user inserts a new element at location 50. It's O(log n) to insert (the tree is balanced), and that gives you elements 0 to 100, exactly as you want them, with the new element at 50 and all the elements after it automatically bumped up by one (i.e. at zero cost).
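A sketch of the indexing idea, in case it helps (balancing omitted; the names are invented for illustration, not taken from any particular library): each node stores its subtree size, so the element at logical index k is found in O(height), and an insert only adjusts sizes along one root-to-leaf path.

```c
/* Sketch of the "counted tree" indexing idea (balancing omitted; names
 * invented for illustration).  Each node stores the size of its subtree,
 * so the element at logical index k can be found in O(height), and an
 * insert only has to bump sizes along a single root-to-leaf path. */
#include <stddef.h>

struct cnode {
    struct cnode *left, *right;
    size_t        size;        /* number of nodes in this subtree */
    void         *value;
};

static size_t csize(const struct cnode *n) { return n ? n->size : 0; }

/* Return the node holding the element at 0-based index k, or NULL. */
struct cnode *ctree_index(struct cnode *n, size_t k)
{
    while (n) {
        size_t left = csize(n->left);
        if (k < left) {
            n = n->left;               /* target is in the left subtree   */
        } else if (k == left) {
            return n;                  /* this node is the k-th element   */
        } else {
            k -= left + 1;             /* skip left subtree and this node */
            n = n->right;
        }
    }
    return NULL;                       /* k out of range */
}
```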

Same with hash tables.

The hash tables are array backed, rather than backed by a dynamic data structure, such that they can run out of space?

I don't know of anything specifically that we need WRT concurrent-friendly data structures. So far we're managing with fine-grained locking. We have much bigger performance worries than the data structures right now.

It'd be hard to write something which would make me more curious :-) what kind of problems are you facing?

JustAMan commented 8 years ago

What happens if the user attempts to insert an element between two existing elements? The right-hand side of the array is copied up one element, to make room?

Indeed. A Python list is realloc'ed if it doesn't have enough room (they try to reallocate it in, I think, an exponential way - like making it twice as big each time, or something like that). So inserting something in the middle means copying the right-hand part one slot to the right and then placing the entry into the vacant slot. The fact that Python lists actually hold pointers to objects (not the objects themselves) helps a bit, as there isn't much information to copy over, but still.
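Roughly, the insert-in-the-middle case comes down to something like this sketch (illustration only, leaving out the resize-when-full path; the real code is in Objects/listobject.c):

```c
/* Sketch of inserting into the middle of an array of object pointers
 * (resize-when-full path omitted; this is not the actual CPython code,
 * which lives in Objects/listobject.c).  Only pointers move, never the
 * objects themselves. */
#include <string.h>
#include <stddef.h>

void array_insert(void **items, size_t *len, size_t at, void *obj)
{
    /* Shift the tail [at, len) one slot to the right... */
    memmove(&items[at + 1], &items[at], (*len - at) * sizeof(void *));
    /* ...and drop the new pointer into the vacated slot. */
    items[at] = obj;
    (*len)++;
}
```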

The hash tables are array backed, rather than backed by a dynamic data structure, such that they can run out of space?

Yes and no. They're backed by an array, but when that array gets around 2/3 full it's realloc'd to get more room (and the entries are copied over, which also makes access faster), so in a way they cannot run out of space. Here's a link with a good enough explanation of how Python hash tables (dictionaries, in Python parlance) work: http://www.laurentluce.com/posts/python-dictionary-implementation/
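The growth rule itself is tiny - something like this sketch (illustration only; the real logic lives in Objects/dictobject.c):

```c
/* Sketch of the "grow when about 2/3 full" rule for an open-addressed
 * hash table (illustration only; not the actual dictobject.c code).
 * The table never runs out of space because it is rebuilt into a larger
 * array before it can fill up. */
#include <stddef.h>

struct hash_table {
    size_t used;      /* occupied slots                        */
    size_t mask;      /* table size - 1 (size is a power of 2) */
    /* ... slot array omitted ... */
};

static int table_needs_growth(const struct hash_table *t)
{
    /* used > 2/3 * size   <=>   3 * used > 2 * size */
    return 3 * t->used > 2 * (t->mask + 1);
}
```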

liblfds commented 8 years ago

Thank you for the link. A considered design, chosen to maximize cache locality and optimized for consecutive access cases. The realloc is a major road bump when it happens, but given that the array only increases in size, the initial reallocs and copies should usually occur early, so the system will come to a stable state and then perform. The downside is that resource usage will always be at the high-water mark, until the hash is destroyed.

In a multi-threaded scenario, how is the hash protected? Larry mentioned fine-grained locking? I'm not an expert in locking-based solutions, but in the straightforward solutions (one lock per data structure) the scaling factor is negative, i.e. you go fastest with a single core, no matter how many cores you have. (Here I'm thinking of data structure accesses only - i.e. n threads simply hammering the data structures. In practice, in real code, where the threads usually have a bunch of single-threaded work to do but then need a data structure access or two, actual overall system throughput rises as the core count rises, until the shared data structure accesses become so slow, from contention, that their cost is greater than the single-threaded work being performed.)

liblfds commented 8 years ago

I've been reading this:

https://lwn.net/SubscriberLink/689548/4328423f85a47679/

I'm not really familiar with conventional garbage collection, or how it is used in practice, so I am most likely completely wrong, but I have an implementation (not yet published - next release probably, I have debugging and testing to finish) of a type of lock-free garbage collection, and although it's quite limited and quite specialized, there may be a very small chance it might be of some relevance.

Its advantage is that there is no need to stop-the-world, which I believe is usually what has to happen with garbage collection.

JustAMan commented 8 years ago

The problem with gc in Python is:

  1. the default implementation is reference counting plus collecting reference cycles every n-th allocation/deallocation
  2. one cannot easily change the implementation from "reference counting" to something else - a lot of C extensions would break

In a multi-threaded scenario, how is the hash protected?

In the default Python implementation everything is protected by the GIL (one global lock per process). In this gilectomy thing, dictionaries are so far protected with a per-structure recursive lock, and there are some specifics about locking when one considers R/W locks. I've tried to briefly outline those at #30; I hope I didn't miss anything important there.
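For illustration, the per-structure recursive lock idea looks roughly like this pthreads sketch (not the actual gilectomy code, just the shape of it):

```c
/* Pthreads sketch of a per-object recursive lock guarding a dict
 * (illustration only - not the actual gilectomy implementation). */
#include <pthread.h>

struct locked_dict {
    pthread_mutex_t lock;   /* recursive, so C code that re-enters and
                               touches the same dict again won't deadlock */
    /* ... hash table fields omitted ... */
};

void locked_dict_init(struct locked_dict *d)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&d->lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

void locked_dict_setitem(struct locked_dict *d /*, key, value */)
{
    pthread_mutex_lock(&d->lock);
    /* ... mutate the underlying hash table ... */
    pthread_mutex_unlock(&d->lock);
}
```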

larryhastings commented 8 years ago

When you insert in the middle, the elements after are memcpy'd up one slot to make room. They're grown by 1.5x.

Dicts are stored in a hash table, which is an array. It used to be simpler, but now the keys can be (always are?) stored separately for the "key-sharing dicts" feature merged in 3.3.

In the future it would be more straightforward if you'd answer your own questions about Python's builtin types. They're all implemented in C in this directory:

https://hg.python.org/cpython/file/tip/Objects

Dicts (hash tables) are in dictobject.c, lists (arrays) are in listobject.c.