piskvorky / sqlitedict

Persistent dict, backed by sqlite3 and pickle, multithread-safe.
Apache License 2.0
1.16k stars · 130 forks

Introduce weak references #165

Closed · mpenkov closed this 1 year ago

mpenkov commented 1 year ago

Author is @Cologler

Continued from #153 because that branch was mistakenly deleted.

mpenkov commented 1 year ago

@piskvorky @Cologler Please eyeball this PR and let me know if you spot anything odd

Cologler commented 1 year ago

@piskvorky @Cologler Please eyeball this PR and let me know if you spot anything odd

Looks Good To Me 👌.

Cologler commented 1 year ago

What if the user does some longtime work in the loop?

for value in a_sqlite_dict.values():
    # longtime work here, like an HTTP request
    ...
mpenkov commented 1 year ago

What if the user does some longtime work in the loop?

for value in a_sqlite_dict.values():
    # longtime work here, like an HTTP request
    ...

I think the response queue will continue to receive new items.

Basically, there is a producer/consumer problem here. The producer is the sqlite thread and the consumer is the user. We want to produce results quickly enough that the user never has to wait, but without getting too far ahead of them. If we produce too quickly (or the user consumes too slowly), then the queue grows in size and occupies more and more memory.

I think one way to solve that particular problem is to use fixed-size queues for the responses. If a queue is full, then it means the user is consuming faster than we are producing, and we can produce slower.

@piskvorky What do you think? Is there a reason why we use queues of unlimited size?
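To make the idea concrete, here is a minimal sketch of a bounded response queue between a worker thread (producer) and the iterating user (consumer). This is a hypothetical illustration, not sqlitedict's actual implementation; the names `producer` and `consume` are made up for the example.

```python
import queue
import threading

def producer(rows, responses):
    """Worker thread: pushes results into a bounded queue."""
    for row in rows:
        # put() blocks once the queue holds `maxsize` items, so the
        # worker naturally slows down to match the consumer's pace.
        responses.put(row)
    responses.put(None)  # sentinel: no more rows

def consume(responses):
    """User side: drains the queue until the sentinel appears."""
    results = []
    while True:
        item = responses.get()
        if item is None:
            break
        results.append(item)  # the "longtime work" would happen here
    return results

responses = queue.Queue(maxsize=10)  # bounded, unlike queue.Queue()
worker = threading.Thread(target=producer, args=(range(100), responses))
worker.start()
out = consume(responses)
worker.join()
```

With `maxsize=10`, memory is capped at ten pending rows no matter how slow the consumer's loop body is; an unbounded `queue.Queue()` would instead buffer all rows.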

piskvorky commented 1 year ago

If a queue is full, then it means the user is consuming faster than we are producing, and we can produce slower.

Do you mean "consuming slower"?

Anyway, limiting the queue is technically easy, here:

https://github.com/RaRe-Technologies/sqlitedict/blob/8545a174a4a32bd2f8eb7c17d89899370138fe3a/sqlitedict.py#L623-L626

But I'm not sure about the implications. The code is sufficiently complex that a blocking put (due to "queue full") could lead to deadlocks, unless we think through all the code paths carefully.

What happens when the consumer never calls get (for whatever reason – abandoned generator, exception…), so that the put blocks forever?
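A small sketch of that failure mode, and one possible mitigation: if nobody ever drains a full bounded queue, a plain `put()` hangs forever, whereas `put()` with a timeout turns the hang into a `queue.Full` exception the worker can react to. This is a hypothetical illustration, not a proposal for sqlitedict's code.

```python
import queue

responses = queue.Queue(maxsize=1)
responses.put("first")  # queue is now full; no consumer will ever get()

try:
    # Without the timeout, this put() would block forever.
    responses.put("second", timeout=0.1)
    hung = True
except queue.Full:
    hung = False  # the worker can now clean up instead of deadlocking
```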

mpenkov commented 1 year ago

Do you mean "consuming slower"?

Yes, sorry.

Anyway, limiting the queue is technically easy, here:

Yes, as you point out, this is the easy part. The hard part is ensuring nothing else breaks.

What happens when the consumer never calls get (for whatever reason – abandoned generator, exception…), so that the put blocks forever?

Good question...

Anyway, I think this discussion is out of scope for this PR (but definitely helpful). I think we're done for now.