Closed bwalkowi closed 5 years ago
Yes, you understand correctly. We can clear state->tdata and populate it again.
Attaching the data to every thread as thread-specific data may be a better way to solve the problem. However, I failed to find a proper way to get and set thread-specific data.
There is another way. The ep_tdata_t is not thread specific. We can pop a tdata from a synchronized pool, or create one if the pool is empty, and push it back for reuse after encoding or decoding.
A synchronized pool sounds good. I thought ep_tdata_t was thread specific, since the ep_lock_t that points to it also keeps the tid, and a thread always uses the ep_lock_t it previously marked. But if it is not, then a synchronized pool should fix the problem. It will create a new tdata only if more than logical_processors threads try to encode/decode a message at the same time.
On the other hand, if it is not thread specific and can be reused, then cleaning it should be as simple as setting state->lock_used to 0 (or to 1 and immediately taking the first slot for the current thread)? With this, no new memory would need to be allocated. Also, I assume this should be rather rare, so populating it again shouldn't be expensive.
What do you think?
Have you changed the +S Schedulers:SchedulerOnline option? My CPU has 8 logical cores, and when I use the +S16:16 option the error occurs. Changing erlang:system_info(logical_processors) to erlang:system_info(schedulers) will fix this.
We cannot simply clear state->lock_used and take the first ep_tdata_t for reuse, because it might be in use by another thread. We should at least lock it.
erlang:system_info(schedulers) is a better limit than erlang:system_info(logical_processors). Then again, I wonder whether it is possible for threads to die without taking down the whole system and be respawned with a new tid. Even if the number of threads is constant, a tid can change. Maybe besides changing logical_processors to schedulers we should also use a synchronized pool/stack (e.g. a linked list) for state->locks. Then each thread can pop one slot from it and push it back once it finishes encoding/decoding. Since there shouldn't be more threads than schedulers, we can allocate all slots at load time.
How about this?
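The pool/stack idea above could look roughly like the following. This is only a minimal sketch, not the enif_protobuf code: tdata_slot, tdata_pool, pool_init, pool_pop and pool_push are hypothetical names, tdata_slot is a stand-in for ep_tdata_t, and a plain pthread mutex is assumed in place of whatever lock the NIF would actually use.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for ep_tdata_t; the real struct has other fields. */
typedef struct tdata_slot {
    struct tdata_slot *next;   /* link used only while the slot sits in the pool */
    int scratch;               /* placeholder for per-operation working data */
} tdata_slot;

typedef struct {
    pthread_mutex_t lock;      /* guards free_list */
    tdata_slot *free_list;     /* stack of idle slots */
} tdata_pool;

/* Link all preallocated slots into the free list (done once, e.g. at load time). */
static void pool_init(tdata_pool *pool, tdata_slot *slots, size_t n)
{
    pthread_mutex_init(&pool->lock, NULL);
    pool->free_list = NULL;
    for (size_t i = 0; i < n; i++) {
        slots[i].next = pool->free_list;
        pool->free_list = &slots[i];
    }
}

/* Pop an idle slot, or return NULL if every slot is currently in use. */
static tdata_slot *pool_pop(tdata_pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    tdata_slot *slot = pool->free_list;
    if (slot != NULL) {
        pool->free_list = slot->next;
        slot->next = NULL;
    }
    pthread_mutex_unlock(&pool->lock);
    return slot;
}

/* Push a slot back once encoding or decoding has finished. */
static void pool_push(tdata_pool *pool, tdata_slot *slot)
{
    assert(slot != NULL);
    pthread_mutex_lock(&pool->lock);
    slot->next = pool->free_list;
    pool->free_list = slot;
    pthread_mutex_unlock(&pool->lock);
}
```

With this shape a slot is never tied to a tid, so a respawned thread with a new tid needs no special handling. The cost is one lock on pop and one on push for every encode or decode.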
Yes, it's a robust method. But if we use a pool, we have to lock it twice for every encode or decode. On careful comparison, your previous method of clearing state->lock_used is better, because when state->lock_used == state->lock_n we don't need any lock. If any tid has changed, then no tdata will be resolved for it; we just wait for state->cache_lock and then clear state->lock_used.
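A rough sketch of the reset approach follows. It is not the actual enif_protobuf implementation: slot_state, state_init, slot_for_tid and SLOT_COUNT are hypothetical names, only lock_used, lock_n and cache_lock come from the discussion, and a pthread mutex is assumed for cache_lock. For simplicity the sketch takes the lock for the whole lookup, so it does not show the lock-free path mentioned above. It also does not address the earlier concern that a reclaimed slot might still be in use by another thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SLOT_COUNT 2   /* hypothetical; the thread suggests sizing by schedulers */

typedef struct {
    pthread_mutex_t cache_lock;     /* guards tids and lock_used */
    unsigned long tids[SLOT_COUNT]; /* owner thread id for each claimed slot */
    size_t lock_n;                  /* total number of slots */
    size_t lock_used;               /* number of slots claimed so far */
} slot_state;

static void state_init(slot_state *st)
{
    pthread_mutex_init(&st->cache_lock, NULL);
    st->lock_n = SLOT_COUNT;
    st->lock_used = 0;
}

/* Return the slot index for tid, claiming a slot if the tid is unknown. */
static size_t slot_for_tid(slot_state *st, unsigned long tid)
{
    size_t idx;

    assert(st->lock_n > 0 && st->lock_n <= SLOT_COUNT);
    pthread_mutex_lock(&st->cache_lock);

    /* Known tid: reuse the slot it claimed earlier. */
    for (idx = 0; idx < st->lock_used; idx++) {
        if (st->tids[idx] == tid) {
            pthread_mutex_unlock(&st->cache_lock);
            return idx;
        }
    }

    /* Unknown tid and a full table: forget all owners instead of failing. */
    if (st->lock_used == st->lock_n) {
        st->lock_used = 0;
    }

    /* Claim the next free slot for this tid. */
    idx = st->lock_used++;
    st->tids[idx] = tid;
    pthread_mutex_unlock(&st->cache_lock);
    return idx;
}
```

After a reset, threads that still hold valid tids simply claim a slot again on their next call, so the table repopulates without any new allocation.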
Sure, if you're fine with this I will try to implement it and create a pr after testing.
Hi,
sometimes I get tid_not_found during either data encoding or decoding. After eliminating the option that the messages were invalid, I took a look at the enif_protobuf code. If I understand it correctly, some space is allocated in state (exactly erlang:system_info(logical_processors) slots) so that each thread can keep its data there. When a new thread comes, it simply takes one of the empty slots. If no slot is available, then {error, tid_not_found} is returned. Is it possible that there can be more threads than logical_processors? Or that threads die and are respawned with a different tid? Either way, would it be possible to clear state->tdata so it can be populated again instead of returning an error? Or to add a function to do so (e.g. purge_tdata) from Erlang?