Rorsachach opened this issue 2 years ago
a) That problem starts with the NIC not distributing the load across multiple CPUs. It is a generic VPP issue; you will need to ask the VPP community for help with that. b) The UPF function currently has race conditions that are highly likely to crash VPP if you run it on multiple worker threads. Don't do that!
Will the UPF crash with one main thread and one worker thread? I tried to configure the CPUs this way, and it always crashed during PDU session delete. I found what looks like an error in the following code.
```c
/* upf_pfcp.c */
void pfcp_free_session(upf_session_t *sx) {
  /* ... */
  sparse_free_rules(sx->teid_by_chid);
  /* ... */
}
```
Then I looked at the VPP definition of sparse_vec_free, added the following code, and recompiled:
```c
mspace_is_heap_object(
    sparse_vec_header(sx->teid_by_chid),
    clib_mem_get_per_cpu_heap()
);
```
It turned out that, at delete time, the vector cannot be found in the heap of the current CPU.
Is this caused by the introduction of multi-threading? Is the problem in the UPF or in VPP? I would appreciate your reply. Thank you.
@sergeymatov it seems you were the last one to touch that piece of code, maybe you can comment on that?
To me it looks like the root problem must be somewhere else. sparse_vec is not a per-CPU structure. It is IMHO more likely that something else has already free'd either the whole sx structure or only teid_by_chid. In both cases, the problem would be a race condition between the management task and the worker thread.
The sparse vector for TEID mapping should only be used (whether for read or write) by PFCP-related code. We currently run the PFCP server on the main core, and workers cannot invoke modification of a PFCP session.
@Rorsachach you can try to add checks for whether the session or the vector actually exists before it is about to be freed, and raise a clib_warning message, something like

```c
clib_warning ("Invoking sparse vec free, thread %d", vlib_get_thread_index ());
```

to check thread activity.
@sergeymatov Thank you for your reply. I ran some more tests.
I first compared the teid_by_chid generated by sparse_vec_new with the teid_by_chid passed to sparse_vec_free. They are the same.
Then I checked with clib_mem_is_heap_object (sparse_vec_header (sx->teid_by_chid)). The return value is sometimes true and sometimes false.
Then I ran the UPG with a single core and the same problem occurred.
I compiled the UPG several times without changing any other part of the code and found that it sometimes crashed and sometimes did not. So the only thing I can be sure of is that the vector is sometimes not in the current CPU's heap. I don't know exactly what the problem is; I think it might be in VPP's sparse_vec.
Any update on this issue?
The hardware I'm using: network device: X722 for 10GbE SFP+ (37d0); CPU: Xeon(R) D-2177NT @ 1.90GHz. The driver I'm using: vfio-pci.
Here is my startup.conf:
When I ran the UPG and pushed 10 Gbps of upstream and downstream traffic at the same time to measure throughput, the results were not ideal. Then I executed show run and found that there was only one thread processing the uplink data, while the thread processing the downlink data varied depending on the traffic size and the number of users. Could you tell me how to increase the speed of uplink processing, please? I'm sorry I can't paste the specific command output, because there is no Internet connection.
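A plausible explanation, consistent with the very first comment in this thread: uplink traffic arrives GTP-U encapsulated, so packets from the same gNB/eNB share the same outer IP/UDP tuple and the NIC's RSS hashes them all onto a single RX queue, while downlink packets carry the varied subscriber IPs and therefore spread across queues. (Intel X710/X722 NICs can be taught to hash on inner headers via a Dynamic Device Personalization GTP profile, which is worth investigating for this hardware.) As a starting point only, a hedged startup.conf sketch that at least gives RSS several queues to work with; the core numbers and PCI address are placeholders, and note the maintainer's warning above that running the UPF on multiple workers currently risks crashes:

```
cpu {
  main-core 1
  corelist-workers 2-5      # placeholder cores; see the multi-worker race warning above
}
dpdk {
  dev 0000:xx:xx.x {        # placeholder PCI address of the X722 port
    num-rx-queues 4          # one RX queue per worker so RSS can spread flows
  }
}
```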