DCACHE synthesis and implementation?

Hello, I have a few questions regarding the cache implementation of Vortex.

Can someone help me understand how to configure the number of cache ports (#cache-ports) when synthesizing on Xilinx Alveo?
From VX_gpu_pkg.sv, VX_config.vh and VX_socket.sv, I can see that NUM_REQS for DCACHE is always 1 when #threads <= 16. Why is that? Shouldn't the number of core requests to DCACHE be equal to #threads?
I tried synthesizing Vortex on Alveo with --cores=1, --warps=4 and --threads=32, but routing fails with the error "Routing results verification failed due to partially-conflicted nets (Up to first 10 of violated nets):" (log excerpt below). I assume this is because NUM_REQS increased from 1 to 2. Has anyone successfully synthesized with these settings?
(230,638,150) NODE_HLONG Hist: 2 Tile Name: INT_X41Y104 Node: EE12_BEG0 Overlapping Nets: 3

Nets: level0_i/ulp/vortex_afu_1/inst/afu_wrap/vortex_axi/vortex/genblk1[0].cluster/genblk2[0].socket/genblk4[0].core/execute/lsu_unit/genblk1[0].lsu_slice/mem_scheduler/req_queue/skid_buffer/stream_buffer/data_out_r_reg[2210]_0[567] level0_i/ulp/vortex_afu_1/inst/afu_wrap/vortex_axi/vortex/genblk1[0].cluster/genblk2[0].socket/genblk4[0].core/execute/lsu_unit/genblk1[0].lsu_slice/mem_scheduler/req_queue/skid_buffer/stream_buffer/data_out_r_reg[2210]_0[598] level0_i/ulp/vortex_afu_1/inst/afu_wrap/vortex_axi/vortex/genblk1[0].cluster/genblk2[0].socket/genblk4[0].core/execute/lsu_unit/genblk1[0].lsu_slice/mem_scheduler/req_queue/skid_buffer/stream_buffer/data_out_r_reg[2210]_0[330]

(253,638,150) NODE_HLONG Hist: 3 Tile Name: INT_X47Y104 Node: EE12_BEG0 Overlapping Nets: 3

(222,632,159) NODE_HLONG Hist: 2 Tile Name: INT_X39Y110 Node: EE12_BEG4 Overlapping Nets: 3 . . .

Basically, I want to configure different numbers of cache ports and observe whether the cache scales.
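For reference, here is my rough mental model of where the NUM_REQS values above could come from, as a small sketch. The 4-byte word, the 64-byte L1 line and the formula itself are my assumptions, chosen only because they reproduce what I see (1 request for up to 16 threads, 2 for 32). They are not taken from the Vortex sources, so please correct me against the real definitions in VX_gpu_pkg.sv and VX_config.vh. The helper name estimated_dcache_reqs is hypothetical.

```python
# Hypothetical model of the DCACHE request-port count (NOT the Vortex code).
# Assumptions:
#   - each thread accesses one word of WORD_BYTES bytes per LSU request,
#   - one DCACHE port carries at most one L1 line of LINE_BYTES bytes,
#   - bytes beyond one line are split across additional ports.

WORD_BYTES = 4    # assumed 32-bit data path
LINE_BYTES = 64   # assumed L1 cache line size in bytes


def estimated_dcache_reqs(threads: int,
                          word_bytes: int = WORD_BYTES,
                          line_bytes: int = LINE_BYTES) -> int:
    """Return the estimated number of DCACHE request ports under the model above."""
    request_bytes = threads * word_bytes         # bytes touched by one warp-wide access
    port_bytes = min(request_bytes, line_bytes)  # a single port is capped at one line
    return max(1, request_bytes // port_bytes)


if __name__ == "__main__":
    # Sweep thread counts to compare the estimated port count per configuration.
    for threads in (4, 8, 16, 32, 64):
        print(f"--threads={threads:<3} -> estimated DCACHE ports: "
              f"{estimated_dcache_reqs(threads)}")
```

If this model is roughly right, the port count depends on the line size as well as on #threads: estimated_dcache_reqs(32, line_bytes=128) returns 1, i.e. a single but wider port instead of two narrower ones.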


DCACHE synthesis and implementation? #192