Open wangshuxihe00 opened 4 months ago
There are also some problems. For example, the insert function in std supports several overloads with different input parameters, while the stdgpu version accepts only one set of parameters. Similar differences appear in other places. Were these restrictions made deliberately because of device-side constraints?
While inserting at the end of the container is straightforward, inserting at an arbitrary position would involve temporary memory allocations. One could of course first shift the entries located after the insertion position to their new positions, but doing this in parallel only works reliably if the respective memory regions do not overlap (i.e. when `end() - insert_position <= inserted_elements.size()`).
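To make the overlap condition concrete, here is a minimal host-side sketch in plain C++. It is not stdgpu code: `tail_shift_is_disjoint` and `insert_range` are hypothetical names, and `std::vector` merely stands in for the container. It shows that the tail can be shifted with independent per-element copies only when the tail is no longer than the inserted range, and that a temporary buffer is needed otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if the tail [pos, size) and its destination
// [pos + count, size + count) do not overlap, i.e. size - pos <= count.
bool tail_shift_is_disjoint(std::size_t size, std::size_t pos, std::size_t count)
{
    assert(pos <= size);
    return size - pos <= count;
}

// Hypothetical sketch of inserting `values` at index `pos`.
std::vector<int> insert_range(std::vector<int> data, std::size_t pos,
                              const std::vector<int>& values)
{
    assert(pos <= data.size());
    const std::size_t old_size = data.size();
    const std::size_t count = values.size();
    data.resize(old_size + count);

    if (tail_shift_is_disjoint(old_size, pos, count))
    {
        // Source and destination are disjoint, so every iteration is
        // independent and could be executed by its own thread.
        for (std::size_t i = pos; i < old_size; ++i)
        {
            data[i + count] = data[i];
        }
    }
    else
    {
        // The regions overlap: stage the tail in a temporary buffer first.
        // This is the temporary allocation mentioned above.
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(pos);
        const auto last = data.begin() + static_cast<std::ptrdiff_t>(old_size);
        const std::vector<int> tail(first, last);
        for (std::size_t i = 0; i < tail.size(); ++i)
        {
            data[pos + count + i] = tail[i];
        }
    }

    // Finally, write the new values into the gap.
    for (std::size_t i = 0; i < count; ++i)
    {
        data[pos + i] = values[i];
    }
    return data;
}
```

For instance, inserting two values at index 3 of a four-element container takes the disjoint path (tail length 1), while inserting one value at index 1 takes the buffered path (tail length 3).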
Supporting more versions of `insert` also depends on deciding on which side this function is most useful, i.e. whether it should be called only from host/CPU or from device/GPU. The version that inserts a single value would be better on the device-side, yet supporting inserting at arbitrary positions might be expensive as all subsequent values need to be shifted in a thread-safe way.
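As an illustration of why a single-value insert at the end is the easy case for concurrent callers, here is a small sketch using `std::atomic` in standard C++. `FixedBuffer` and `try_push_back` are hypothetical names chosen for this example and do not reflect how stdgpu is implemented. Each caller claims a unique slot with one atomic increment, so no existing element has to move.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity buffer, for illustration only.
class FixedBuffer
{
public:
    explicit FixedBuffer(std::size_t capacity) : data_(capacity), size_(0) {}

    // Appends `value` at the end. Returns false if the buffer is full.
    bool try_push_back(int value)
    {
        // Atomically claim the next free slot; no two callers get the same one.
        const std::size_t slot = size_.fetch_add(1);
        if (slot >= data_.size())
        {
            // Roll back the claim. The size may briefly exceed the capacity
            // while concurrent failed appends are in flight.
            size_.fetch_sub(1);
            return false;
        }
        data_[slot] = value;
        return true;
    }

    std::size_t size() const { return size_.load(); }

    // Reading is only safe once all writers have finished.
    int at(std::size_t i) const
    {
        assert(i < data_.size());
        return data_[i];
    }

private:
    std::vector<int> data_;
    std::atomic<std::size_t> size_;
};
```

Inserting at an arbitrary position has no comparable shortcut, because every caller would need a consistent view of all elements after the insertion point while they are being shifted.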
So, these (documented) limitations are present since implementing these features on the device side efficiently is not trivial. Nevertheless, the situation is far from ideal and any suggestions for improvements are very welcome.
I see. Thanks for the answer!
The insert function here can only insert at the last position, which obviously differs from the usual meaning of insert, where elements can be inserted at any position.