goretkin / Bullet.jl


Why is this a lot slower than pybullet? #3

Closed: sash-a closed this issue 3 years ago

sash-a commented 3 years ago

I have two hopefully identical scripts, one in Python using pybullet and one in Julia using Bullet.jl, and the pybullet one is much faster. I am new to Julia, so hopefully I am doing something wrong, but if not, could you possibly explain why the Python version is so much faster?

Julia code:

module test

# include("./Bullet.jl/deps/build.jl")
include("./Bullet.jl/src/Bullet.jl")

function testing()
    sm = Bullet.connect(kind=:direct)
    Bullet.set_gravity(sm, [0,0,-10])
    Bullet.load_urdf(sm, "./Bullet.jl/deps/usr/data/planeMesh.urdf")
    Bullet.load_urdf(sm, "./Bullet.jl/deps/usr/data/cube_larger.urdf", position=[0,0,1000])

    for i in 1:100000
        Bullet.step_simulation(sm)
    end
    Bullet.disconnect()
end

end

Python code:

def test():
    import pybullet as p
    sm = p.connect(p.DIRECT)
    p.setGravity(0,0,-10)
    plane_id = p.loadURDF("./Bullet.jl/deps/usr/data/planeMesh.urdf")
    cube_id = p.loadURDF("./Bullet.jl/deps/usr/data/cube_larger.urdf", [0,0,1000])

    for i in range(100000):
        p.stepSimulation()

    p.disconnect()

test()

Julia is timed using:

(@v1.6) pkg> activate .
  Activating environment at `~/Documents/rl/bullet_envs.jl/Project.toml`

julia> include("./test.jl")
Main.test

julia> using BenchmarkTools

julia> @btime test.testing()
  572.122 ms (27 allocations: 1.08 KiB)

Python is timed using:

python -m timeit -s "from test import test; test"
pybullet build time: May 18 2021 18:01:29
50000000 loops, best of 5: 4.56 nsec per loop

Just visually, when using GUI mode the Python sim runs much faster, and I'd imagine the same holds for DIRECT mode. There must be a reason for this, but I'm not sure what it is. Hopefully it's me messing up something simple on the Julia side.

Also, I've fixed the issue in https://github.com/goretkin/Bullet.jl/issues/2; if you're accepting pull requests I can push it :D

sash-a commented 3 years ago

Upon closer inspection, it seems like most of the time is spent inside step_simulation. However, calling only Raw.b3InitStepSimulationCommand(sm) instead does not completely solve the issue: it takes the time from ~500 ms down to ~1 ms, a large speedup, but nothing close to the ~4 ns of pybullet, so any advice or insight would be much appreciated.
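Concretely, the change was roughly this; only the loop body of the script above differs (and I may well be misusing the Raw bindings):

# original loop: ~500 ms in total
for i in 1:100000
    Bullet.step_simulation(sm)
end

# replacement loop: ~1 ms in total
for i in 1:100000
    Raw.b3InitStepSimulationCommand(sm)
end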

goretkin commented 3 years ago

Thanks for giving this a try. I should warn you that this package is not well tested and I don't plan on developing it in the near future. I hope it can still be useful to you in its current state.

~4 ns is a suspiciously short amount of time; it makes me immediately wonder whether anything is really being timed at all. And indeed, your command is not really timing anything. To fix it you can do e.g.

python -m timeit -s "from test import test; test()"

(note that the function test actually gets called)

In your case, the function test gets called only once, as a side effect of the module test being imported during timeit's setup. So whatever statement timeit is measuring never does any simulation work at all.

goretkin commented 3 years ago

Please do report back with what new timings you get!

sash-a commented 3 years ago

Oh wow, what a silly mistake lol.

So I think there must be something I'm misunderstanding about timeit, because python -m timeit -s "from test import test; test()" outputs 100000000 loops, best of 5: 3.37 nsec per loop (which is even faster than before??). If I had to guess, it has something to do with the semicolon syntax and the function still not actually being run in the timed part.

So I decided to run it in a notebook with %timeit and I get 591 ms ± 4.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each), which is pretty much what you would expect given the Julia timings.
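In hindsight, I think what's going on with the command-line version is that -s only specifies setup code, which is excluded from the measurement, and since no statement to time is given, timeit falls back to timing the default pass statement, hence the nanosecond numbers. If I understand the CLI correctly, something like this would have timed the call itself:

python -m timeit -s "from test import test" "test()"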

One question on the implementation, if you can remember: calling Raw.b3SubmitClientCommandAndWaitStatus(sm, Raw.b3InitStepSimulationCommand(sm)) is about 500x slower than calling Raw.b3InitStepSimulationCommand(sm) on its own. After some investigation it seems like the submit call is necessary, so do you think the cxx version would be significantly faster?
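For context, my current understanding from poking around the Raw bindings (so this may well be off) is that a single step amounts to building a command and then submitting it to the physics server and waiting for it to finish, roughly:

# sketch of what I think one step involves; `sm` is the handle returned by Bullet.connect
function step_once(sm)
    # build the step-simulation command; cheap, no physics work happens here
    cmd = Raw.b3InitStepSimulationCommand(sm)
    # submit it and block until the server reports the step is done;
    # this round trip is where essentially all the time goes
    Raw.b3SubmitClientCommandAndWaitStatus(sm, cmd)
end

which would also explain why dropping the submit call made my earlier loop look so fast: it never actually stepped the simulation.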

Anyway, thanks for pointing out my mistake! Also, if you could have a look at my pull request I think it would be useful for future users, and I'd be happy to add a "how to run" section to the readme as well, as that took me some time to work out.