Re: Plans for Future Maintenance of Gym #2259

FirefoxMetzger commented 3 years ago

👋 I don't have the power to directly comment on the issue referenced in the title #2259 , so I thought I leave my comments in a new issue.

Someone to maintain Lycon2. Lycon is a Python library that's just took the C image resizing logic from OpenCV and put it in it's own repo. This makes it run slightly faster, and more importantly it gets rid of all the horrifying installation issues associated with OpenCV (and most RL libraries only depended on OpenCV for this functionality).

I was wondering if scikit-image could do the job here. It flies a bit under the radar but is actually quite feature-rich. It should handle all the transformation needs (transform module docs) and is pretty straight-forward to use (rescaling example). It is already well maintained and will probably continue to be maintained for a long time; also - given that it is one of the big scikits - availability across platforms is quite good.

Then again, I don't know how deeply nested the dependency on lycon2 is in this repo, so I don't know how big a task it would be to switch dependencies or if it is feasible at all.

Dealing with all flavors of MuJuCo problems (I am objectively not qualified for this)

Not quite mujuco, but there is some related and pretty exciting stuff happening at Open Robotics that is good to be aware of. Their next version of Gazebo called Ignition Gazebo is starting to get usable. The main difference to existing simulators (like mujuco or gazebo classic) is that the simulator is built as a library that is meant to be included by upstream projects. This way the simulator runs inside the same process which - among other things - means the simulator can wait for the agent to finish computing and vice verse (i.e., less synchronization issues) and means that the gym env and the simulator can share data zero-copy (i.e., rollout speed advantages). It's also the only simulator I'm aware of that supports multiple physics engines... It is Apache-2.0 licensed and fully open-source, which is never a bad thing. However, it doesn't have Windows support at present [directX rendering problems]; not sure about macOS.

There is a python project that aims to provide integration between gym and ignition called gym-ignition; thanks to @diegoferigo 🎉. Perhaps this could become an addition to/replacement of MuJuCo that can be easier to maintain?

Switching to imageio-ffmpeg, per discussion in Fixes video recorder ffmpeg on Centos7 and RHEL7 #1893

💯 Thank you for the support! (imageio maintainer here) Feel free to give me a shout if you run into issues with this.

jkterry1 commented 3 years ago

Hey, thanks

-Would you be willing to do the imageio PR? -scikit image is dramatically slower, we looked into this in detail

tristandeleu commented 3 years ago

Was this issue resolved?

-scikit image is dramatically slower, we looked into this in detail

Would you mind sharing some benchmarks to back this up?

Then again, I don't know how deeply nested the dependency on lycon2 is in this repo, so I don't know how big a task it would be to switch dependencies or if it is feasible at all.

There is no dependency on Lycon2 yet, so any suggestion is probably welcome!

FirefoxMetzger commented 3 years ago

Would you be willing to do the imageio PR?

Theoretically yes; practically I don't think I'll get around to it any time before the Christmas break and by that time you likely want to be done with this already (I assume). My priority, for now, is on getting scikit-bot off the ground.

That said; I'm happy to review if desired.

Was this issue resolved?

It's fine with me either way. There wasn't a specific issue; more of a comment (since the original topic is locked), so I'm fine with this issue being closed in order to stay organized.

Would you mind sharing some benchmarks to back this up?

scikit-image does indeed have the stigma of being slower, although this doesn't apply to all algorithms. I would actually be quite interested to see benchmarks for the transform module and resizing in particular. I'd expect this module to be on par with OpenCV, which is why I suggested it since I am under the impression that this is the main (only?) need here. If not, a upstream PR is also an option :)

Generally, scikit-image tends to be slower if the algorithm is written in pure python or for some heavy algorithms. A partial reason for the latter was because internal computation was often done in double precision only. @grlee77 has done some nice work on getting processing in other precisions going (in particular float which doubles the number of registers for AVX), so things are improving on that front 💪 .

Some older benchmarks I'm aware of (though all are related to convolutions):

jkterry1 commented 3 years ago

@benblack769 did the benchmarking for the specific methods at issue here in the past

benblack769 commented 3 years ago

There are some old benchmarks of opencv/lycon vs scikit here: https://github.com/ethereon/lycon#benchmarks (I did not do these benchmarks, but I did go and check to make sure that scikit was slower, and while I didn't keep track of my metrics, I recall it being around 10x slower for area interpolation, and 3x slower for linear interpolation).

If you wish to understand why opencv/lycon is so much faster, just look at this crazy source code: https://github.com/ethereon/lycon/blob/master/src/lycon/transform/resize/area.h#L257 What you are looking at is archetecture specific intrinsic code. You cannot beat this with standard compliant c.

We decided to fork lycon and maintain it ourselves as lycon2 because the original lycon maintainer is unreachable. In theory, we can change the lycon API to make it compatible with opencv, which should make it much easier to swap out.

FirefoxMetzger commented 3 years ago

If you wish to understand why opencv/lycon is so much faster, just look at this crazy source code:

Kudos to lycon for using SSE2 kernels! It always makes me happy to see people using SIMD ❤️ thanks for sharing @benblack769 .

Since you guys are set on lycon, I hope that you can find a maintainer for it soon :) . People with SIMD expertise aren't all too common in this corner of the internet (in my experience) and it's such a shame that lycon doesn't have AVX kernels given that AVX support is common and given that performance seems to be a big concern for gym.

On a tangent (since I really like the topic): you can get auto-vectorization to generate similar results if you are mindful about the structure of your inner loop; this even works in cython (example: https://github.com/scikit-image/scikit-image/blob/d1ceba3aee543ad014a38a51ce4481f419c07f22/skimage/restoration/_rolling_ball_cy.pyx#L220-L237). Though optimizing for auto-vectorization leads to somewhat weird code and can easily break making it unpleasant to maintain.

benblack769 commented 3 years ago

So the lycon developer simply took the code from opencv. So this vectorization code was written by the opencv developers, they are the ones who deserve the cudos.

I have explored autovectorization extensively, and am familiar with its benefits, however, it often has extremely fragile performance. And some of this would be very hard to write in a autovectorization friendly manner for example, the packing and unpacking of uint8 into uint16: https://github.com/ethereon/lycon/blob/master/src/lycon/transform/resize/area.h#L252

benblack769 commented 3 years ago

A modern version of opencv does support avx2: https://github.com/opencv/opencv/blob/master/modules/imgproc/src/resize.cpp. However, I think standard prebuilt wheels still are conservative and don't compile with it.

openai / gym

Re: Plans for Future Maintenance of Gym #2259 #2341