kaveh808 / kons-9

Common Lisp 3D Graphics Project
MIT License

Optimize OpenGL Drawing #90

Open kaveh808 opened 2 years ago

kaveh808 commented 2 years ago

Use vertex arrays and the like to speed up the current naive drawing code in opengl.lisp.

JMC-design commented 2 years ago

It's been over 5 years since the last release of OpenGL, so there probably isn't any reason to target anything but the latest version. But since the writing is on the wall, perhaps some thought should be put into how to abstract over both OpenGL and Vulkan, though I'm thinking that might need somebody familiar with Vulkan. If not, then it comes down to choosing an existing abstraction to handle modern GL or writing yet another one.

jolby commented 2 years ago

Regarding the need to keep our eyes on the next graphics API, as @JMC-design was talking about: Piet-gpu is a good project to follow. They have a 2d/font focus, but they are pushing the envelope on doing as much of the compute for a UI as possible on the GPU: https://github.com/linebender/piet-gpu (project vision: https://github.com/linebender/piet-gpu/blob/main/doc/vision.md). And Raph Levien has some fantastic articles about doing graphics/compute on modern GPUs and GPU APIs: https://github.com/linebender/piet-gpu/blob/main/doc/blogs.md

JMC-design commented 2 years ago

Might also want to set a bar for minimum gpu memory, I guess that's something that needs to be tracked, such a weird concept. I've run across piet when looking for ideas on a rich text sort of api. I'm not sold on specifying ranges, though it is nice that it allows the text to be unmodified. I'm still leaning towards something I can read or write to a stream, so list of objects and lists that change attributes.

JMC-design commented 2 years ago

yet another opengl abstraction for lisp https://github.com/jl2/simple-gl

kaveh808 commented 2 years ago

I am very keen to maximize use of the GPU as well as SIMD and multiple cores. I really want our system to be able to handle production-level datasets with the same (or better) speed as commercial packages.

How we architect this (improved OpenGL interface, Vulkan, compute on GPU) is something we should discuss.

If we do have a Vulkan enthusiast, a first step could be to implement the equivalent of the code in opengl.lisp.

Also, one of my goals is to develop a cross-platform GUI toolkit. Currently we're building it on OpenGL, using the text engine by @awolven and font rasterizer by @JMC-design .

JMC-design commented 2 years ago

So I've just drawn my first triangle using vertex arrays and here are some of my initial thoughts. I'm assuming we'd like to fill buffers by just sending a list of points? What I've done for a test is just fill up a cl array, grab the vector-sap, and use that to fill buffers. With points we have to pack them. Do we pack into a cl array, pin and use, or just pack directly into a foreign array, and then free or keep the array around?
Does any packing we do into cl arrays have any effect on packing into simd packs?
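For illustration, a minimal sketch of the "pack into a cl array" option, assuming each point is a 3-element list or vector. `pack-points` and `*triangle*` are made-up names, not anything in kons-9 or in the linked test code:

```lisp
;; Hypothetical helper, not part of kons-9: pack a list of 3D points
;; into one flat single-float vector laid out as x0 y0 z0 x1 y1 z1 ...
(defun pack-points (points)
  (let ((packed (make-array (* 3 (length points))
                            :element-type 'single-float
                            :initial-element 0.0f0))
        (i 0))
    (dolist (p points packed)
      (dotimes (k 3)
        ;; ELT works for both list and vector points.
        (setf (aref packed i) (coerce (elt p k) 'single-float))
        (incf i)))))

;; One triangle, three vertices, nine floats.
(defparameter *triangle*
  (pack-points '((0 0 0) (1 0 0) (0 1 0))))
```

On SBCL the packed vector could then be pinned with `sb-sys:with-pinned-objects` and its address taken with `sb-sys:vector-sap`, as described above. That last step is SBCL-specific and is left out here so the sketch stays portable.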

Writing glsl in a string in a lisp buffer is a nightmare of formatting. In the long run it doesn't matter what a person uses to get a string for a shader program, but maybe there should be some default shader dsl, or formatting to make code and examples easier to read?
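Short of a real shader DSL, one low-tech way to ease the formatting problem is to build the source from one string per line. This is only a sketch: `glsl-lines` is a made-up helper, and the GLSL 3.30 vertex shader is an assumed example, not a shader from kons-9.

```lisp
;; Hypothetical helper: join one-string-per-line shader source into a
;; single string, so the Lisp code can be indented freely.
(defun glsl-lines (&rest lines)
  (format nil "~{~A~%~}" lines))

;; Assumed example shader (GLSL 3.30), for illustration only.
(defparameter *vertex-shader*
  (glsl-lines
   "#version 330 core"
   "layout (location = 0) in vec3 position;"
   "void main() {"
   "  gl_Position = vec4(position, 1.0);"
   "}"))
```

The result is an ordinary string, so it can be handed to whatever shader-compilation call the GL binding provides.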

It seems like it might be nice to encapsulate these buffers into structs that can be passed around easily, then you have to build a bunch of functions to use those structs, and then years later you have cepl... or something similar. I wonder if anybody has made a comparison of the different layers on top of gl?

I'm not even sure if sbcl system pointers work the same way on windows or osx. So maybe packing directly into foreigns is required? And definitely so if there are any plans to support another implementation. If anybody is interested, this is the code I used to test: https://plaster.tymoon.eu/view/3408#3408 . Just replace the surface:update with whatever your window needs to swap buffers.

awolven commented 2 years ago

You could look at the text engine to see how vertex arrays are used there.

However, there seems to have been a change in the version of opengl used, at least on linux, rendering the text engine useless. I'm trying to fix it, but it would be nice to know if there is going to be a version change before I spend a lot of time targeting a specific version.

JMC-design commented 2 years ago

I tried, but it reads like c and I don't see any lispy abstraction. The only thing I see is direct writing of individual bytes to foreign memory. I'm not bright enough to understand other languages.

kaveh808 commented 2 years ago

These are good questions, and there are a lot of moving parts in how we encode geometry: ease of editing in CL, optimized OpenGL display, suitability for SIMD, and suitability for threads.

One possibility I have been mulling over is whether we should keep a low-level C representation which can act like an old school display list for our geometry classes. We would need to sync up the CL point arrays with these C-type vectors after modeling operations, which would be optimized for OpenGL and such.

Or we could have C-level structs for internal geometry, which we access and modify from CL. That might make CL editing a bit slower, but could result in faster rendering.
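A rough sketch of the first possibility, the display-list-style cache that is re-synced after modeling operations. Everything here is hypothetical (`cached-geometry`, `set-points`, `display-buffer`), and a plain CL single-float vector stands in for the C-type vector:

```lisp
;; Hypothetical sketch, not kons-9 code: a geometry object keeps its
;; editable CL points plus a cached flat buffer that is rebuilt lazily.
(defstruct (cached-geometry (:conc-name geo-))
  (points '())   ; list of (x y z) lists, edited from CL
  (buffer nil)   ; flat single-float vector for the renderer
  (dirty-p t))   ; true when BUFFER is out of date

(defun set-points (geo points)
  "Modeling operations go through here so the cache is invalidated."
  (setf (geo-points geo) points
        (geo-dirty-p geo) t)
  geo)

(defun display-buffer (geo)
  "Return the packed buffer, repacking only after an edit."
  (when (geo-dirty-p geo)
    (let ((buf (make-array (* 3 (length (geo-points geo)))
                           :element-type 'single-float
                           :initial-element 0.0f0))
          (i 0))
      (dolist (p (geo-points geo))
        (dolist (c p)
          (setf (aref buf i) (coerce c 'single-float))
          (incf i)))
      (setf (geo-buffer geo) buf
            (geo-dirty-p geo) nil)))
  (geo-buffer geo))
```

Repeated calls to `display-buffer` without an intervening edit return the same vector, which is the point of the cache: the packing cost is paid once per modeling operation rather than once per frame.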

JMC-design commented 2 years ago

good eats https://www.youtube.com/watch?v=K70QbvzB6II

foretspaisibles commented 2 years ago

I am very keen to maximize use of the GPU as well as SIMD and multiple cores. I really want our system to be able to handle production-level datasets with the same (or better) speed as commercial packages.

Does it include distributed computing as a goal? :-)

kaveh808 commented 2 years ago

Down the road, why not? :)

ghost commented 2 years ago

Down the road, why not? :)

Because there would be a 30MB SBCL runtime per node? I really wish there was something like MirageOS (which uses OCaml) for Common Lisp or Scheme.

ghost commented 2 years ago

good eats

I'm a bit full from their 130 page slide deck on optimization. Looks like OpenGL 4.2+ only, which caused a stomach rumble. Sometimes I wonder, "Why can't we just implement OpenGL in pure Common Lisp and be done with it?"

JMC-design commented 2 years ago

I think the approach is still interesting. Today I'm going to try and test if it makes any difference packing arrays from different types of points, both into cl arrays that are pinned and sent and into foreign arrays that are sent. In my brain it doesn't seem like there'd be much difference. Besides, un/packing structured bits to be sent is on my todo list (calling it pipeline), for use with a new CLX and wayland. The thing with 4.2 is that 4.1 might have the same things, just in extensions. Whether it's like that on Mac I don't know. That, or maybe MGL isn't hard to install/use? I have no mac to test that.

ghost commented 2 years ago

I think the approach is still interesting.

I agree, especially given the potential performance improvement. (I don't like vinegar on my salad, but wouldn't suggest other people shouldn't enjoy it, if you can tolerate one more food joke.) Thank you for posting the link and doing the testing.

I don't have a (capable enough) Mac to try it out on either, but if you do have success I wonder if it would help for you to post a simplified gist somewhere so someone who does could try it out.

JMC-design commented 2 years ago

Trying to come up with a good test for display as well. But so far, with just 333,333 points there's no time difference in packing cl arrays from either origin vectors or 3d-vector structs. Packing from vectors uses slightly less cpu, but I probably need more points, since this is all taking ~0.004 seconds (0.020 using generic functions). Submitting cl arrays to gl by pinning them and passing the pointer is, well, just passing a pointer. I guess I should probably throw in some static-vectors stuff.

JMC-design commented 2 years ago

So here's just some basic testing. If you make smaller arrays then origin's lead widens. Whether it's worth the trade-off in not being able to dispatch on...
But the surprising thing is the foreign being slower. If we can depend on just using sbcl to send pointers then I'm not sure what the benefit is.

https://plaster.tymoon.eu/view/3413#3413

kaveh808 commented 2 years ago

Nice work. Is the cost of sending sbcl pointers and ffi arrays to OpenGL (and GPUs) the same?

On a slight tangent, should we bite the bullet and go with double-float as our default? Or is the performance hit a serious one?

JMC-design commented 2 years ago

I can't see why it would be different, as they're both just pointers to memory. Unless being in sbcl's mem space somehow affects it. That's why I think an actual drawing test might elucidate further, at least just in terms of packing/repacking something over and over.

I don't know if I've been reading outdated stuff, but what I've seen is that lots of opengl drivers will just convert to single as their internal format. The support for doubles for gp compute is relatively new and requires above 4.1 and in some cases a new card. I've seen figures of half to 1/3 of the performance of singles. For anything like CAD I'd think a fixed-point format would probably be better.
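To make the precision side of that trade-off concrete, here is a small sketch of what a double-to-single conversion before upload does to coordinates. `to-single-floats` is a made-up name, and the values are chosen only to show the effect:

```lisp
;; Hypothetical helper: convert a vector of double-floats to single-floats,
;; as would happen if geometry is kept in doubles but uploaded as singles.
(defun to-single-floats (doubles)
  (map '(vector single-float)
       (lambda (x) (coerce x 'single-float))
       doubles))

;; 2^24 + 1 needs 25 significant bits and single-float has only 24,
;; so the first value cannot survive the round trip. 0.5 is exact in both.
(defparameter *converted*
  (to-single-floats (vector 16777217d0 0.5d0)))
```

So keeping doubles on the CL side does not by itself buy extra precision on screen if the data ends up as singles anyway. It mainly helps intermediate modeling arithmetic.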

awolven commented 2 years ago

opengl is a foreign library

On Wed, Sep 7, 2022 at 11:35 AM Johannes Martinez Calzada wrote:

I tried, but it reads like c and I don't see any lispy abstraction. The only thing I see is direct writing of individual bytes to foreign memory.

awolven commented 2 years ago

I'm a vulkan enthusiast, but I have too much on my plate at this time to volunteer for porting opengl.lisp. I can provide vulkan bindings and some sample code on how to make triangles and triangle_strips of various colors render in vulkan, but I have the text engine to debug/extend and a whole host of other projects relating to other things. Perhaps someone who doesn't necessarily have vulkan experience could volunteer. Vulkan's not that hard.

For the macOS platform, I'm working on cl-metal using an objective-c bridge from fiddlerwaoroof. Vulkan does work on mac, but it doesn't support compute shaders yet, so I'm going the Metal route rather than wait on MoltenVK. Metal and Vulkan use different shading languages, so it would be great if someone could work on a (possibly CEPL-based) lisp syntax that could be compiled to either GLSL 4.5 for vulkan or the Metal shading language.

lukego commented 2 years ago

Perhaps someone who doesn't necessarily have vulkan experience could volunteer.

I volunteer to make an attempt this month. What do I need to know to start off in the right direction? (Either in absolute terms or based on the tiny start I made in #109 a ways back.)

theottm commented 1 year ago

I'm interested in trying to write this. I will try to build on what @JMC-design has proposed and the text-rendering engine @awolven has written.

It would probably make sense to reuse parts of the code of the text-rendering engine. In order to do so I would have a lot of questions, since there are a lot of things I don't understand the purpose of. It seems like a pretty advanced implementation to me, one which takes a lot of nitty-gritty details of OpenGL into consideration, am I right?

Anyway, I'll start by proposing something and hopefully we can improve on it incrementally afterwards with your feedback.

awolven commented 1 year ago

The text rendering engine is a two-implementation immediate mode hack[s] to get Kaveh working with text. I say two implementations, because so long as Kaveh uses OpenGL 1.1 for the rest of Kons-9, macOS will be a different implementation than any "modern opengl" implementation used in Windows and Linux. This is because the opengl 2.1 implementation of macOS is not forward compatible with opengl 3+, unlike Windows and Linux. So there is an opengl 2.1 and an opengl 3.3 version of the text rendering engine. The least common denominator is opengl 2.1 and opengl 2.1 doesn't even use shader programming. So to borrow your term "modern opengl"... a modern opengl version of Kons-9 would require a rewrite of the logic in opengl.lisp at the minimum.

This has been done before with vulkan, however Kaveh rejected the vulkan branch and continued to make changes to the main branch until the vulkan branch bit rotted. Kayomarz has volunteered to update the vulkan implementation, but only has weekends to work on it and has not posted any updates for that effort in some time.

By modern opengl, I am assuming you are talking about opengl 3.3+. Kons-9 is in need of a proper graphics engine to make developers' lives easier and make the program scalable functionally, opengl or otherwise. A modern opengl implementation would be based on now decades-old tech and would essentially be reimplementing the logic of the vulkan engine (called "krma"), which can render thousands of text characters without so much as a blip in the frame rate, unlike the immediate mode implementations currently in Kons-9. So if you want to upgrade kons-9 to a modern opengl version for GLSL programming and you have little OpenGL or Common Lisp experience, you would basically just be spinning your wheels...for a lot of reasons. First, this type of work takes knowledge, and second, if there is some kind of absolute insistence on using openGL instead of something newer like vulkan, you're better off adding that capability to krma and letting Kayomarz finish porting opengl.lisp to krma, which would allow for kons-9 to support both opengl and vulkan.

As far as "modular", krma is modular and you can add and remove pipelines while the program is rendering.

Furthermore, while krma fully supports the immediate mode rendering paradigm of kons-9, in the long run one will want to support retained mode paradigms, for performance, which is going to require somewhat of a reorganization of kons-9, unless you live in a cold cabin and need your PC to double as a toaster oven.

-Andrew Wolven

ghost commented 1 year ago

unless you live in a cold cabin and need your PC to double as a toaster oven.

I used to render movies on my Mac Dual G4 only in the winter in Colorado, b/c it used nearly 1500W, like a hair dryer (which would have been quieter).

in the long run one will want to support retained mode paradigms

Retained mode caching in OpenGL-based scene graphs usually used "display lists". What method exists to do that now?

theottm commented 1 year ago

I see. I could also join the effort of porting kons-9 to krma then, if this makes more sense. I'm mostly interested in having a rendering engine I can understand and modify on the fly. If krma can fulfill this role, I'm in.

About the modularity of krma, how would you do things like offscreen rendering, multiple passes? How would you create and load custom pipelines? Having some simple examples would be nice.

ghost commented 1 year ago

Kaveh rejected the vulkan branch and continued to make changes to the main branch until the vulkan branch bit rotted.

I have the feeling anything I say here is going to get me in trouble with someone. Adieu.

theottm commented 1 year ago

Could krma evolve to become something like CEPL for vulkan? Because that's in the end what I am looking for: a CL interface to a graphics API. Not just the bindings of course, but an interface that makes programming OpenGL or Vulkan in CL more natural.

ghost commented 1 year ago

CEPL

+1

Adieu to this topic.