Adding support to meshes without object specification and per vertex color

SergioRAgostinho commented 6 years ago

I'm dealing with a couple of obj file which are not being being properly parsed by PyWaveFront.

The traits are:

[x] vertex information is provided without providing an object name per the o keyword, i.e. it's a single nameless object.
[x] they have per vertex color (see last two sentences of the "spec" entry).

I'm gonna extend the library to support these for my use case. Should I submit them here as well? I believe they're both still within the specification.

greenmoss commented 6 years ago

Sure, PRs always welcome

SergioRAgostinho commented 6 years ago

I've started looking into the second point regarding the per point color specification. These are the changes I'm thinking about implementing:

Normally, either all vertices have the color augmentation or they don't and the usual material spec is used. This means the moment you detect color info on a vertex you can reject any subsequent material use until a new object is provided. Materials and per vertex color are mutually exclusive
Since now we will have meshes with no materials, I'm thinking about moving the vertex information to the Mesh data structure and only retain indexes to the vertex list in the Materials data structure.

Does it sound reasonable?

I'm a little concerned with the changes because I haven't been able to render anything with the draw command before. I need to read a little more into pyglet to make sure I'm not skipping some fundamental setup step.

Edit: Just noticed you're packing tightly [texture-coords, normals-coords, vertex-coords] inside for every single point, even in situations in which there isn't any texture coord associated or even normals...

In detail

~~[ ] move (combined) vertex information to Mesh data object and keep (combined) vertex indices in Material.~~
~~[ ] pack texture coords, vertex coords and normal coords individually inside Mesh.~~
~~[ ] update the drawing routines~~
~~[ ] start parsing color entries on the parser~~
~~[ ] augment Mesh with a colors list~~
~~[ ] update drawing routines to make use of per vertex color information~~

Basically I need to make a lot of changes in order to implement what I wanted cleanly. At this point it's just easier to address the things quickly in my own fork and we can go point by point if you think this is useful for the lib.

greenmoss commented 6 years ago

I think I'm way behind your knowledge here 😄

If you propose the PR, I will trust your knowledge, and merge it.

greenmoss commented 6 years ago

(close was accidental)

einarf commented 6 years ago

What you write here makes sense. This is something I considered doing as well. What needs to be done is to figure out the vertex format of each mesh. We can build a string describing the format and map them to vertex modes in https://www.khronos.org/registry/OpenGL-Refpages/gl2.1/xhtml/glInterleavedArrays.xml

The format string (eg. C4F_N3F_V3F) can then be used to inspect the format externally without relying on pyglet itself. Draw modules can map string to draw mode using GL/pyglet enums.

Vertex formats should then be limited to

GL_V3F,
GL_C3F_V3F,
GL_N3F_V3F,
GL_T2F_V3F,
GL_C4F_N3F_V3F,
GL_T2F_C3F_V3F,
GL_T2F_N3F_V3F,
GL_T2F_C4F_N3F_V3F,

Also moving vertex data away from the material is vital.

SergioRAgostinho commented 6 years ago

My idea was to load everything individually each as vertex attributes, i.e.: have texture coords, load that in an individual step; have normals, load that also as an individual step, so on and so forth.

This assumes use of OpenGL’s core profile. Not sure if that’s what’s is being used here, since I don’t remember seeing any shaders in the repo.

Maybe pyglet doesn’t even support it…

einarf commented 6 years ago

Loading positions. colors, uvs, normals separetely makes sense compared to how the current parser is handling things now. This means a method can simply yield the values and consumed into a list to vastly reduce memory allocation speeding up the parsing significantly.

You don't really need core profile as long as you end up generating a list of interleaved data with the formats supported by glInterleavedArrays (in my previous post). Before reading the polygon list, you will also know the vertex format, so you can let the user send in their preferred vertex format in the polygon reader function and simply make it yield each individual float value that can then be consumed into a list or numpy array externally.

Pyglet just use interleaved vertex data that looks just like an interleaved VBO. You are just restricted to the supported enum values I listed earlier. That is pyglet's problem to figure out in its draw functions.

Just figure out the corresponding format enum value for glInterleavedArrays here: https://github.com/greenmoss/PyWavefront/blob/d05ba434f105b1144defc7357d499f725c6086b0/pywavefront/visualization.py#L64-L68

einarf commented 6 years ago

The parser can use its own constants independent of pyglet to describe the what data is avaialble. For example: T2F, C3F, V3F, N3F.

Another idea is to base it on GLTF if it's conceivable that a type is not always hardcoded to N components:

 [
    ("POSITION", "VEC3"),
    ("TEXCOORD_0", "VEC2"), 
    ("NORMAL", "VEC3"),
    ("COLOR", "VEC3"),
]

This can break the pyglet drawing for some models, but I don't really know is that's even possible to fix. I would try to make it work when possible, but if we are living by the contraints of pyglet, we may be doomed 😄

einarf commented 6 years ago

As an example:

I'm using this package to load obj files for https://github.com/Contraz/demosys-py. It only supports core 3.3+/4.1+ using VAOs.

I mashed together a 15 min scene loader last night (no materials): https://github.com/Contraz/demosys-py/blob/master/demosys/scene/loaders/wavefront.py

Main problem is that I always have to assume vertex format and components for each type and that I don't have any control over the vertex format in general. Also, half of my models don't even load because of #32 (also related to this)

SergioRAgostinho commented 6 years ago

You don't really need core profile as long as you end up generating a list of interleaved data with the formats supported by glInterleavedArrays. [...] Pyglet just use interleaved vertex data that looks just like an interleaved VBO. You are just restricted to the supported enum values I listed earlier. That is pyglet's problem to figure out in its draw functions.

That is true, no need for over engineering. The focus of this lib should be on parsing values from obj data a providing appropriate data structures to retrieve them.

Before reading the polygon list, you will also know the vertex format, so you can let the user send in their preferred vertex format in the polygon reader function and simply make it yield each individual float value that can then be consumed into a list or numpy array externally.

I would say to automatically pick the enumerate option which matches the data available. I stress on my original point that I would not try to focus too much here on providing a lot of flexibility on the drawing routines.

I think we have a feasible set of ideas here. There's some preliminary points I want to address first before venturing into this but I'm available for providing feedback on in case any of you wants to pick this up already.

einarf commented 6 years ago

I would say to automatically pick the enumerate option which matches the data available. I stress on my original point that I would not try to focus too much here on providing a lot of flexibility on the drawing routines.

Agreed. Keep it simple for now and rely on the formats pyglet can deal with. as long as the library exposes the details about the vertex format for each mesh, it can be transformed on the outside if needed. If it becomes a problem in the future, it can always be solved then.

Might take a stab at this in the future, but do tell if you start working on this. Will have to properly understand how these changes might affect the current flow in the library.

einarf commented 6 years ago

Some ideas for a more flexible/fast implementation. This means parser.py will need some changes.

[x] dir should be an attribute as it can be accessed by all parser methods anyway. Set in __init__.
[x] Parser should have generator attribute we can call next on to get the next line. This means parser methods can yield multiple lines if necessary such as consuming the entire vertex list in one method call. This is flexible enough to make it up to the method implementation.
[x] Parser should have a line attribute representing the current line the parser is working on. This attribute is cleared when we are done with the line. This means parser methods can still over-read from the generator without losing data.
[x] Let each parser method work on self.line. We are currently doing a lot of splitting and unnecessary iterating in the parse function. Let it be up to each parser method.
[x] Resolve the parser methods in __init__ using class inspection looking for parse_ functions. Store each function in a dict using the line_type as key to avoid creating an additional string in the parse function.

Also remember to support meshes without o spec meaning we need to keep a separate reference to the current mesh that is cleared after poly list is parsed.

einarf commented 6 years ago

Will continue when the current pull requests are sorted.

Next is looking at making line reading available for the parse functions so we can quickly bulk read those and yield the data into an array to reduce memory allocations.

SergioRAgostinho commented 6 years ago

Might take a stab at this in the future, but do tell if you start working on this. Will have to properly understand how these changes might affect the current flow in the library.

It will take me a week or so to have time to look at this but I'll ping once I set my mind on what to address first.

einarf commented 6 years ago

I'll continue on the ground work meanwhile so we can move towards reading positions, colors, normals and uvs in one chunk. I'd imagine consuming those directly into a list with yield would do wonders to performance as well.

einarf commented 6 years ago

Just for fun, some rough profiling stats with cProfile. (py3)

Simple stat profile when parsing a moderately large obj file (3.3MB) ordered by tottime (the total time spent in the given function (and excluding time made in calls to sub-functions))

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    51963    1.250    0.000    1.507    0.000 __init__.py:111(parse_f)
   106907    0.386    0.000    2.324    0.000 parser.py:52(parse)
   262793    0.162    0.000    0.162    0.000 {method 'split' of 'str' objects}
      2/1    0.131    0.065    2.458    2.458 parser.py:47(read_file)
    33803    0.114    0.000    0.120    0.000 __init__.py:80(parse_v)
   155889    0.076    0.000    0.076    0.000 __init__.py:134(<listcomp>)
   106953    0.071    0.000    0.071    0.000 {built-in method builtins.hasattr}
    20632    0.063    0.000    0.067    0.000 __init__.py:86(parse_vt)
   107010    0.052    0.000    0.052    0.000 {method 'startswith' of 'str' objects}
    51999    0.039    0.000    0.039    0.000 mesh.py:43(has_material)
   211339    0.038    0.000    0.038    0.000 {built-in method builtins.len}
    51999    0.031    0.000    0.070    0.000 mesh.py:49(add_material)
   106875    0.030    0.000    0.030    0.000 {built-in method builtins.getattr}

The big time spenders are parse_f, parse. parse used to be on the top before the loop was removed. It can quickly add up when dealing with larger files. We should at least find a balance here between speed and readability.

Some of the smaller time spenders here are also trivial to eliminate, but there's no reason dig deep into this until the new parse methods are created. It will change the profile completely.

einarf commented 6 years ago

I'm mostly done with my list now. I have changed the line reader to a generator so parse_ functions can be allowed to consume multiple lines. Blocked by my 2 x PRs still to be able to make a new PR without too much chaos. ping @greenmoss

einarf commented 6 years ago

Done! (Again: Making PR when the two other ones are resolved)

The remaining part now is to move geometry data to Mesh. Detect vertex format and update drawing. vertices, normals, uvs and whatnot are read in one chunk.

New profile is:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1247148    0.722    0.000    0.944    0.000 obj.py:148(consume_faces)
       36    0.148    0.004    1.092    0.030 obj.py:122(parse_f)
   262818    0.112    0.000    0.112    0.000 {method 'split' of 'str' objects}
   106910    0.090    0.000    0.217    0.000 parser.py:86(next_line)
    33839    0.057    0.000    0.121    0.000 obj.py:36(consume_vertices)
   106909    0.047    0.000    0.050    0.000 parser.py:73(create_line_generator)
   155889    0.047    0.000    0.047    0.000 obj.py:154(<listcomp>)
    20668    0.028    0.000    0.066    0.000 obj.py:82(consume_texture_coordinates)
   106910    0.027    0.000    0.077    0.000 {built-in method builtins.next}
      144    0.014    0.000    0.014    0.000 {method 'write' of '_io.TextIOWrapper' objects}
       36    0.006    0.000    0.127    0.004 obj.py:33(parse_v)
       36    0.004    0.000    0.070    0.002 obj.py:74(parse_vt)
       15    0.002    0.000    0.002    0.000 {built-in method marshal.loads}
      405    0.001    0.000    0.001    0.000 {built-in method _codecs.utf_8_decode}
      2/1    0.001    0.001    1.315    1.315 parser.py:96(parse)

The cumtime for parse is down to 1.315 from 2.458, so that is a pretty nice improvement. (Using the same 3.3MB obj file). Still probably a lot that can be improved in the face reading method, but let's not dive into premature optimization 😄

greenmoss commented 6 years ago

Thanks for the PRs! I feel smarter by proximity to the comments here 😉

I merged #31 and #33 from @einarf.

I don't have the "official" circleci status up or documented yet, but note that I got some new failures:

https://circleci.com/gh/greenmoss/PyWavefront/21

Compare to previous iteration, which has only a failed gl call:

https://circleci.com/gh/greenmoss/PyWavefront/20

einarf commented 6 years ago

@greenmoss Ran tests locally in py2 and 3 now. Those should be fixed. Making new PR in a moment.

SergioRAgostinho commented 6 years ago

@greenmoss Are those circle ci builds being triggered automatically with each new PR and commit? I don't see them on the "checks" section as it is usually the case.

greenmoss commented 6 years ago

@SergioRAgostinho yes they are triggered automatically, but only on commits to master. I need to reconfigure it to test PRs too, and post the results. I'm guessing it won't take long to do, I just have to actually "do" it 😄

greenmoss commented 6 years ago

The CircleCI tests might be working now. I tried it on a local PR and it kicked off. It says it will also do it from PRs from forks. We'll see.

greenmoss commented 6 years ago

I've been reading through this issue and the #38 merge. I get the feeling there is more left to do even after #38. Should I leave this open?

einarf commented 6 years ago

Yes. Leave open. Most of the ground work is done.

Still need to move the geometry to Mesh and, auto fit the vertex format + update pyglet draw code.

einarf commented 6 years ago

After looking at the obj format a bit closer, it seems the main challenge is to figure out how to handle the geometry properly. Just moving the geometry to Mesh isn't straight forward at all considering a "Mesh" (data for an o statement or nameless object) can have one or multiple materials. Materials could also potentially be shared between geometry that do not have the same vertex format. I'm guessing this is rare, but we should have a check for this.

I also see some projects referring to "shapes" where a shape is a collection of geometry of the same format that shares the same material. A "mesh" or "obj" is then just a meta object containing a pointer index and length to some other external buffer data.

I think the entire point of this distinction is that users can the chose if they want to interact with mesh/obj or shapes.

SergioRAgostinho commented 6 years ago

My original idea was to have a Mesh(object) holding a buffer with all vertex data that belongs to it. A mesh can then contain multiple materials inside of it. Each material then holds an index buffer to the vertex data it uses.

Do you foresee any use case which wouldn't be handled by this representation?

I confess I'm having trouble understanding what exactly are "shapes" from your comment.

a shape is a collection of geometry of the same format that shares the same material

geometry - vertex and faces (connectivity) material - same properties applied to the same vertices ?

It sounds like these are multiple instances of the same Mesh and that's why I'm convinced I didn't understand what you meant. Apologies if I'm spreading confusion.

einarf commented 6 years ago

A shape is just how we do it now. All the geometry that belongs to one material is mashed together in one place. It can be data from several objects or part of one or multiple objects (assuming the vertex format is the same). These are the most convenient to use for rendering the object as efficiently as possible.

Having the vertex data for each Mesh itself can be a challenge since the face list can refer to vertex indices from previous objects. This resets when a new material is set. I think storing the vertex data in the material is simpler (as we do now). Instead have the mesh point to materials and some metadata about what vertices in each material belong to the mesh.

I guess it depends if we want to end up drawing meshes or material/shape. Considering some objects may have material changes in the face list it would be a challenge drawing a Mesh as a whole in one draw call/iterleaved vertex buffer.

This is was the reason for adding a concept of a "Shape". It may be someones wishes to get vertex data for a single object/mesh as well. Not entirely sure how to solve that.

I may be completely overseeing things here. Organizing the vertex data in a clever way is not that easy. I see tree.js and other libraries have had an evolution when it comes to handling vertex data as well. Maybe tinyobjloader also could give some ideas.

SergioRAgostinho commented 6 years ago

I'll be referring to concepts of vertex data, element data or elements as presented in this spec http://www.martinreddy.net/gfx/3d/OBJ.spec . I retrieved it from the Wikipedia page.

Vertex Data

I had a look another project with the same purpose https://github.com/syoyo/tinyobjloader and they seems to hold vertex data as something which belongs to the whole file. Maybe porting to our logic it would be under the Wavefront object, but I'm sure this need to be global.

Element Data

element data like faces or lines reference vertex data through indexing. Elements can be assigned to an object and a material, only one of each category.

Maybe we should work with something which is an "Element Container" type of structure which can hold the different existing elements and can optionally have set an object or a material to it. Once a new state setting attribute is specified a new "Element Container" is initialized.

These might be our Mesh object or something with a different name

Objects

This is an optional statement and once it is declared, all subsequent declared elements belong to it. There's no default name specified.

Material

Also an optional statement and once it is declared, all subsequent declared elements belong to it. If no name is provided upon invoking usemtl a default white material is used.

Back to the Original Topic

For my goals, I'm more convinced that the per vertex color information was an "unofficial" extension to the spec. There's even the following sentence

To support color interpolation, materials must be assigned per vertex, not per element.

I'm working on models generated by people in the robotics domain and I'm convinced the that the adherence level to the spec is questionable. This raises the question if it is worth support the extension here. I will need to implement it anyway in my fork.

Summary

Here's my suggestion regarding the data structures.

Wavefront:

vertex/vertices - a container for all vertex data
normals - a container for all normal data
texture-coords - a container for all texture coords
materials - a container for object type Material with all used materials
objects - this is just a container with object names
Mesh/Element Group - a container of groups of elements

Mesh/Element Group:

faces - container of triplets
some other more exotic elements we decide to support...

I also believe the obj is not a format with data loading (to graphics) efficiency in mind. By this I mean, you won't get a contiguous buffer to easily load data into your graphics lib. It feels that it was conceived to be very flexible and compact in storage. Since we're only interested in exposing the obj file data maybe we should not concern ourselves with that other problem.

einarf commented 6 years ago

I'd say as long as we can fetch interleaved vertex data for every material (or mesh) like it works right now, it will be fine. How the data is organized at parse time might not matter too much.

It should support the formats:

GL_V3F,
GL_C3F_V3F,
GL_N3F_V3F,
GL_T2F_V3F,
GL_C3F_N3F_V3F,
GL_T2F_C3F_V3F,
GL_T2F_N3F_V3F,
GL_T2F_C3F_N3F_V3F

That means we can now draw most obj files and not only those with all four attributes.

I'm suspecting the ability to draw with pyglet is a selling point for this project making it unique compared to similar libraries.

einarf commented 6 years ago

I'd say keep it close to how it worked before gathering the geometry data in each individual material. For every face list added to the material we check if the vertex format is the same. Raise an exception if this rule is broken. (Not sure if this is acceptable, but exceptions can be dealt with)

For every face list we add to a material, we also add the material to the current mesh. This means a mesh can potentially contain multiple materials. We store a meta structure in the mesh referencing the material and an index range for the elements for each material. (I guess this might be something like your "Element Container") This means we can make a method returning interleaved data for each material in the mesh. Users can chose to either draw through materials or Mesh (potentially having to create multiple index buffers to draw only one object)

This can easily support both vertex color and material statements in the face list and we don't have to radically change the way things are working at the moment.

Anything with this idea that create issues for your use case? I'm more into high performance rendering, so I know very little about how the obj format is used outside of this context. I do see the need for being able to draw individual meshes if they are attached to animations. This solution does support both approaches.

SergioRAgostinho commented 6 years ago

In general, the obj files I handle don't have materials. This is my only "shortcoming". I can always work around that as long as the parser doesn't freak out for reading a f line before usemtl is specified.

Overall I don't feel "Faces inside Materials inside Meshes" enacts the flexibility that the obj format allows, hence my previous suggestion. However I do understand that this is a code base already in use and as such, breaking changes to the API should be kept to a minimum.

einarf commented 6 years ago

I think both situations can be supported. It depends what information you need in the Mesh. Do you just need interleaved data or separate arrays with positions etc?

EDIT: The idea of having a better global container for the vertex data is not a bad idea (similar to tinyobjloader).

einarf commented 6 years ago

In theory everything here was resolved in #42 and #43 unless @SergioRAgostinho has any other needs regarding how the data is organized. (Mesh vs Material etc)

greenmoss commented 6 years ago

OK by me to close. @SergioRAgostinho ?

SergioRAgostinho commented 6 years ago

Oh :open_mouth: @einarf implemented the per vertex color parsing already. Thank you for that :heart:

pywavefront / PyWavefront