Linting - Githubissues

Goal

To provide real-time error detection while the user is working.

Introduction

Consider traditional linting of source code.

[Image: linting]

As the user is typing, errors are highlighted and explained interactively. Now imagine the same for content creation.

Pyblish does a great job at detecting and preventing errors when asked to, but I think there are much greater things to come, and Pyblish could be the world's first "content linting" solution.

Architecture

Traditional linting works by running a background process over the source code being worked on, during short periods of idle. Because the processing happens in the background, while the user is not interacting with the code, linting appears seamless and interactive.

In our case, the computations represent plug-ins.

import pyblish.api

class ValidateSomething(pyblish.api.Validator):
    interactive = True  # proposed flag: opt this plug-in in to linting
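As a minimal sketch of how the plug-ins to run during linting might be picked out, assuming the proposed `interactive` attribute counts as False when a plug-in does not define it. The classes below are plain stand-ins rather than real Pyblish plug-ins, and `lintable` is a hypothetical helper, not part of the Pyblish API.

```python
class ValidateNaming:
    """Stand-in for a cheap validator that opts in to linting."""
    interactive = True


class ValidateFluidCache:
    """Stand-in for a heavy validator that should only run on publish."""


def lintable(plugins):
    """Return the plug-ins that opted in via the proposed `interactive` flag."""
    return [p for p in plugins if getattr(p, "interactive", False)]


plugins = [ValidateNaming, ValidateFluidCache]
print([p.__name__ for p in lintable(plugins)])
```

Making linting opt-in keeps every existing plug-in out of the background loop until its author decides it is cheap enough to run there.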

However, we face a number of potential issues not present with linting of source code.

Performance
Simultaneous access
Visualisation

Performance

Not all plug-ins may be suitable for linting. But normal interaction with a 3D scene (e.g. navigating the scene, renaming objects, rotating controllers) typically leaves a lot of room for additional processes, so we should have no problem letting Pyblish go to town gathering data during idle and processing that data in separate threads, or even processes. It's the gathering part that typically blocks user input, and thus this is what may limit what sort of computations can be done during linting.
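The split between gathering and processing could be sketched roughly as follows. This is only an illustration: `gather`, `process`, the `scene` list and the polycount limit are all hypothetical stand-ins, and a real host would query its own API inside `gather`.

```python
from concurrent.futures import ThreadPoolExecutor


def gather(scene):
    """Runs on the main thread, so it must be quick. Copies what is needed."""
    return [dict(node) for node in scene]


def process(snapshot):
    """Runs off the main thread, so it is free to take its time."""
    return [n["name"] for n in snapshot if n["polycount"] > 1000]


# Hypothetical stand-in for data queried from the host.
scene = [
    {"name": "hero_GEO", "polycount": 500},
    {"name": "crowd_GEO", "polycount": 250000},
]

executor = ThreadPoolExecutor(max_workers=1)
snapshot = gather(scene)                     # blocks the user, briefly
future = executor.submit(process, snapshot)  # does not block the user

scene[0]["polycount"] = 9999                 # the user keeps working meanwhile
result = future.result()
executor.shutdown()
print(result)
```

Because `process` only ever sees the copy, later edits to the scene cannot affect a lint that is already running. `concurrent.futures.ProcessPoolExecutor` offers the same `submit` interface if threads turn out not to be enough.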

Simultaneous Access

As opposed to source code, which has the benefit of being stored on disk at all times and thus allows for access from multiple locations, content is typically all stored in memory within a single process. As mentioned above, this hinders our ability to simply trigger linting at arbitrary times during normal interaction.

Even when constrained to times of idle, there should still be plenty of room for linting. And otherwise, there might be options for linting at times when saving is instantaneous, such as during the initial stages of working with a scene.

Visualisation

This is the big one. Source code has the benefit of being uniform, not only across IDEs and text editors, but also throughout the entirety of a project; it's all text. Content, on the other hand, comes in many flavours and shapes, with errors ranging from broken naming conventions, to curves with too steep an acceleration, to inverted normals. The obvious question is: how do we visualise the errors we find?

Following the lead from software linting, we could defer each message to the terminal/console of each host. I.e. whenever an error occurs, simply print a message saying what's wrong and why.

Taken one step further, we could let our current or future GUIs do the talking with fancy graphics and heads-up displays. Ideally, however, I'd like for errors to appear right next to their origin, as they do in linting for source code.

Discussion

I believe the question is not "Would linting help us produce better content?", for I feel the answer to this is an obvious "Yes", but rather "How would linting benefit you?" and "What should linting look like, and what should it look for?"

PyFlakes is a great example of validating the correctness of source code. By visualising code that does not conform to standards or that contains syntactical mistakes, it helps users write code that not only works but is also uniform in the eyes of other programmers. This makes it easier to share and to collaborate on source code.

Can the same be said for content linting?

McCabe is yet another great example, with a slightly different take on linting. It looks for _complexity_ as opposed to correctness. Complexity may be things such as "No class should contain more than X member variables".

Could we validate/lint for similar issues?

References

I think it seems like a cool idea, but one of my main concerns is when to trigger this linting.

If the linting is too frequent I could see myself shouting at the computer: "I know it's wrong! But I'm not finished yet!".

I guess this also depends on the visibility on screen, which is a tough balance. Too visible becomes annoying, not visible enough and the point of the tool is lost. Ideally it would be something you notice in the corner of your eye, and address if you think you are finished.

"I think it seems like a cool idea, but one of my main concerns is when to trigger this linting."

Mm, I agree.

In source code, errors are typically trivial and can be fixed relatively quickly and thus never really have time to get in the way. I suppose it depends how you use it, as well. I develop with linting turned on always, but I know some turn it off occasionally, while others only turn it on occasionally.

Lint during idle
Lint on demand

Then there is "offline" linting in which users may:

Lint when saving
Lint when loading
Lint manually, via command-line

Maybe the same could be applied here?
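For the "lint during idle" case, one way to avoid interrupting someone who isn't finished yet is to wait until they have stopped interacting for a moment. Here is a rough sketch, assuming the host can notify us of user activity; `IdleTrigger` and its two-second threshold are made up for illustration.

```python
class IdleTrigger:
    """Decide when to lint based on how long the user has been idle."""

    def __init__(self, idle_seconds=2.0):
        self.idle_seconds = idle_seconds
        self.last_activity = 0.0
        self.dirty = False

    def on_user_activity(self, now):
        """Call from any host callback that signals user interaction."""
        self.last_activity = now
        self.dirty = True

    def should_lint(self, now):
        """True once per burst of activity, after the idle period has passed."""
        if self.dirty and now - self.last_activity >= self.idle_seconds:
            self.dirty = False
            return True
        return False


trigger = IdleTrigger(idle_seconds=2.0)
trigger.on_user_activity(now=10.0)
print(trigger.should_lint(now=11.0))  # still working
print(trigger.should_lint(now=12.5))  # idle for long enough
print(trigger.should_lint(now=13.0))  # nothing changed since the last lint
```

Passing the time in as an argument keeps the logic independent of any particular host timer or event loop.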

Ultimately, the goal here is to find defects quicker.

"I guess this also depends on the visibility on screen, which is a tough balance."

Yes, I think so too.

With source code linting, the actual linting is always (?) a separate dedicated process that loads your code and outputs a report. IDEs then parse this report to produce visual cues, such as marking lines and words.

If we apply the same methodology here, then it could be a matter of host-by-host implementation of visuals. That is, we could implement each visual cue independently and highly coupled to what it relates to.

For example, naming convention validation may appear in the Maya Outliner, in the same visual style as it does for source code; either by underlining or boxing in, yellow for warnings and red for errors.

Mesh errors might instead utilise Fabric Engine to draw interactively on top of the viewport; invalid meshes may get a yellow or red overlay.

For everything else, console messages and dedicated GUIs could end up as a last resort for anything that doesn't have a neat visual representation.
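To make the report-then-visualise idea concrete, here is a small sketch. The report format, the `kind` field and the three `show_in_*` functions are all assumptions for the sake of illustration; real visualisers would call into the host rather than return strings.

```python
import json

# Hypothetical report, as a separate linting process might emit it.
report = json.dumps([
    {"kind": "name", "node": "pCube1",
     "severity": "warning", "message": "Missing _GEO suffix"},
    {"kind": "mesh", "node": "hero_GEO",
     "severity": "error", "message": "Inverted normals"},
    {"kind": "scene", "node": None,
     "severity": "error", "message": "Frame range does not match shot"},
])

COLOURS = {"warning": "yellow", "error": "red"}


def show_in_outliner(record):
    colour = COLOURS[record["severity"]]
    return "outliner: underline %s in %s" % (record["node"], colour)


def show_in_viewport(record):
    colour = COLOURS[record["severity"]]
    return "viewport: overlay %s in %s" % (record["node"], colour)


def show_in_console(record):
    return "console: [%s] %s" % (record["severity"], record["message"])


# One visualiser per kind of error; anything else falls back to the console.
VISUALISERS = {"name": show_in_outliner, "mesh": show_in_viewport}


def visualise(report_text):
    records = json.loads(report_text)
    return [VISUALISERS.get(r["kind"], show_in_console)(r) for r in records]


for line in visualise(report):
    print(line)
```

Each host would then only need to supply its own table of visualisers, while the report itself stays host-agnostic.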

This idea is great and could be super helpful ...if you can prove that there are some useful background processes to run.

I've written tools to validate scene files before publishing and/or rendering. These on-demand validations were often pretty intensive and would slow down the workstation significantly.

Can you define a set of validations that is both useful and unobtrusive to run in the background?
Does the benefit of this set of validations warrant the building of an automatic linting system?

There are definitely some challenges to the system that you design and build. Here are some of my first considerations:

Fancy UIs are fun, but ultimately the information is more important than the way it is displayed. Avoid getting caught up in the interface until the linting process is proven to be useful and unobtrusive.
The system should be pluggable or extensible so that it is simple to add custom validations for a particular context (show/shot/asset/...).
It would be great to have the ability to suspend, de-prioritize, or wholly cancel a background process so that the user never feels bogged down. Unfortunately, working within the bounds of an application API may make this nearly impossible.

Thanks @krets, some really good points.

"This idea is great and could be super helpful ...if you can prove that there are some useful background processes to run."

For starters, we could go with linting naming convention within Maya.

Compute Snapshot

To interactively parse the names of existing nodes and to highlight them, in the Outliner, in a similar fashion to traditional IDE linters (see top example). I can see a number of benefits to this route.

Getting a snapshot of all available names within a scene is an instantaneous (< 0.1 seconds) process (via cmds.ls())
Having a snapshot means we are not bound to the main thread and can thus process in the background
Processing in the background (either thread or process) means there is room for complex computations, or checking of many names using a relatively low-performance language (e.g. Python)

One of the disadvantages of this approach however is that some things are relatively heavy to snapshot; such as geometry or pixels.
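A rough sketch of the snapshot approach for names follows. The naming pattern is a made-up convention, and `snapshot_names` returns a hard-coded list so that the example runs outside of Maya; inside Maya it would wrap the `cmds.ls()` call mentioned above.

```python
import re
import threading

# Hypothetical convention: lowercase-first name, underscore, type suffix.
PATTERN = re.compile(r"^[a-z][a-zA-Z0-9]*_(GEO|GRP|CTL)$")


def snapshot_names():
    """Stand-in for the host query; must run on the main thread in a host."""
    return ["hero_GEO", "pCube1", "arm_CTL", "Group2"]


def lint_names(names, results):
    """Runs in a worker thread; touches only the snapshot, never the scene."""
    results.extend(n for n in names if not PATTERN.match(n))


invalid = []
worker = threading.Thread(target=lint_names, args=(snapshot_names(), invalid))
worker.start()
worker.join()  # a host would poll or use a callback instead of blocking
print(invalid)
```

The list of invalid names is what would then be handed over for highlighting in the Outliner.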

Headless Compute

Alternatively, if we choose to "Lint on Save" we could potentially allow for any computation to occur within an independently running, headless version of e.g. Maya.

The headless process would have native control over the main thread of a host.
To kick off linting, the headless process could listen via IPC for a signal to start linting with a given absolute path to the file to lint.
Being a separate process, it is unobtrusive and may be cancelled (killed) at any point.
Upon completion, a report is sent back to the original host which is then parsed and visualised.
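The life cycle above could look something like this sketch. It simplifies in two ways that are my own assumptions: it starts a fresh process per lint instead of keeping one listening via IPC, and it uses the plain Python interpreter with a trivial inline script as a stand-in for a headless host. The path and the report format are placeholders.

```python
import json
import subprocess
import sys

# Stand-in for the script a headless host would run. It only checks the
# file extension; a real one would open the file and run the plug-ins.
LINT_SCRIPT = """
import json, sys
path = sys.argv[1]
errors = [] if path.endswith((".ma", ".mb")) else ["Unexpected file type"]
print(json.dumps({"path": path, "errors": errors}))
"""


def start_lint(path):
    """Launch the linter in its own process and return the handle."""
    return subprocess.Popen(
        [sys.executable, "-c", LINT_SCRIPT, path],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )


def collect(process, timeout=30):
    """Wait for the report; kill the process if it takes too long."""
    try:
        out, _ = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.communicate()
        return None
    return json.loads(out)


report = collect(start_lint("/path/to/scene.ma"))
print(report)
```

Since the handle returned by `start_lint` is an ordinary `Popen` object, the original host can also call `kill()` on it directly, for example when the user saves again before the previous lint has finished.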

I can think of a few disadvantages to this approach, however. The main one is start-up time. An IDE can kick off multiple processes in the background and expect results in less than a second, killing processes as the user continues to type. In our case that is trickier, as Maya and other hosts, even when headless, take a few moments simply to get up and running.

The second disadvantage is memory use. Although processing shouldn't (we need tests) affect interactivity within the main host, memory use might, as even a single additional host running in the background could severely reduce the memory available. For example, consider linting a scene with a heavy fluid simulation or crowd.

"Fancy UIs are fun, but ultimately the information is more important than the way it is displayed. Avoid getting caught up in the interface until the linting process is proven to be useful and unobtrusive."

Agreed. After some thought, it may be optimal to stick with what IDEs are already doing, which is simply highlighting errors on the spot while providing a description elsewhere, such as in a console or script editor.

"The system should be pluggable or extensible so that it is simple to add custom validations for a particular context (show/shot/asset/...)."

I believe we've already solved this, and linting of this sort could potentially build upon what Pyblish already does.

Hey Marc, this is a cool idea indeed. Real-time linting of content, or as I would rather call it, "dynamic visualization of data", is something everybody would like to have. As everybody has said, content nowadays is huge, so parsing that content in real time and processing the data is a question of performance. I like the idea of "linting on demand" or "linting offline". Linting on demand means the user selects a portion of the viewport and lints that portion, or selects by some other criterion. Linting offline is as you mention.

Really looking forward to seeing some mockups!

pyblish / pyblish-base

Linting #133