Improve build error/warning suggestions

agjohnson commented 6 years ago

Currently, we are duplicating throwing an error logic in our build tips, along with the pattern of using logic in templates to determine this. There are two areas that we can improve on to provide a more useful pattern.

For example, a build is failed when we detect a failure state in the build task. This build is then recorded with the failure text and an exit code. On view, we are duplicating this logic, searching for a reason for failure. I think the build should be updated on failure with enough information that the view or template can simply reference the error.

For example, some vaguely psuedo python:

class BuildDetailView(View):

    # This can be automatic with the help of a meta class on BuildException
    ERROR_MAP = {
        501: ProjectImportError,
        ...
    }

    def get_context_data(self, **kwargs):
        ctx = super()
        ....
        # There are a few way to do this, maybe exit code is not a good idea. Build errors
        # would have explicit exit codes for referencing the error though
        try:
            error = self.ERROR_MAP[project.exit_code]
            # Load from template, or pass into build detail template somehow
            suggestion = error.get_suggestion()
        except KeyError, AttributeError:
            pass
        return ctx

We also need to add a field to the build model to track the error code. This will be used to reverse the suggestion. Build exceptions should have a get_suggestion() method and perhaps some easy method of templating out new suggestions.

This will accomplish:

Deduplication of this logic
Moving error display logic out of templates
Ties exception to suggestion

Update:

To summarize what we are discussing doing here, and to maybe provide a little additional context to this change:

We want a way to emit warning messages from builds, and therefore multiple warning/error messages. The error should not just be a string model field
The code above isn't exactly necessary. It tries to modularize error message, but we could just as easily emit a JSON error object with both an error id and error message. The key takeaway here is that error messages have some metadata or context that the templates can use. Ultimately, the build error message should just (loosely) be a template like {% for error in build.errors %}<div class="{{ error.class }}">{{ error.message }}</div>{% endfor %}

humitos commented 6 years ago

I like this idea.

Adding a little more to this, I think we can make the Build.error a ForeignKey to a new model called BuildError and instead of storing always the same string in the database, re use them by linking to a BuildError instance since we are not having random errors anymore.

Besides, each of this BuildError (or the Build itself could have the logic to get the suggestion involved for this error. I don't think that it will be always a 1:1 mapping or a mapping based only in one value as exit_code or similar. I suppose that for some suggestions we will want to do additional checks in 2 or 3 fields (output message + exit_code + BuildError type, for example)

agjohnson commented 6 years ago

I'd vote against a separate table, it probably isn't necessary. I think we want to add a Build.error_id that would point to an id on a build exception class. During display of the build, if a project has a Build.error, we display that (for legacy purposes, also for unhandled error cases). Otherwise, we do a lookup of Build.error_id -- this field should exist on all new builds. I think the error -> suggestion mapping, as well as other logic to get the display error message, can just as easily live on Build.

I say this as Build will only have 1 failure reason, and we want to push the errors into code, not the database. It doesn't sound like we have a very strong reason to separate the models with this.

agjohnson commented 3 years ago

Removing this from new template project, this is mostly a backend change to start. We can discuss whether to surface this in the UI immediately or not, but I don't think we need to do too much on the front end side to accomplish this at the moment. The implementation in the new templates might look a little different however.

agjohnson commented 2 years ago

I'd vote against a separate table, it probably isn't necessary

My mind has maybe changed on this. If we want multiple warnings, we probably do need additional modeling like @humitos suggested previously.

At very least, we need to be able to store an array of errors/warnings ids emit from a build. We probably still want to maintain the error/warning text inside application classes however, leveraging localization we already have and use.

So BuildError model seems reasonable. But a simple lookup choice field between error_id and the error message class would be enough.

humitos commented 2 years ago

Today I saw this message on CircleCI and I liked it:

Screenshot_2022-03-15_10-52-06

The link goes to https://discuss.circleci.com/t/legacy-convenience-image-deprecation/41034

agjohnson commented 2 years ago

One more thought here on technical implementation. It might be nice to use some of these error states on the build listing page. In template logic, some of the conditionals I might want to use are:

Does the build have a concurrency limited error?
Does the build have an error we caused, or application error?
Does the build have a warning? If so show a warning icon with some details
Did the build time out? Show a special state
Was the build cancelled? Show a special state

Implementation might depend a bit on this use case, as I'll want to have an easy to maintain way of using template error classes in template logic.

Also, because this will be on the listing page, it could have negative impacts on build query time. Perhaps this could make more sense as a JSON field.

humitos commented 2 years ago

It might be nice to use some of these error states on the build listing page. In template logic, some of the conditionals I might want to use are:

I think we can handle all these cases by expanding the number of state= we have and using Build.errors + Build.warnings to accumulate user-facing messages. I described a potential implementation of this idea in #9100

agjohnson commented 2 years ago

I think we can handle all these cases by expanding the number of state=

Hah maybe this is already getting confusing. Do you mean expanding the number of Build.status?

I'd still consider Build.state should be something like:

Build.state in [
    Build.QUEUED,
    Build.CLONING,
    Build.INSTALLING,
    Build.BUILDING,
    Build.UPLOADING,
    Build.COMPLETE,
]

And Build.status is what you are describing. Correct?

As part of #9100, I broke out the larger discussion to #9110. There are a few questions there that relate to this issue. If we can come to some consensus there, we can figure out if this issue is still important, or if it becomes merged into a discussion about Build.status instead.

humitos commented 2 years ago

@agjohnson

Do you mean expanding the number of Build.status?

No. I'm saying that we can expand Build.state to communicate not just the state, but also the reason. A good example is Canceled state. That means the build is COMPLETE (in your description) but also we know the reason as well: the user canceled it. That way, we don't require multiple fields to communicate this.

agjohnson commented 2 years ago

Okay so then I'd probably disagree there, this is the pattern I'm trying to get away from because the template and JS logic are far more confusing with this pattern.

If we separate the two concepts:

Build.state allows me to represent which step the build is on. Templates and JS use this as an enumeratino to describe the sequence of events.
Build.status or Build.messages allows me to display warnings, errors, and metadata on what happened. This is purely informational and doesn't affect the build sequence.

If we combine the two concepts, we'll just need logic to unravel the combination field back to a list of steps, so that templates can describe the steps sequentially again.

Reducing the number of fields doesn't really seem like a strong benefit, especially if we just need helpers to unravel this back to usable forms.

Is there some strong benefit combining the two provides?

humitos commented 2 years ago

In my proposal, it's easier for me to understand that Canceled state always means:

completed / finished
failed
canceled by the user

(extra messges are at Build.messages = [NoConfigFileTipMessage], for example)

and there is no other way to read the Canceled state.

In your proposal, to express the same we need to make sure we are setting 3 fields:

success=False
state=Completed
Build.messages = [NoConfigFileTipMessage, CanceledByUser]

IMHO, this could be more flexible but also more error and inconsistent data prone. I'm sure we will end up with a combination of those fields that should not be possible (e.g. success=True and Build.messages=[CanceledByUser] because we already have similar cases.

agjohnson commented 2 years ago

Build.status is certainly a bit redundant with some values of Build.state -- do we even plan on using Build.status unless Build.state == "finished"?

I feel like what you are describing is the need for code level constructs though, not a data construct. To me, combining the fields into a singular field mostly feels mostly like a lateral move to abstract the storage of this data. We still need to break it apart when we need to use the data or query for one half of this combined field.

With code level constructs, we can make queries and comparisons more descriptive, as well as contain logic in a single place. This avoids taking on work that will be costly and highly prone to pain, but gives us much more usable helpers for build state.

Psome pseudo python:

class BuildCancelled:
    build_state = Build.STATE_ALL  # or mock.ANY like object
    build_status = [Build.STATUS_CANCELLED]

# Query with mgr method
Build.objects.is_state(Cancelled)

# Compare object
if build == BuildCancelled or build.is_state(BuildCancelled):
    pass
# Avoid impossible scenarios by consolidating logic
build.set_state(BuildCancelled)

In your proposal, to express the same we need to make sure we are setting 3 fields:

I don't feel strongly here, but I do suppose success is performing a legitimate function right now. I was considering this field as a candidate for moving to Build.status though. We still need a Build.success method in our templates and API, as both need easy access to Build.success.

I think where I'm leaning is:

Build.state still describes where we are. We update the verbiage here eventually though
Build.status remains a lookup field, describes the single most important reason for success/failure. I do have questions here though.
Build.messages is a new, JSON field for messages, warning, errors
Build.success is a method or read-only property, uses Build.status.

Still status and messages have overlap to me. I'd need to talk through this, it's what's blocking me. Some statuses are obvious for Build.status -- duplicated, cancelled, retrying, success, failed. Others straddle Build.status and Build.messages -- timed out, oom kill, etc.

Anyways, the cost to this is very low in comparison, as we aren't altering a field. I'd only need to worry about adding conditional logic to templates/JS between the display of the old error format and the new error format.

This is considerable work already, and so altering Build more has me even more concerned about downstream affects this change would have on templates and JS.

I'm sure we will end up with a combination of those fields that should not be possible

With some code level additions, we can help avoid this. But also, it doesn't really seem like Build.status is used unless the build is finished, similar to other fields already. I'd say this isn't a strong concern for me, not enough to warrant taking on a Build migration at least.

humitos commented 2 years ago

@agjohnson

I feel like what you are describing is the need for code level constructs though, not a data construct

I'm not describing something like that. I don't want to depend on those code helpers to keep consistency in data. In particular, when we can have the same without requiring extra code and just adding some extra states. I'm just suggesting adding more "final states" instead of "having only one (FInished) + a list of possible options to describe that finished state". That is, instead of Finished + Cancelled By user, just Cancelled by user.

I made a drawing to explain myself better:

all the names in the drawing are states
final states are marked with a circle with an X inside them
initial state is marked with a tick
there are just one final state where the build finished without errors
there are multiple final states where the build has failed
there are some transit states where the build is not success or failed yet
each final state will be associated with its error message
warning/info/suggestion messages (.messages) are independent of build states (e.g. an info message about using a config file could be on builds that are on Success state but also on Timeout state)

(I'm suggesting using .messages as a way to communicate anything we want to the user on the build detail's page. This could be error messages like " 🔴 there was a problem with your dependencies", warning messages "⚠️ pdf generation failed, but we uploaded HTML" or info messages "ℹ️ we strongly recommend you to use a config file")

Build.success is a method or read-only property, uses Build.status.

If this is a method, how do we query at db level for successful builds?

agjohnson commented 2 years ago

I think there are some useful points discussed, and parts of your implementation are what I would want if we were starting from scratch, but this refactor seems like mostly a lateral move due to the work that would actually be required. That, and it doesn't seem like this is solving a pervasive issue, but if it is it seems like the issue mostly affects core team. I still think code additions to help consolidate logic are our best option here. Core team can see some benefit without negatively affecting existing code, and users won't have data or APIs changed on them.

The error message refactor is the more impactful of the two refactors though, so we should resolve what we need to unblock that and perhaps focus on just that for now.

What you are describing is seeming like a combination field: Build.state + Build.status + some subset of error messages. The addition of build errors in this combination field feels at odds with Build.messages.

Build.messages is replacing/deprecating Build.error, as we wanted a place to record build error messages with context data. So it's a good fit for configuration errors like you note. But Build.messages is also a better fit for command failures and unknown errors, as both will always have some context data attached. We can't represent context data like this in Build.state.

But then it also feels like bad design to put some error messages in Build.messages and some errors descriptors as build state. Consolidating warnings and errors in Build.messages seems like a wiser design.

Work

The refactor to Build.messages is already large, so I'd like to keep scope down:

We still need to support an old method of displaying error messages in templates (the existing implementation is mostly nested template logic), along with a new template pattern using Build.messages.
To add to that, we have old build data that doesn't use Build.commands at all. This is yet another template version for the build detail page. This data should probably be deprecated and replaced with "No build detail view, check our API for data".
We need a technical implementation for storage and raising errors. On the front end, I mostly just want to iterate over a list of sorted messages (so we need message/class priority probably). I'd like to link to output line numbers where applicable too.
We need to decide what is important as a message vs status.

There are several statuses that sit somewhere between Build.message and Build.status. This is how I would group them:

Build(state=CANCELLED, status=DUPLICATED)
Build(state=CANCELLED, status=BY_USER)
Build(state=CANCELLED, status=AUTOMATICALLY)
Build(state=QUEUED, status=RETRYING)
and perhaps more we could expand here, if we want to be explicit

I would do this because on the build listing pages, I don't want granular information like "cancelled by user" vs "cancelled automatically". I'd just show the build as cancelled on the listing page. I would show by user, automatic, and duplicated cancellations on the detail page. This data is used by both templates (and so django model attrs/methods) and via our API (where I need to replicate chunks of the Build model in JS).

agjohnson commented 1 year ago

I removed assignment, this is a very backend change.

I'd still like to see this happen. I keep finding places where I want messages emit from the build process, instead of from template logic like:

https://github.com/readthedocs/readthedocs.org/blob/68fd63bc923fddbf821195a32427c0f1af390098/readthedocs/templates/builds/build_detail.html#L152-L187

Not a huge deal to bring this pattern into the dashboard, but I'd like to start wrangling our error/info message patterns at some point.

agjohnson commented 1 year ago

Note, I added the following issue and discussion:

Given this issue has gone in a few directions, it's probably worth moving the remaining discussion to the newer issue. I think my original description is still close to what we're talking about ultimately though.

humitos commented 1 year ago

I read all this conversation again and I'm considering the ideas talked here. We will be able to attach multiple notifications to the same build (error, warning, note and tips). This means we don't need the field Build.status since all of them will be a notification on the build itself.

Besides, each notification will have an ID to a particular message that will support HTML content as described here.

readthedocs / readthedocs.org

Improve build error/warning suggestions #3399

Work