Open ytausch opened 8 months ago
My only comment is that the bot having a bad schema is not necessarily a user issue, so putting it on the status page is confusing.
Maybe we can add a note to the status page explaining exactly this? I think it is still justifiable to put it on the status page since something happening outside the documented behavior is in some sense a degradation. Still open to your concerns, of course.
Yes I think that'd be ok. To make this happen, we'll need to update the status_report.py file to write the schema violations to a json blob in cf-graph-countyfair. They do not need to be pushed into mongo. Then we can update the status page to pull them and display them.
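A minimal sketch of the "write the schema violations to a json blob" step might look like the following. The function name `write_schema_violations`, the file name `schema_violations.json`, and the payload layout are all assumptions made for illustration; none of them is taken from `status_report.py` or cf-graph-countyfair.

```python
import json
from pathlib import Path


def write_schema_violations(violations: dict[str, list[str]], out_dir: Path) -> Path:
    """Write schema violations to a JSON file and return its path.

    `violations` maps a package name to its validation error messages.
    The file name and payload layout are hypothetical.
    """
    out_path = out_dir / "schema_violations.json"  # hypothetical file name
    payload = {
        "n_violations": len(violations),
        # Sort names and messages so repeated runs produce stable diffs.
        "violations": {name: sorted(violations[name]) for name in sorted(violations)},
    }
    out_path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return out_path
```

The status page could then fetch this single file instead of querying a database, which matches the point that the violations do not need to go into mongo.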
@beckermr You seem to have introduced a PR status check for the model with https://github.com/regro/cf-scripts/pull/2702. My intent was originally to run the check only periodically because a failed model validation is most likely not something that is wrong with a PR. Is there any deeper reason why you made the change?
If not, I propose to only have the periodic check again.
It is cheap to run and turns out to be useful for debugging. I'd prefer to keep it. The check is not required.
But can't you debug with the periodic check just as well? I know the check is not required, but it makes merging PRs with red crosses the norm. Ignoring checks (even ones that are not required) is not something one should get used to.
With PR #2239, the bot will receive a Pydantic model for its internal data schema, which is validated periodically in a CI job against the entire conda-forge dependency graph.
The future path forward is surely to use the Pydantic model in the bot's production code and not only for documentation purposes. The parts of the model that are directly inferred from a feedstock (e.g. `conda-forge.yml`) should be validated by conda-smithy such that cf-scripts never receives an invalid feedstock to update. This means cf-scripts could, in the future, use a strict Pydantic schema to work with and just fail fast if anything violates it.
However, we are not there yet since migrating the existing codebase to Pydantic is some non-trivial amount of work. So we need an intermediate solution that should satisfy two goals:
`conda-forge.yml` file).
The current idea is to add a new section ("Bot Schema Validation") to the conda-forge status page that classifies each package into 3 categories (names subject to change): `good`, `bad` and `known bad`. Packages are `good` if their cf-graph files are good (implying that the important fields of `meta.yaml` and `conda-forge.yml` are also good), `known bad` if the bot schema is violated because of invalid data in the feedstock (and not an outdated bot model), and `bad` if they violate the bot schema (which should be investigated). The `known bad` list is already part of #2239.
This would allow us to track individual packages as well as the currentness of the Pydantic model.
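To make the three categories concrete, here is a rough sketch of how the classification could work. The names `SchemaStatus`, `classify` and `summarize` are hypothetical, and the sketch assumes that `bad` covers every violation not on the `known bad` list; it is not the logic from #2239.

```python
from enum import Enum


class SchemaStatus(str, Enum):
    GOOD = "good"
    BAD = "bad"
    KNOWN_BAD = "known bad"


def classify(package: str, violating: set[str], known_bad: set[str]) -> SchemaStatus:
    """Classify one package given the set of packages failing validation."""
    if package not in violating:
        # Passing validation wins, even if the package is still listed as known bad.
        return SchemaStatus.GOOD
    if package in known_bad:
        return SchemaStatus.KNOWN_BAD
    return SchemaStatus.BAD


def summarize(
    packages: list[str], violating: set[str], known_bad: set[str]
) -> dict[str, list[str]]:
    """Group package names by category, e.g. for display on a status page."""
    summary: dict[str, list[str]] = {status.value: [] for status in SchemaStatus}
    for package in sorted(packages):
        summary[classify(package, violating, known_bad).value].append(package)
    return summary
```

In this sketch, a non-empty `bad` list would be the signal that either the model is outdated or a new kind of invalid feedstock data has appeared.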
I am aware of https://github.com/conda-forge/conda-forge.github.io/pull/2090 being about to merge. Of course, any additions to the status page will build upon that.
What do you think about this?