tajmone / PBCodeArcProto

PB CodeArchiv Rebirth Indexer Prototype

Errors & Warnings #8

Open tajmone opened 6 years ago

tajmone commented 6 years ago

The HTML generator app must handle both Warnings and Errors.

WARNINGS — any problem encountered with project files and settings (metadata) that doesn't prevent the correct execution of the task but is a sign that some expected elements are missing or malformed.

Warnings are reported but don't halt the program execution.

ERRORS — any problem encountered that either:

  1. prevents completion of a task
  2. indicates that the final result will be malformed

Any error will force abortion of program execution.
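The distinction above can be sketched in a few lines. This is only an illustration in Python, not the App's code; `ProblemLog`, `Severity` and `AbortBuild` are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"


class AbortBuild(Exception):
    """Raised when an Error forces the run to stop (hypothetical name)."""


@dataclass
class ProblemLog:
    # Each entry is a (severity, message) pair, kept in the order reported.
    entries: list = field(default_factory=list)

    def warn(self, message):
        # Warnings are recorded and execution continues.
        self.entries.append((Severity.WARNING, message))

    def error(self, message):
        # Errors are recorded too, then abort the run.
        self.entries.append((Severity.ERROR, message))
        raise AbortBuild(message)


log = ProblemLog()
log.warn("resource has no description key")
try:
    log.error("README.md missing in category")
except AbortBuild:
    aborted = True
```

Recording the Error before raising keeps it available for the final report even though the run stops.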

Pre-Run Project-Integrity Check

When the App starts, it first carries out a Project-Integrity Check, which catches some potential problems before anything is actually built:

If any of the above conditions are violated, the user is warned about them and asked to confirm whether he wants to carry on (except for Errors marked as requiring ABORT). If he does carry on, the above Errors will be treated as Warnings by the App from then on, the assumption being that a user who chose to ignore them might want or need to proceed for testing reasons.

This pre-check stage can only catch some generic potential errors ahead of time; it can't catch more detailed problems regarding the actual parsing of resources.
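A minimal Python sketch of the decision logic described above, assuming each pre-run problem carries a flag saying whether it requires ABORT. The function name, the tuple shape and the return values are all hypothetical; the actual list of checked conditions is not reproduced here.

```python
def pre_run_decision(problems, confirm):
    """Decide whether to go on after the pre-run check.

    problems: list of (message, requires_abort) tuples (hypothetical shape).
    confirm:  callable that asks the user and returns True to carry on.
    Returns "proceed", "proceed_downgraded" or "abort".
    """
    if not problems:
        return "proceed"
    if any(requires_abort for _, requires_abort in problems):
        # Problems marked ABORT leave no choice to the user.
        return "abort"
    # Otherwise ask the user; if confirmed, these problems count as
    # Warnings for the rest of the run.
    return "proceed_downgraded" if confirm(problems) else "abort"


# Example category name and message are made up for illustration.
problems = [("category 'Graphics' has no README.md", False)]
decision = pre_run_decision(problems, confirm=lambda found: True)
```

Passing `confirm` as a callable keeps the sketch independent of how the user is actually asked (console prompt, GUI dialog, or a test stub).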

Run-Time Problems Handling

During the actual processing of categories and resources, all encountered problems are handled as either Errors or Warnings.

The following is a list of potential problems, with their categorization and implementation status:

After the App has finished executing, a report is printed out with statistics and a list of all the Warnings and Errors and their details — the report is printed even if the app aborts prematurely, so the user is informed of as many Warnings as possible.
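One way to guarantee that the report is printed even on a premature abort is a try/finally wrapper around the whole run. The sketch below is Python and purely illustrative: `run_build`, the step functions and the use of `RuntimeError` as the abort signal are assumptions made for the example.

```python
def run_build(steps, report):
    """Run each step; always hand the collected problems to `report`."""
    problems = []
    try:
        for step in steps:
            step(problems)
    except RuntimeError as exc:
        # Assumption: a step raises RuntimeError to abort the run.
        problems.append(("error", str(exc)))
    finally:
        # Executed on normal completion and on abort alike.
        report(problems)
    return problems


def ok_step(problems):
    problems.append(("warning", "resource lacks a description"))


def failing_step(problems):
    raise RuntimeError("pandoc conversion failed")


printed = []
run_build([ok_step, failing_step, ok_step], printed.extend)
```

In this run the third step never executes, yet the warning gathered before the abort still reaches the report.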


In some cases, it's up to us to decide which problems should be treated as Warnings or Errors — the latter will force the maintainer to fix the issue in order to rebuild the HTML pages.

What I'm trying to achieve here is a compromise between strictness and flexibility:

Strictness: as far as updating the website pages is concerned, the App should warn the user about problems which might make the website display wrongly. The Pre-Run Project-Integrity Check is aimed at catching some of these problems as early as possible, so that no time is wasted processing half of the project before the App aborts on an error.

Flexibility: at the same time, in some cases the user might be testing some major changes and thus wish to build the whole project even though it's not correctly represented, for the sake of testing. In these cases, the user should be allowed to choose to ignore Errors and Warnings and build the project nonetheless. For this reason, all Errors which don't strictly prevent the App from carrying on execution should ask the user for confirmation before aborting.

The balance is between preventing wasted time when trying to fix issues (and having to rebuild over and over again), and allowing the freedom to go through the whole process regardless of problems (to check what the output looks like after changes).

tajmone commented 6 years ago

Deciding which problems should be warnings or errors really depends on the rules we set for the project's well-formedness.

For example, should a category without resources be considered an error? Or should we just issue a warning to point it out? The root category is always empty, as it contains only subcategories. Is it ok if there are other categories with no resources (neither single-file sources nor multi-file foldered items)? Or should it be ok only if the category acts as a container for subcategories?

README files are not strictly required: the generator could handle a missing README.md by autogenerating a level 1 heading with the category folder name, and then appending the subcategory links (if any) and the resume cards. But it's probably better to consider this an error, as every category should have some description of sorts. Also, if we're going to use YAML headers in README files for metadata, a missing README should halt execution and force the maintainers to add one.

What about resources which yield no key-vals in comments parsing? Should this be an error, forcing the maintainer to fix it (ie: we assume that he forgot to add the special comment markers to some newly imported code), or should we just skip the whole resume card and issue a warning?

As you can see, it all boils down to what we'd like the minimum requirements for every category and resource to be, and where we draw the line to prevent building a bad website and force maintainers to fix problems.

Could you provide some examples of what you'd consider errors and warnings, of what additional checks might be required during execution?

SicroAtGit commented 6 years ago

> For example, should a category without resources be considered an error? Or should we just issue a warning to point it out? The root category is always empty, as it contains only subcategories. Is it ok if there are other categories with no resources (neither single-file sources nor multi-file foldered items)? Or should it be ok only if the category acts as a container for subcategories?

If the category directory is completely empty (contains no code file and no container directory for multiple code files), the category directory should be ignored. In this case, there is nothing to indicate an error, but a warning should be issued as a precaution.

> But it's probably better to consider this an error, as every category should have some description of sorts. Also, if we're going to use YAML headers in README files for metadata, a missing README should halt execution and force the maintainers to add one.

I agree.
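The two rules agreed on so far (an empty category directory is skipped with a warning; a missing README.md is an error) could look roughly like this. It is a Python sketch under simplifying assumptions: `check_category` is a hypothetical name, and "empty" is taken to mean "nothing in the folder besides README.md".

```python
import tempfile
from pathlib import Path


def check_category(folder):
    """Return (severity, message) pairs for one category folder."""
    folder = Path(folder)
    entries = [p for p in folder.iterdir() if p.name != "README.md"]
    if not entries:
        # Nothing to index: skip the category, but tell the maintainer.
        return [("warning", f"{folder.name}: empty category, skipped")]
    if not (folder / "README.md").is_file():
        return [("error", f"{folder.name}: README.md is missing")]
    return []


# Throwaway folders stand in for real categories.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "empty").mkdir()
    (root / "no_readme").mkdir()
    (root / "no_readme" / "example.pb").write_text("; code\n")
    empty_result = check_category(root / "empty")
    no_readme_result = check_category(root / "no_readme")
```

The sketch does not try to tell subcategory folders from multi-file resource folders, since that rule is still open in the discussion.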

> What about resources which yield no key-vals in comments parsing? Should this be an error, forcing the maintainer to fix it (ie: we assume that he forgot to add the special comment markers to some newly imported code), or should we just skip the whole resume card and issue a warning?

This is the work of my CodeChecker, which I will reintegrate soon. In the end, there will be one code that does all the work so that the repository administrators don't have to run multiple dev tools. At least that's my plan so far.

> Could you provide some examples of what you'd consider errors and warnings, of what additional checks might be required during execution?

I think an error should always be reported, except in the case mentioned above. If the pandoc version is newer than the tested version, a warning should be issued, so that the contributor takes a look at the result of the dev tools himself, to be on the safe side.
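The pandoc check suggested here can be sketched as a comparison of version tuples. The Python below is an illustration only: `TESTED_PANDOC` holds a made-up "last tested" version, the function names are hypothetical, and the input is assumed to be the text printed by `pandoc --version`, whose first line starts with `pandoc` followed by the version number.

```python
import re

TESTED_PANDOC = (2, 2, 1)  # hypothetical "last tested" version


def parse_pandoc_version(version_output):
    """Extract the version tuple from the first line of `pandoc --version`."""
    match = re.match(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError("unrecognized pandoc version output")
    return tuple(int(part) for part in match.group(1).split("."))


def pandoc_version_warning(version_output, tested=TESTED_PANDOC):
    """Return a warning message if pandoc is newer than `tested`, else None."""
    found = parse_pandoc_version(version_output)
    if found > tested:
        found_text = ".".join(str(n) for n in found)
        tested_text = ".".join(str(n) for n in tested)
        return f"pandoc {found_text} is newer than the tested {tested_text}"
    return None


newer = pandoc_version_warning("pandoc 2.3\nmore version details...")
same = pandoc_version_warning("pandoc 2.2.1")
```

Comparing integer tuples avoids the pitfall of comparing version strings, where "2.10" would sort before "2.9".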

tajmone commented 6 years ago

Ok, these made the context a bit clearer.

As you can see from the changes in the first post above, I'm now trying to implement a dual strategy through the Pre-Run Project-Integrity Check, which should catch all major problems before starting to process the project, but then the user should be given the choice to continue if he wants (in which case those Errors are treated as Warnings).

I've taken this route because while working on some features I realized that there are times when a maintainer might wish to disregard all warnings and just go ahead (because he's focusing on a single problem at a time).

Aborting the execution should be done in order to prevent going ahead when it's just a waste of time because there are problems that we already know will prevent the final HTML from being publishable. But in all cases, a choice to carry on should be given for testing purposes.

tajmone commented 6 years ago

Modularize Code?

You mentioned:

> This is the work of my CodeChecker, which I will reintegrate soon. In the end, there will be one code that does all the work so that the repository administrators don't have to run multiple dev tools. At least that's my plan so far.

I've been trying to make the App a single source file, but I was wondering if some parts of the code should be modularized so you can reuse them elsewhere: for example, the comments parser, the project-integrity checker, etc.

Originally I was planning to work with include files or even modules, but then I thought that a single source is easier to modify (renaming identifiers, etc) and would avoid some of the overhead of using modules. But if there are plans to have some other tools to check integrity or perform some other actions on the project, it might be worth splitting the code into modules with a decent API, so that changes to a module work with all the tools that depend on it.

SicroAtGit commented 6 years ago

If several programmers work on one tool, it is certainly better if they don't all write in one code, because each programmer has a different programming style. Therefore I think that we should definitely use IncludeFiles.

We can use modules within the codes (they automatically produce a structured code), but reusable procedures are also sufficient.

I had two tools in the old repository for checking and adjusting the codes:

However, these tools are certainly outdated and no longer support the structure of the new repository. The codes must therefore be rewritten in any case.

In the next few days I will create a list of all the tasks that need to be done automatically before your codes are executed. Then we can discuss how to resolve this. The decision will then be easier for us because we have a complete overview of all tasks.

tajmone commented 6 years ago

Ok, then the code for the comments parser should definitely be put in an external file, as it might be reused by other project maintenance tools too. The only problem is that it's unlikely that we'll get a fully standalone includefile/module, because it will depend on some vars and structures defined in the main code, so any other code using it would have to replicate those too.

But this should be fine I guess, after all these are intended only for apps dealing with the repo (as opposed to being reusable code published in the project for endusers).

SicroAtGit commented 6 years ago

So far I haven't had enough time for a detailed task list. The sun seduces me to leave the PC :-)

The modularity doesn't have to be very strict. As I said: At the end there should only be one tool (a main code) that should be executed when new code is added to the CodeArchive. It is sufficient if we define structures and variables that are required in the main code and in the include files only in the main code. We should avoid global variables as far as possible, unless there is a good reason for this.

The CodeCleaner, which I had in the old CodeArchive, removed the PB settings at the end of the codes.

The CodeChecker, which I had in the old CodeArchive, checked the code headers and performed a syntax check on the code using the PB compiler.

It should also be ensured that all files end with a line break (an empty last line), because Git and many Linux programs complain when it is missing, or process the file incorrectly.
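The end-of-file requirement is easy to automate. A minimal Python sketch, assuming the file content is handled as a string; `ensure_final_newline` is a hypothetical helper name.

```python
def ensure_final_newline(text):
    """Return `text` with a line break appended if the last line lacks one."""
    if not text or text.endswith("\n"):
        return text
    # Reuse the file's own line-ending style when it uses CRLF.
    newline = "\r\n" if "\r\n" in text else "\n"
    return text + newline


fixed_lf = ensure_final_newline("Debug 1\nDebug 2")
fixed_crlf = ensure_final_newline("Debug 1\r\nDebug 2")
untouched = ensure_final_newline("Debug 1\n")
```

Empty files are left alone, and files that already end with a line break are returned unchanged, so running the check repeatedly is harmless.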

If you want, you can completely rewrite these codes, which must be done in any case. I wanted to do this task myself, because you already have enough work with the tool that creates the website.

The CodeCleaner and the CodeChecker each become an include file. Both include files require access to the error/warning output function and to the list containing the paths to the code files, to the code info files and to the license files.

tajmone commented 6 years ago

Ok, I'll then have a closer look at their code to get a clearer idea of whether something can be reused from them, and vice versa whether some parts of the code I'm currently writing can be shared with them.

If I understand correctly, these other two tools performed some checks on the integrity of the resource files (header comments and other prerequisites); so the idea is that these other tools should be used when adding new resources, to check they meet the requirements, and if they do then the HTML pages can be rebuilt to include the new resources.

Should these tools be separately run, or should it just become a single tool that does all the checks and HTML rebuilding together?

tajmone commented 6 years ago

PS: The sun has popped out here too, so I'm also getting carried away from the PC to enjoy it (after all those rainy weeks).

SicroAtGit commented 6 years ago

> If I understand correctly, these other two tools performed some checks on the integrity of the resource files (header comments and other prerequisites); so the idea is that these other tools should be used when adding new resources, to check they meet the requirements, and if they do then the HTML pages can be rebuilt to include the new resources.

Exactly.

> Should these tools be separately run, or should it just become a single tool that does all the checks and HTML rebuilding together?

Only one tool that does everything, as I tried to explain above.

However, not everything should be in one large code; parts of the code should be stored in include files. I think this is better if the tool is written by several programmers, to avoid conflicts from too many programmers writing in the same code file. And as I mentioned above: every programmer has his own programming style. I don't think different programming styles in one large code look good. That's my opinion, but I don't have much experience programming in a team.

tajmone commented 6 years ago

Ok, now I have a clearer picture. This shouldn't be a problem at all; after all, it's just a matter of making sure that the HTML pages creator doesn't run unless all other tests on the resources have passed.

tajmone commented 6 years ago

In the last few days I've spent some time going over the various Issues, the proposals, answers and discussion. I've been trying to work out how to accommodate all the various requirements of project checks in a single app, and trying to weigh up the pros and cons.

On the one hand, having all checks done by the page builder is the ideal solution, but it might become slower (as mentioned elsewhere). I've been thinking of how to create a GUI to handle settings, thus allowing for example to skip certain checks, etc.

But the more I think about this question, the more I come to the conclusion that implementing a cache is the only clean solution to the problem. This would allow all tests to be run every time, without performance losses. It would also allow dry-running pandoc conversions (ie: writing output to a temporary file before actually writing it to the repository) so the whole build process could be fully tested before committing anything to disk.
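How such a cache would work is not decided in this thread. One common approach, shown here only as a Python sketch, is to store a content hash per source file after a successful build and to reprocess only the files whose hash has changed. All names (`digest`, `stale_files`, `record_build`) and the in-memory dictionaries are assumptions made for the example.

```python
import hashlib


def digest(data):
    """Content fingerprint of one source file."""
    return hashlib.sha256(data).hexdigest()


def stale_files(sources, cache):
    """Return the paths whose content changed since the last recorded build.

    sources: {path: file content as bytes}
    cache:   {path: digest recorded after the last successful build}
    """
    return sorted(p for p, data in sources.items() if cache.get(p) != digest(data))


def record_build(sources, cache, built_paths):
    """Remember the digests of files that were checked and built successfully."""
    for path in built_paths:
        cache[path] = digest(sources[path])


sources = {"a/one.pb": b"; first\n", "a/two.pb": b"; second\n"}
cache = {}
first_run = stale_files(sources, cache)      # everything is new
record_build(sources, cache, first_run)
sources["a/two.pb"] = b"; second, edited\n"
second_run = stale_files(sources, cache)     # only the edited file
```

With this scheme every check can stay enabled on every run, because unchanged files are skipped before any expensive work such as a pandoc conversion starts.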

I'm thus evaluating the benefits of starting to work on a cache system right now, instead of postponing it. Without a cache, I might end up having to write lots of code to work around speed limitations, most of which would be discarded once a cache is in place. So it might be worth considering using a cache right now.

A cache system would also reduce the number of options needed to run the app, which makes it easier to use: basically, the options would end up dealing mainly with verbosity and the debugging log.

I'll start doing some research and tests on how to implement an efficient but simple cache system, and then get back to you on the issue.

In the meanwhile, if you have any suggestions on the topic...