luser-dr00g / xpost

A PostScript interpreter in C
Other
93 stars 12 forks source link

Completion of error handling #47

Open elfring opened 4 years ago

elfring commented 4 years ago

Would you like to add more error handling for return values from functions like the following?

luser-dr00g commented 2 years ago

Sorry for the delay. Yes, this is a good idea. The changes in xpost_file.c are pretty straightforward. I'm less concerned about that call to malloc() just because it's part of old unit testing code that I think isn't even used anywhere anymore.

That strdup() call is a fair cop. I'm looking into it further.

luser-dr00g commented 2 years ago

strdup is used in a number of different places. And all of them deserve better error handling. Even though it may make the code look more complicated.

$ grep strdup src/bin/*.c
src/bin/xpost_main.c:                defs[num_defs-1] = strdup(define);
src/bin/xpost_main.c:        char *devstr = strdup(device);
$ grep strdup src/lib/*.c
src/lib/xpost_dsc_parse.c:        array[i + count1] = strdup(array2[i]);
src/lib/xpost_dsc_parse.c:        dsc->header.for_whom = strdup(dsc->header.creator);
src/lib/xpost_font.c:    filename = strdup(file);
src/lib/xpost_interpreter.c:    devstr = strdup(device); /*  Parse device string for mode selector "dev:mode" */

The handling of the defs array in main() uses realloc in a similarly cavalier fashion. The uses with device strings are to make a scratch version to then butcher with strtok.

So what to do in case of NULL. For functions near the top level in xpost_main or xpost_interpreter, they could just bail and return failure up the chain. Each caller up the chain will need to propagate the error. I haven't looked at what's going in the font module, but presumably the code is inside an operator or a helper for an operator so the error can be reported there. xpost_dsc_parse was written in preparation for a document viewer client in the spirit of ghostview or GSView. But I think it's not called at the moment, so lower priority. But it also ought to propagate its errors (in time).

There may be a duplication of effort in xpost_main and xpost_interpreter. Both calling strdup(device) and probably both parsing the string for the :mode. So maybe xpost_main should just carve up the original and be done with it and pass both halves separately to xpost_interpreter.

elfring commented 2 years ago

How do you think about to improve static source code analysis also for this software? :thinking:

vtorri commented 2 years ago

we can use coverity. I have found this : https://github.com/FFmpeg/FFmpeg-Coverity/blob/master/.github/workflows/run-coverity.yml

there is also sonarqube but i've never used it. If you know how to integrate it in github actions, more precisely in https://github.com/luser-dr00g/xpost/blob/master/.github/workflows/c-cpp.yml , i'll be glad to add it

elfring commented 2 years ago
vtorri commented 2 years ago

@elfring i let @luser-dr00g answer these questions

luser-dr00g commented 2 years ago

@elfring As a language interpreter a lot of the error handling is dictated by the language, here PostScript of course. So most of the code in the project is C functions that implement the behavior of PostScript operator objects. And these have a defined mechanism that can be modified by the user.

The rest of the program, what I've been calling "the top level" is more like a regular C program and is likely to benefit from any new techniques or best practices that can be applied to C programs.

So, let me read up a little on aspect-oriented stuff (thanks for the link) and I'll say more. But more generally I'll say that throwing more learning and knowledge at this thing seems like a good thing to do.

luser-dr00g commented 2 years ago

The last time I tried running a static analyzer over the code base (some years ago) it certainly did have some surprising discoveries of interactions between modules that I didn't know about, even having designed and coded (and laboriously debugged them).

luser-dr00g commented 2 years ago

AFAICT aspect-oriented programming is a strategy and toolset for imposing an extra layer of discipline in an object oriented language or framework. There are various parts of the program that apply an object oriented approach to data structuring. I think in all cases, these just model a pretty simple interface inheritance.

An Xpost_File object for example could be just a FILE with a bunch of functions that just call stdio.h functions, or it could be a counted string pretending to be a file. This is implemented in C as a struct with a vtable struct embedded at the start. It has functions to allocate and initialize for each flavor. But this is all hidden behind a suite of interface functions that are called instead of stdio.h functions but mirror their arguments and return values (if it's a FILE of course then it does ultimately call the stdio.h function through the function pointer in the vtable). I'm not sure applying a new strategy will make this code more applicable elsewhere. (Not saying it can't just that I don't see right now.)

The other area is output devices. These are implemented as PostScript dictionary objects. Some of the keys in the dictionary are for data object and some are for methods. The base class implements these methods for dealing with a PPM image, just about the simplest standard color image format that is still widely compatible with other software. Then most of the other devices create an object by instantiating an object of the base class and redefining its methods to be PostScript operator objects which have associated C code to execute.

So that's a sort of inheritance. And it's admittedly all cobbled together out of toothpicks and baling wire. But the things I want to do with it are more in the direction of making a dll interface so devices can be hot-swap-able. It may be that aspect-oriented is tackling a different territory of programming than the needs of this program, in its current state.

But that's not a final verdict or anything, just some relevant thoughts I was having.

elfring commented 2 years ago

It may be that aspect-oriented is tackling a different territory of programming …

:crystal_ball: How will the development attention evolve further for the structured handling of cross-cutting concerns?

luser-dr00g commented 2 years ago

How will the development attention evolve further for the structured handling of cross-cutting concerns?

That's a good question. Let me tackle the pieces from that link one by one and then see where that leads.

Business rules

I think we can ignore this. As a programming language interpreter, business rules would be the purview of the application code and the responsibility of the author of that code. Our task is just to correctly implement the specification, and to document the level of compliance achieved.

Caching

I have difficulty imagining how we could address caching in a structured way that can be reused anywhere in the application. It may be a deficiency or peculiarity of the C language since it doesn't offer much in the way of syntax extension or automatic creation/destruction of objects.

Code mobility

This doesn't really exist in the PostScript ecosystem with the exception of Sun's NeWS extensions. And due to the topic of security below, there doesn't appear to be any impulse in the community to move in this direction. There's been some interest in making a user archive somewhat like the TeX community has, but that would be just for downloading code to run locally and the code in question would not need to also run on a server or some other environment. So I think code mobility isn't a real concern that we have in this project. But if all the NeWS stuff ever gets finished, then code mobility could become a concern but I think only at the PS language level. I think we still wouldn't really need to worry about it at the level of the C implementation.

Data validation

I think there are very few places where this applies, but I could be wrong. There are a very few magic numbers that arise in the code in places. AFAIK these all have sources in the standard literature (literally, the PLRM standard and the books listed in its bibliography). But it might be worthwhile to take a pass through the code to make sure these are all suitably commented and documented.

Domain-specific optimizations

I'm interpreting this as "platform-specific" optimizations. We largely have platform differences isolated to one "compatibility" module, and to the device modules which are compiled and included if available. For example, there are two possible Windows devices depending upon the availability of OpenGL.

Environment variable

These are exposed at the PostScript level with a language extension consisting of getenv and putenv operators. At the C level, we just use the C interface (which the operator functions also use).

Error detection and correction

There are 2 sort of "domains" of errors that are handled somewhat differently.

Like I described earlier in this thread an error at the PostScript level triggers the stand PS error mechanism which involves executing an error procedure from errordict. The standard error procedures save snapshots of the stacks and then call stop which rewinds the execution stack back to an outer level "catch" routine (it's not actually called "catch" but that's kinda what it does) and this routine calls handleerror which prints an error report to stdout. There are lots of places in this mechanism where the user can change the behavior. The handleerror procedure can be changed to print the error report differently or to a different place. The error procedures in errordict can be replaced with new ones that do something different. So I think we have this part well covered by the design of the PostScript language.

The other domain is the straight C code. And there's probably a lot of room for improvement here. Especially in the code I've written. I think I've not been very consistent across the whole project. In some areas, I favor early returns. In some areas there's a goto cleanup; which returns some error indication. AFAIK the best practice in C is to pick a way that works and then be consistent about it.

Internationalization and localization which includes Language localisation

I think being a programming language interpreter saves us from having to worry about this. For better or worse, in the domain of programming English really is the lingua franca. Perhaps the prose parts of the documentation could be translated if that would be useful to someone, but I don't forsee a need to change any other strings in the program as they mostly come directly from the language standard.

[The next topic is a big one, so I'll stop here and do some more thinking...]

luser-dr00g commented 2 years ago

Caching

I have difficulty imagining how we could address caching in a structured way that can be reused anywhere in the application. It may be a deficiency or peculiarity of the C language since it doesn't offer much in the way of syntax extension or automatic creation/destruction of objects.

A little more on caching. This does happen in the code. Particularly in the code for drawing operators that get called a lot and need to be fast. Objects are created by a call to a constructor function. So if a drawing operator needs to use or return some objects, the constructor calls can be moved to the initialization function for that module to initialize static variables and then use or return those variables in the operator. This probably isn't worthwhile for very simple objects like integers and reals and booleans, but the constructor can be costly for name objects because the constructor uses a ternary search tree to return a unique integer for each string.

So, we do have a technique for this kind of caching which can be applied across the application (every module has at least one initialization function that can be used to initialize static variables). Perhaps the technique isn't as "structured" as it could be.

[On another note, we might be able to speed up drawing by not following all of the rules I made up for devices. What if FillPoly() didn't call DrawLine() and PutPixel(). The device couldn't be used to capture that data by overriding those methods, but so what? It would certain draw faster without doing all that stuff.]

elfring commented 2 years ago

The other domain is the straight C code.

I imagine that mentioned programming languages can generally benefit more from aspect-oriented software technology. The corresponding support is evolving as usual, isn't it?

And there's probably a lot of room for improvement here.

How do you think about to let development tools search for the source code places where common actions (like error detection and exception handling) should be performed in well-known ways (instead of manually specifying the repetition for selected data processing steps)? :thinking:

luser-dr00g commented 2 years ago

How do you think about to let development tools search for the source code places where common actions (like error detection and exception handling) should be performed in well-known ways (instead of manually specifying the repetition for selected data processing steps)? thinking

That's brilliant! Searching for nitty gritty details among thousands of lines of code is a horrible task for a person to do. Tedious, boring, uninspiring, and human attention doesn't work at the necessary level for prolonged periods.

I love it. Let's do it.

vtorri commented 2 years ago

add fuzz stuff

elfring commented 2 years ago

Searching for nitty gritty details among thousands of lines of code is a horrible task for a person to do.