moov-io / iso8583

A golang implementation to marshal and unmarshal iso8583 message.
https://moov.io
Apache License 2.0
304 stars 100 forks source link

avoid using panic in the code #269

Closed alovak closed 10 months ago

alovak commented 10 months ago

There are some areas in the module where we use panic. However, it's not best practice to employ panic within a library. It would be beneficial to explore alternatives for eliminating its use while maintaining as much backward compatibility as possible.

adamdecaf commented 10 months ago

We can return errors where there weren't some before. It sorta breaks the API, but in a fairly clean way that people can check. I hope no one was relying on panic behavior.

SahibYar commented 10 months ago

@alovak we can start adding defer functions with recover calls in functions which are calling panic to maintain backward compatibility for now and add comment that this function will be Deprecated in coming releases instead use new function returning error or utils.NewSafeError.

defer func() {
    if r := recover(); r != nil {
    fmt.Println("Recovered error:", r)
   }
}()

Or the other way is start writing functions straight away as @adamdecaf mentioned in their comment.

Please suggest the acceptable approach, and I will raise a PR or may be extend @PumpkinSeed work.

PumpkinSeed commented 10 months ago

@SahibYar I have a question if you don't mind, because I don't understand certain parts of that. I quickly created a showcase based on your recover suggestion, please update the code since I'm sure it's not matching your expectations.

I assume someone relies on the panic's behavior by using recover function. If we catch the panic before it hits the user of the library's recover, how it supposed to solve the backward compatibility? Probably it's my misconception about the solution, because I assumed that the users of the library are not using the sort package directly.

Also you can drop my PR, since it's only implements the function signature change. The harder part (what you need to take care of in this case) will be to discover where the caller functions are located to implement the defer recover logic.

alovak commented 10 months ago

Hey! What I see from the code is that panic is invoked when we validate field.Spec or MessageSpec. It happens when we create a new field/message using the constructor. You can see here that, in the case of Composite field, it creates all subfields and calls SetSpec for each of them. Note that some of these might also be composite fields, making the process recursive.

I don't see how it's possible to return errors from the constructors using the following form of the spec definition:

    spec := &MessageSpec{
        Fields: map[int]field.Field{
            0: field.NewString(&field.Spec{
                Length:      4,
                Description: "Message Type Indicator",
                Enc:         encoding.ASCII,
                Pref:        prefix.ASCII.Fixed,
            }),
            1: field.NewBitmap(&field.Spec{
                Description: "Bitmap",
                Enc:         encoding.BytesToASCIIHex,
                Pref:        prefix.Hex.Fixed,
            }),
            // ...
            3: field.NewComposite(&field.Spec{
                Length:      6,
                Description: "Processing Code",
                Pref:        prefix.ASCII.Fixed,
                Tag: &field.TagSpec{
                    Sort: sort.StringsByInt,
                },
                Subfields: map[string]field.Field{
                    "1": field.NewString(&field.Spec{
                        Length:      2,
                        Description: "Transaction Type",
                        Enc:         encoding.ASCII,
                        Pref:        prefix.ASCII.Fixed,
                    }),
                    "2": field.NewString(&field.Spec{
                        Length:      2,
                        Description: "From Account",
                        Enc:         encoding.ASCII,
                        Pref:        prefix.ASCII.Fixed,
                    }),
                    "3": field.NewString(&field.Spec{
                        Length:      2,
                        Description: "To Account",
                        Enc:         encoding.ASCII,
                        Pref:        prefix.ASCII.Fixed,
                    }),
                },
            }),

If there are no some kind of ways to overcome it, I would suggest removing panic from the sort package only (by using default behavior when there is an error) and keeping it as is for the SetSpec.

SahibYar commented 10 months ago

I kindly suggest that we consider refraining from using such coding style for initializing fields/composite fields with maps and their values, and recursive maps, Given that ISO-8583 is a strict protocol, may I propose that we utilize the Builder Pattern / Stepwise Builder Pattern for constructing fields/messages with strict concrete types for building fields/message just like we have already defined for Track1 , Track2 and Track3

We can take motivation from https://pkg.go.dev/regexp#Compile which throws error if regex is not parsable and https://pkg.go.dev/regexp#MustCompile causes panic ensuring safe initialization if regex is not parsable.

As the author of library we can use MustXXX idiom, that way the user of library is aware of a possible panic but that is only possible if we can re-write constructor with Builder Pattern or Step-wise Builder Pattern

We are already using .MustCompile in our code, so the program will still panic if we remove all explicit panic from our code.

alovak commented 10 months ago

@SahibYar this is how we define the spec today:

    spec := &MessageSpec{
        Fields: map[int]field.Field{
            0: field.NewString(&field.Spec{
                Length:      4,
                Description: "Message Type Indicator",
                Enc:         encoding.ASCII,
                Pref:        prefix.ASCII.Fixed,
            }),
            1: field.NewBitmap(&field.Spec{
                Description: "Bitmap",
                Enc:         encoding.BytesToASCIIHex,
                Pref:        prefix.Hex.Fixed,
            }),
            // ...
            3: field.NewComposite(&field.Spec{
                Length:      6,
                Description: "Processing Code",
                Pref:        prefix.ASCII.Fixed,
                Tag: &field.TagSpec{
                    Sort: sort.StringsByInt,
                },
                Subfields: map[string]field.Field{
                    "1": field.NewString(&field.Spec{
                        Length:      2,
                        Description: "Transaction Type",
                        Enc:         encoding.ASCII,
                        Pref:        prefix.ASCII.Fixed,
                    }),
                    "2": field.NewString(&field.Spec{
                        Length:      2,
                        Description: "From Account",
                        Enc:         encoding.ASCII,
                        Pref:        prefix.ASCII.Fixed,
                    }),
                    "3": field.NewString(&field.Spec{
                        Length:      2,
                        Description: "To Account",
                        Enc:         encoding.ASCII,
                        Pref:        prefix.ASCII.Fixed,
                    }),
                },
            }),

You can notice, that it calls NewComposite inside. This call may also panic.

How do you see the spec definition will look like with your suggestions?

adamdecaf commented 10 months ago

regexp.MustCompile is useful because it will throw a panic during build/test steps. Throwing panics during runtime based on conditionals is going to cause application crashes and bugs, which is why our Go linter forbids panic.

alovak commented 10 months ago

I'm thinking about the option of lazy validation when we will validate the spec not when we assign it, but before its first usage. We will validate once and cache the result. Spec is used in the methods that return error, so we will have to update some code here and there, for example in this method: https://github.com/moov-io/iso8583/blob/c20dc2cd0088176db2f6a942ea3bd1e9c1d41a33/field/composite.go#L137

func (f *Composite) SetSpec(spec *Spec) {
    if err := validateCompositeSpec(spec); err != nil {
        panic(err)
    }
    f.spec = spec

    var sortFn sort.StringSlice

    // When bitmap is defined, always order tags by int.
    if spec.Bitmap != nil {
        sortFn = sort.StringsByInt
    } else {
        sortFn = spec.Tag.Sort
    }

    f.orderedSpecFieldTags = orderedKeys(spec.Subfields, sortFn)
}

we should make orderedSpecFieldTags a method, that on the first call, will validate the spec if it's not valid. So the code like this:

func (f *Composite) packByBitmap() ([]byte, error) {
    f.Bitmap().Reset()

    var packedFields []byte

    // pack fields
    for _, id := range f.orderedSpecFieldTags {

will be changed to this:

func (f *Composite) packByBitmap() ([]byte, error) {
    f.Bitmap().Reset()

    var packedFields []byte

    orderedFieldTags, err := f.orderedSpecFieldTags()
    if err != nil {
        // this is where first validation may return error
    }

    // pack fields
    for _, id := range orderedFieldTags {

thoughts?

SahibYar commented 10 months ago

Suggested change in long term (may be)

My point was instead of making the desired MessageSpec object directly, the library user calls a constructor (or factory) with all of the required fields and gets a builder object. Then the user calls a setter-like methods on the builder object to set each optional field of interest. Finally, the user calls a paramterless Build method to generate MessageSpec object.

Example of builder pattern, Example 1 Example 2

Quick Possible Solutions.

  1. Call recover inside functions, but as @PumpkinSeed mentioned if library user is already relying on the panic behavior by using recover function and If we catch the panic before it hits the user of the library's recover, they may break it,
  2. Start writing functions with proper return errors instead of panic, and marking the current functions as Deprecated to be removed later after 2,3 releases, and suggesting library users to call new functions and handle errors instead of panics.

Thoughts on Lazy validation approach

What's the point of keeping unvalidated data in memory, and utilizing CPU cycles for executign operations, which will be reverted afterwards. I don't see any performance gain. We will still be in need to replace panics with errors.

SahibYar commented 10 months ago

I am raising a PR with point 2, to see if we can handle errors with currently defined specs as mentioned by @alovak

alovak commented 10 months ago

My concerns:

@SahibYar don't create PR with production-ready code, you can start by showing here the code example of how you will define the spec with composite fields.

Lazy validation is just an option to remove panic but keep the rest unchanged. Also, I'm pretty sure there will be no significant performance impact (we can benchmark it for sure).

PumpkinSeed commented 10 months ago

@SahibYar My thoughts on the 2. point. I'm highlighting the sort package's panics. So the hard part is not creating a new function signature for the sorting functions but propagating that error back to the caller. Because you will need to redefine a lots of functions so those can return errors. Marking all of those functions as deprecated is a huge job and it is an even bigger challenge for the user of the library. By default I like the idea to making this, but since it's a very complicated operation.

@alovak My thoughts on breaking the API. To be honest this ticket is about breaking the API, because we want to remove the panic from the code. Everybody who relying on a recover and checking for these specific panics will need to rethink the error handling.

Also I referencing the panics in the sort package. If we are validating the data before processing it (and as I get it by using it for our own specs it's validated properly) most of these panics won't even happen. So I don't think that anyone needs to properly check for them. I think it is a possible solution when we have a logger interface which calls Error() and the users can pass their own behavior into it.

alovak commented 10 months ago

@PumpkinSeed

in most cases, panic won't even happen

that's my main point. If the spec can't be built during the test or app run it will panic before someone will deploy it to production. Specification, unless created from JSON, is static. So, fighting with panic by breaking the API here is not the best idea. While we can think about possible options, I think we should fix it in sort by returning true (or false) when we can't parse the value. The rest - just ignore the linter.