isaacharrisholt / quiffen

Quiffen is a Python package for parsing QIF (Quicken Interchange Format) files.
MIT License

lenient mode #62

Closed: WolfgangFahl closed this 1 year ago

WolfgangFahl commented 1 year ago

When importing a QIF file with more than 400,000 lines, quite a few minor errors are to be expected. I already ran into the "S" problem, and the next one is:

pydantic.error_wrappers.ValidationError: 1 validation error for Transaction
splits
  Split percentages cannot exceed 100% of the transaction (type=value_error)

It would be great to have a lenient mode in which the parser tries to continue when errors are present.
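
To make the idea concrete, here is a minimal sketch of what "continue on error" could look like, assuming records are split on the "^" line that terminates each QIF record. The function parse_records and the record parser passed to it are hypothetical illustrations, not part of quiffen's API; header lines such as !Type:Bank are simply skipped here as a simplification.

```python
from typing import Callable, List, Tuple


def parse_records(
    text: str,
    parse_record: Callable[[List[str]], object],
    lenient: bool = False,
) -> Tuple[List[object], List[Exception]]:
    """Hypothetical helper: parse '^'-terminated QIF records one by one.

    In lenient mode a failing record is skipped and its exception kept;
    in strict mode the first error propagates to the caller.
    """
    results: List[object] = []
    errors: List[Exception] = []
    record: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # simplification: skip blanks and headers like !Type:Bank
        if line == "^":  # '^' terminates a QIF record
            try:
                results.append(parse_record(record))
            except Exception as ex:  # any record-level error
                if not lenient:
                    raise
                errors.append(ex)
            record = []
        else:
            record.append(line)
    return results, errors


def parse_amount(record: List[str]) -> float:
    """Hypothetical record parser: return the 'T' (amount) field."""
    for field in record:
        if field.startswith("T"):
            return float(field[1:].replace(",", ""))
    raise ValueError(f"no amount in record {record}")


qif = "!Type:Bank\nD01/02/2023\nT-10.00\n^\nD01/03/2023\n^\nT5\n^\n"
amounts, errors = parse_records(qif, parse_amount, lenient=True)
```

With lenient=True the record without an amount is collected in errors and parsing carries on; with the default lenient=False the same input stops at that record.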

WolfgangFahl commented 1 year ago

The lenient mode works great for me: I end up with three ignored errors in over half a million imported QIF lines. The problems seem to be minor, such as an Oth D AccountType, which is non-standard.

isaacharrisholt commented 1 year ago

This seems like a good idea! I think, in lenient mode, Quiffen should warn rather than error and probably store the ignored line on the object somewhere along with the error. What do you reckon?
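
One way to read the "warn rather than error, and store the ignored line along with the error" suggestion is sketched below. The names IgnoredLine and LenientLog are hypothetical and only illustrate the shape such storage might take; it uses the standard warnings module rather than anything quiffen provides.

```python
import warnings
from dataclasses import dataclass, field
from typing import List


@dataclass
class IgnoredLine:
    """A line skipped in lenient mode, kept together with its error."""
    line_number: int
    line: str
    error: Exception


@dataclass
class LenientLog:
    """Hypothetical container a parsed object could carry for inspection."""
    lenient: bool = True
    ignored: List[IgnoredLine] = field(default_factory=list)

    def handle(self, line_number: int, line: str, error: Exception) -> None:
        if not self.lenient:
            raise error  # strict mode: behave as before and fail
        # Lenient mode: warn instead of raising and remember what was skipped.
        warnings.warn(f"line {line_number}: ignored {line!r} ({error})", stacklevel=2)
        self.ignored.append(IgnoredLine(line_number, line, error))


# Usage: parse amount lines, skipping the one that is not a number.
log = LenientLog()
lines = ["D01/02/2023", "Tnot-a-number", "T12.50"]
amounts = []
for number, line in enumerate(lines, start=1):
    if line.startswith("T"):
        try:
            amounts.append(float(line[1:]))
        except ValueError as ex:
            log.handle(number, line, ex)
```

Because warnings are used, callers can still escalate them to errors with the standard warning filters if they want strict behaviour in a particular run.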

WolfgangFahl commented 1 year ago

See the ParserState class:

import sys
import traceback
from collections import Counter


class ParserState:
    """
    Keep track of exceptions raised while parsing.
    """

    def __init__(self, lenient: bool = False, debug: bool = False, num_errors_to_show: int = 10):
        self.lenient = lenient
        self.debug = debug
        self.errors = []
        self.num_errors_to_show = num_errors_to_show

    def handle_exception(self, ex):
        # In lenient mode collect the exception, otherwise re-raise it
        if self.lenient:
            self.errors.append(ex)
        else:
            raise ex

    def show_most_common_errors(self, num_errors=10):  # show the num_errors most common error types
        # Count error types
        error_types = [type(error).__name__ for error in self.errors]
        error_type_counts = Counter(error_types)

        # Display the most common error types
        most_common_errors = error_type_counts.most_common(num_errors)
        for error_type, count in most_common_errors:
            print(f"Error '{error_type}': {count} x.")

    def show(self):
        error_count = len(self.errors)
        if error_count > 0:
            if self.debug:
                for index, error in enumerate(self.errors):
                    print(f"error {index + 1:4}: {error}")
                    if index < self.num_errors_to_show:
                        # Print the stack trace of the exception using traceback
                        traceback.print_tb(error.__traceback__)
            self.show_most_common_errors(num_errors=self.num_errors_to_show)
            print(f"ignored {error_count} parsing errors while in lenient mode", file=sys.stderr)

WolfgangFahl commented 1 year ago

Instead of printing, logging would IMHO make sense, e.g. for the traceback information when many traces are available. For printing, there is currently a single cut-off parameter, num_errors_to_show (passed on as num_errors), which is unfortunately used both to limit the number of most common errors shown and to cut off the number of tracebacks shown.
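
A minimal sketch of the logging-based alternative, with the single cut-off split into two independent ones. The class name LoggingParserState, the logger name "qif_lenient", and the parameters max_tracebacks and max_error_types are hypothetical choices for illustration; only the standard logging, traceback, and collections modules are used.

```python
import logging
import traceback
from collections import Counter
from typing import List, Tuple

logger = logging.getLogger("qif_lenient")  # hypothetical logger name


class LoggingParserState:
    """Variant of ParserState that logs instead of printing.

    max_tracebacks and max_error_types are separate cut-offs, so limiting
    the tracebacks no longer limits the error-type summary.
    """

    def __init__(self, lenient: bool = False, max_tracebacks: int = 10,
                 max_error_types: int = 10):
        self.lenient = lenient
        self.max_tracebacks = max_tracebacks
        self.max_error_types = max_error_types
        self.errors: List[Exception] = []

    def handle_exception(self, ex: Exception) -> None:
        if not self.lenient:
            raise ex
        self.errors.append(ex)

    def most_common_errors(self) -> List[Tuple[str, int]]:
        counts = Counter(type(error).__name__ for error in self.errors)
        return counts.most_common(self.max_error_types)

    def log(self) -> None:
        for index, error in enumerate(self.errors):
            logger.debug("error %4d: %s", index + 1, error)
            # Full traces only at DEBUG level and only for the first few errors.
            if index < self.max_tracebacks and error.__traceback__ is not None:
                logger.debug("".join(traceback.format_tb(error.__traceback__)))
        for error_type, count in self.most_common_errors():
            logger.info("Error '%s': %d x", error_type, count)
        if self.errors:
            logger.warning("ignored %d parsing errors while in lenient mode", len(self.errors))


# Usage: two ValueErrors and one KeyError collected in lenient mode.
state = LoggingParserState(lenient=True, max_tracebacks=2, max_error_types=1)
for bad in ["x", "y"]:
    try:
        float(bad)
    except ValueError as ex:
        state.handle_exception(ex)
try:
    {}["missing"]
except KeyError as ex:
    state.handle_exception(ex)
```

The verbosity is then controlled by the caller's logging configuration (for example logging.basicConfig(level=logging.DEBUG) to see the tracebacks), rather than by a debug flag on the state object.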

WolfgangFahl commented 1 year ago

I suggest making parser_state part of the Qif object returned after parsing, so that the caller can inspect it. I did this for the test of #78.
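
The suggestion of returning the parser state with the parsed result could look roughly like the sketch below. ParsedQif, MiniParserState, and parse are hypothetical stand-ins (quiffen's real Qif class and parse signature are not reproduced here); the point is only that the caller, such as a test, gets the collected errors back alongside the data.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class MiniParserState:
    """Trimmed-down stand-in for the ParserState class above."""
    lenient: bool = False
    errors: List[Exception] = field(default_factory=list)

    def handle_exception(self, ex: Exception) -> None:
        if not self.lenient:
            raise ex
        self.errors.append(ex)


@dataclass
class ParsedQif:
    """Hypothetical return type: parsed records plus the parser state."""
    records: List[Any]
    parser_state: MiniParserState


def parse(lines: List[str], parse_line: Callable[[str], Any],
          lenient: bool = False) -> ParsedQif:
    state = MiniParserState(lenient=lenient)
    records: List[Any] = []
    for line in lines:
        try:
            records.append(parse_line(line))
        except Exception as ex:
            state.handle_exception(ex)  # re-raises unless lenient
    return ParsedQif(records, state)


# The caller can inspect what was skipped after parsing.
result = parse(["1.5", "oops", "2"], float, lenient=True)
```

A test can then assert on result.parser_state.errors directly instead of capturing printed output.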