scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

ignore_mutation_write_errors context manager does not decrease severity #4881

Closed soyacz closed 2 years ago

soyacz commented 2 years ago

ignore_mutation_write_errors context manager does not decrease severity of events like:

2022-05-22 09:47:09.531 <2022-05-22 09:47:09.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=393debaf-3a91-4611-8d8d-105c6be8ad6c: type=DATABASE_ERROR regex=Exception  line_number=4084 node=longevity-lwt-parallel-24h-5-0-db-node-bdf8bc67-2
2022-05-22T09:47:09+00:00 longevity-lwt-parallel-24h-5-0-db-node-bdf8bc67-2 !     ERR |  [shard 2] storage_proxy - exception during mutation write to 10.0.1.26: std::runtime_error (Column definition lwt_indicator does not match any column in the query selection)
2022-05-22T09:47:19+00:00 longevity-lwt-parallel-24h-5-0-db-node-bdf8bc67-1 !     ERR |  [shard 2] storage_proxy - exception during mutation write to 10.0.0.32: std::runtime_error (Column definition lwt_indicator does not match any column in the query selection)

although in the code we can see that it should be logged as a warning:

@contextmanager
def ignore_mutation_write_errors():
    with ExitStack() as stack:
        stack.enter_context(EventsSeverityChangerFilter(
            new_severity=Severity.WARNING,
            event_class=LogEvent,
            regex=r".*mutation_write_",
            extra_time_to_expiration=30
        ))
[...]

This happened in an LWT test where we change the schema. Details of the test:

Installation details

Kernel Version: 5.13.0-1022-aws
Scylla version (or git commit hash): 5.0~rc5-20220515.9da666e77 with build-id 5992da65161c733b3f85b8c93bbbb5151d4d321c
Cluster size: 4 nodes (i3.2xlarge)

Scylla Nodes used in this run:

OS / Image: ami-028dc71f18948aff6 (aws: eu-west-1)

Test: longevity-lwt-parallel-schema-changes-with-disruptive-24h-test
Test id: bdf8bc67-60ac-43d5-bbf4-a5e0a316086d
Test name: scylla-5.0/longevity/longevity-lwt-parallel-schema-changes-with-disruptive-24h-test
Test config file(s):

fruch commented 2 years ago

The regex in the filter doesn't match the lines you are seeing. If you think they should be part of that, change the regex to make sure it covers them as well (after you have verified that this isn't surfacing a real issue here).
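A quick way to check this is to run the filter's pattern against the message part of the reported line using only Python's re module. This is a minimal sketch: how SCT itself applies the pattern (search vs. match) is an assumption here, though with the leading ".*" both behave the same on a single line. The second sample line is made up for contrast and is not taken from any log.

```python
import re

# Pattern copied from the filter quoted in the issue.
FILTER_REGEX = re.compile(r".*mutation_write_")

# Message part of the log line quoted in the issue.
LOG_LINE = (
    "[shard 2] storage_proxy - exception during mutation write to 10.0.1.26: "
    "std::runtime_error (Column definition lwt_indicator does not match "
    "any column in the query selection)"
)

# Hypothetical line, invented for contrast; assumed to resemble what the
# filter is meant to catch.
HYPOTHETICAL_LINE = "storage_proxy - mutation_write_timeout_exception (made-up sample)"


def filter_matches(line: str) -> bool:
    """Return True if the filter's regex is found anywhere in the line."""
    return FILTER_REGEX.search(line) is not None


print("reported line matches:", filter_matches(LOG_LINE))
print("hypothetical line matches:", filter_matches(HYPOTHETICAL_LINE))
```

The reported line contains "mutation write" separated by a space, so the underscore-based pattern does not find it.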

soyacz commented 2 years ago

Yes, I saw that. I created the issue anyway to discuss whether it's still relevant. I think @juliayakovlev was writing these tests. @juliayakovlev what's your opinion?

juliayakovlev commented 2 years ago

@soyacz

The event in your example was matched by the regex Exception, not ".*mutation_write_":

2022-05-22 09:47:09.531 <2022-05-22 09:47:09.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=393debaf-3a91-4611-8d8d-105c6be8ad6c: type=DATABASE_ERROR regex=Exception  line_number=4084 node=longevity-lwt-parallel-24h-5-0-db-node-bdf8bc67-2

Also, the error text is "mutation write", not "mutation_write_". It's a different error, not the one whose severity should be decreased.

So it's not connected to the context manager.

Please do not remove the mutation_write_ regex.
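To illustrate why the event keeps its ERROR severity, here is a toy model of a regex-scoped severity changer. It is not SCT's implementation: ToyLogEvent, ToySeverityChanger and this Severity enum are hypothetical stand-ins, and the second event line is invented for contrast.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Stand-in for SCT's Severity; only the two levels needed here."""
    WARNING = 2
    ERROR = 3


@dataclass
class ToyLogEvent:
    """Hypothetical event: a raw log line plus its current severity."""
    line: str
    severity: Severity = Severity.ERROR


@dataclass
class ToySeverityChanger:
    """Hypothetical filter: changes severity only for matching lines."""
    regex: str
    new_severity: Severity

    def apply(self, event: ToyLogEvent) -> ToyLogEvent:
        # Events whose line does not match the regex pass through untouched.
        if re.search(self.regex, event.line):
            event.severity = self.new_severity
        return event


changer = ToySeverityChanger(regex=r".*mutation_write_", new_severity=Severity.WARNING)

# Line text taken from the issue: "mutation write" with a space.
reported = changer.apply(ToyLogEvent(
    "storage_proxy - exception during mutation write to 10.0.1.26: std::runtime_error"
))
# Made-up line that does contain the "mutation_write_" substring.
targeted = changer.apply(ToyLogEvent("mutation_write_timeout_exception (made-up sample)"))

print(reported.severity.name, targeted.severity.name)
```

In this model the reported event is never touched by the changer, which is consistent with it surfacing through the generic Exception pattern at ERROR level.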

soyacz commented 2 years ago

Thanks @juliayakovlev. I'm closing the issue.