scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0
57 stars 94 forks source link

Avoid known (minor) "marshaling error" issue in CREATE_INDEX nemesis #7032

Closed roydahan closed 10 months ago

roydahan commented 10 months ago

As part of this run: https://argus.scylladb.com/test/171cec11-a4d7-4637-baf9-4258d5069eaf/runs?additionalRuns[]=2f516c16-82c9-4cc6-87e2-dc6cb9c309d3

During disrupt_create_index nemesis, we have a verification query to the index. In this case, it chose a coulmn that is a timestamp column and query it (https://github.com/scylladb/scylla-cluster-tests/blob/039a041cf5d02c35917c573d06ac0439eca0177d/sdcm/utils/nemesis_utils/indexes.py#L109) It failed with: message="marshaling error: unable to parse date '2017-06-05 08:26:10.475000': marshaling error: Milliseconds length exceeds expected (6)"

Apparently, there is a very old issue with this marshaling_error on timestamp issue. a good summary in this commit: https://github.com/scylladb/scylladb/commit/f5cda3cfc3850ca54e79f2621cf3d4440744e8b0

We need to find a way to discard this error in such case and fix this nemesis.

roydahan commented 10 months ago

@fruch I think that @soyacz wrote this nemesis, let's had a way to workaround this issue.

nyh commented 10 months ago

@roydahan I think you put the wrong link. The relevant issue is https://github.com/scylladb/scylladb/issues/16575 (it's a new issue I opened yesterday, on a problem already fixed in pull request https://github.com/scylladb/scylladb/pull/14726). I also wrote a test for it yesterday: https://github.com/nyh/scylla/commit/f7ca8dfc0932358f10c437e004ad9ea0923daf0a.

As the issue explains, "2017-06-05 08:26:10.475000" is not allowed by Scylla because Scylla only allows three digits of sub-second precision for a timestamp value. This makes sense, because the timestamp officially has only millisecond resolution - although Cassandra DOES allow any precision (which is just truncated) so this timestamp would have worked in Cassandra.

The question is where you got that "2017-06-05 08:26:10.475000" in your test. If it's a constant, you should fix it to ""2017-06-05 08:26:10.475". If it is generated by an earlier output from Scylla, you must be running a version which doesn't have https://github.com/scylladb/scylladb/pull/14726 backported - perhaps 5.2?

nyh commented 10 months ago

@roydahan I think you put the wrong link.

Actually, you were right the link you had - a test I wrote - did include a test which explains this specific problem:

# The timestamp column type has millisecond resolution. What happens if we
# assign a string to a timestamp column which attempts to specify a time
# with too many digits of precision? In Cassandra, the extra digits are
# silently truncated. In Scylla, it's an error with somewhat obscure wording:
#  "marshaling error: unable to parse date '2011-02-03 04:05:12.345678+0000':
#   marshaling error: Milliseconds length exceeds expected (6)"
# For now, let's accept Scylla's error handling as the better approach, so
# mark this test cassandra_bug.
def test_type_timestamp_from_string_overprecise(cql, table1, cassandra_bug):
    p = unique_key_int()
    with pytest.raises(InvalidRequest):
        cql.execute(f"INSERT INTO {table1} (p, t) VALUES ({p}, '2011-02-03 04:05:12.345678+0000')")
        assert list(cql.execute(f"SELECT t from {table1} where p = {p}")) == [(datetime(2011, 2, 3, 4, 5, 12, 345000),)]
fruch commented 10 months ago

@roydahan I think you put the wrong link. The relevant issue is https://github.com/scylladb/scylladb/issues/16575 (it's a new issue I opened yesterday, on a problem already fixed in pull request https://github.com/scylladb/scylladb/pull/14726). I also wrote a test for it yesterday: https://github.com/nyh/scylla/commit/f7ca8dfc0932358f10c437e004ad9ea0923daf0a.

As the issue explains, "2017-06-05 08:26:10.475000" is not allowed by Scylla because Scylla only allows three digits of sub-second precision for a timestamp value. This makes sense, because the timestamp officially has only millisecond resolution - although Cassandra DOES allow any precision (which is just truncated) so this timestamp would have worked in Cassandra.

The question is where you got that "2017-06-05 08:26:10.475000" in your test. If it's a constant, you should fix it to ""2017-06-05 08:26:10.475". If it is generated by an earlier output from Scylla, you must be running a version which doesn't have https://github.com/scylladb/scylladb/pull/14726 backported - perhaps 5.2?

It's a run with 2024.1 And yes it's data scylla return via the python driver, and then we use it in a select statement

fruch commented 10 months ago

@roydahan if the bug is going to be fix, does it worth other while to find a way to ignore/not hit it ?

roydahan commented 10 months ago

I'm not sure if it's going to be fixed, from what I read in @nyh comment, it's not, but maybe I misunderstood him.

nyh commented 10 months ago

@roydahan I think you put the wrong link. The relevant issue is scylladb/scylladb#16575 (it's a new issue I opened yesterday, on a problem already fixed in pull request scylladb/scylladb#14726). I also wrote a test for it yesterday: nyh/scylla@f7ca8df. As the issue explains, "2017-06-05 08:26:10.475000" is not allowed by Scylla because Scylla only allows three digits of sub-second precision for a timestamp value. This makes sense, because the timestamp officially has only millisecond resolution - although Cassandra DOES allow any precision (which is just truncated) so this timestamp would have worked in Cassandra. The question is where you got that "2017-06-05 08:26:10.475000" in your test. If it's a constant, you should fix it to ""2017-06-05 08:26:10.475". If it is generated by an earlier output from Scylla, you must be running a version which doesn't have scylladb/scylladb#14726 backported - perhaps 5.2?

It's a run with 2024.1 And yes it's data scylla return via the python driver, and then we use it in a select statement

Can you give me the exact command you used which resulted in the string '2017-06-05 08:26:10.475000' - with six sub-second digits (instead of just 3). You are saying that the Python Driver, not Scylla generated this string? If so, why?

If it's not the Python driver, and it's Scylla which generated this string, this was a bug in Scylla that was fixed in https://github.com/scylladb/scylladb/pull/14726 - so if this bug exists in 2024.1, it means it is missing a backport. Is 2024.1 based on 5.2 (which we know is missing a backport - see https://github.com/scylladb/scylladb/issues/16575)?

fruch commented 10 months ago

@nyh this is the code

def verify_query_by_index_works(session, ks, cf, column) -> None:
    # get some value from table to use it in query by index
    result = session.execute(SimpleStatement(f'SELECT "{column}" FROM {ks}.{cf} limit 1', fetch_size=1))
    value = list(result)[0][0]
    if value is None:
        # scylla does not support 'is null' in where clause: https://github.com/scylladb/scylladb/issues/8517
        InfoEvent(message=f"No value for column {column} in {ks}.{cf}, skipping querying created index.",
                  severity=Severity.NORMAL).publish()
        return
    match type(value).__name__:
        case "str" | "datetime":
            value = str(value).replace("'", "''")
            value = f"'{value}'"
        case "bytes":
            value = "0x" + value.hex()
    try:
        query = SimpleStatement(f'SELECT * FROM {ks}.{cf} WHERE "{column}" = {value} LIMIT 100', fetch_size=100)
        LOGGER.debug("Verifying query by index works: %s", query)
        result = session.execute(query)
    except Exception as exc:  # pylint: disable=broad-except
        InfoEvent(message=f"Index {ks}.{cf}({column}) does not work in query: {query}. Reason: {exc}",
                  severity=Severity.ERROR).publish()
    if len(list(result)) == 0:
        InfoEvent(message=f"Index {ks}.{cf}({column}) does not work. No rows returned for query {query}",
                  severity=Severity.ERROR).publish()

and the version used is: 2024.1.0~rc2-20231217.f57117d9cfe3 I've checked and it does have: scylladb/scylla@ff721ec3e32610ac0a65113f20e20efd8c784eed

nyh commented 10 months ago

So @fruch, I think you have a (minor) bug in the test:

You create a "value" with some Python code, which IIUC, creates 6 digits of precision, and then try a SELECT with it. The format of this "value" which you Python created is not supported in Scylla, and I'm not sure if it was ever supported.

Cassandra does allow this form of value (with 6 digits of precision, ignoring the last three). So we could have argued that it's a bug that we don't - but we talked about it and decided to consider this specific detail a Cassandra bug - Cassandra shouldn't allow 6 digits of precision if a timestamp value only supports 3 (timestamps hold milliseconds, not anything of higher precision).

The issue in https://github.com/scylladb/scylladb/issues/16575 was different - it was Scylla itself (not some Python test) returning the "wrong" format, and when the user tried to insert the value it got from Scylla, it could not be inserted.

fruch commented 10 months ago

Looking it closely, python driver used it DateType which serialize to python datetime object. and we are using it's string representation in the query

so we need to cut it, or pass datetime object to the driver and let him figure it out (the driver code know to handle it correctly, I think)

fruch commented 10 months ago

doing it straight like this, without any manipulation of the value, is working as expected, for all of the different types:

        query = SimpleStatement(f'SELECT * FROM {ks}.{cf} WHERE "{column}" = %s LIMIT 100', fetch_size=100)
        result = session.execute(query, parameters=(value,))

i.e. driver should be able to convert back and forth it's types, no need to any special treatment.

nyh commented 10 months ago

Indeed, luckily the driver does the right thing - it only leaves 3 digits of precision. I wrote a cql-pytest to verify it:

# In the test above we saw that Scylla refuses to accept overprecise (more
# than 3 digits of precision) sub-second timestamp strings. Let's check that
# the Python driver doesn't generate such forbidden strings when given a
# "datetime" object with microsecond precisions. This test doesn't really
# test Scylla itself, but it tests how the Python driver interacts with it
# and confirms that whether we didn't make a mistake when we decided to
# forbid over-precise sub-second strings which Cassandra allows - if the
# Python driver generated such strings, it would be a mistake not to accept
# them.
def test_type_timestamp_from_python_datetime(cql, table1):
    stmt = cql.prepare(f"INSERT INTO {table1} (p, t) VALUES (?, ?)")
    p = unique_key_int()
    cql.execute(stmt, [p, datetime(2011, 2, 3, 4, 5, 12, 345000)])
    assert list(cql.execute(f"SELECT t from {table1} where p = {p}")) == [(datetime(2011, 2, 3, 4, 5, 12, 345000),)]
    # Even if the Python datetime is overprecise (has microsecond digits),
    # the driver doesn't try to write the extra digits to the server (Scylla
    # wouldn't accept such extra digits) - the 345123 is truncated to 345000.
    p = unique_key_int()
    cql.execute(stmt, [p, datetime(2011, 2, 3, 4, 5, 12, 345123)])
    assert list(cql.execute(f"SELECT t from {table1} where p = {p}")) == [(datetime(2011, 2, 3, 4, 5, 12, 345000),)]