strapdata / elassandra

Elassandra = Elasticsearch + Apache Cassandra
http://www.elassandra.io
Apache License 2.0

Inconsistent Data Querying ElasticSearch #409

Open ribeirodba opened 2 years ago

ribeirodba commented 2 years ago

Here is a test I performed in Elassandra with Python.

I created a function to query data using the Cassandra driver:

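    # Imports were not shown in the original post; these are the usual ones for the names used.
    from datetime import timedelta
    from timeit import default_timer as timer

    import pandas as pd
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    # `session` is assumed to be an already connected Cassandra driver session.
    def process_query_cassandra(query, fetch_size=5000, consistency_level=ConsistencyLevel.LOCAL_ONE):
        start = timer()
        paging_state = None
        rows = []
        while True:
            # Fetch one page at a time, resuming from the previous paging state.
            statement = SimpleStatement(query, fetch_size=fetch_size, consistency_level=consistency_level)
            results = session.execute(statement, paging_state=paging_state)
            paging_state = results.paging_state
            for row in results.current_rows:
                rows.append(row)
            # No paging state means the last page has been read.
            if paging_state is None:
                break
        df = pd.DataFrame(rows)
        end = timer()
        return df, timedelta(seconds=end - start)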

Table f0101 has 872390 rows.

When I query using CQL only, results are OK:

query1 = """ select * from "dlfinjdep"."f0101" ALLOW FILTERING """

    Running Cassandra #1 (22-06-01 12:43) Rows: 872390 seconds: 0:03:17.609349
    Running Cassandra #2 (22-06-01 12:46) Rows: 872390 seconds: 0:03:04.289089

However, when I query the Elasticsearch index through CQL, I get a different row count on each run:

query2 = """ select * from "dlfinjdep"."f0101" WHERE es_query='{"query":{"match_all":{}}}'
AND es_options='indices=dlfinjdep-f0101-index' ALLOW FILTERING """

    Running Elastic #1 (22-06-01 12:50) Rows: 841350 seconds: 0:03:49.136313
    Running Elastic #2 (22-06-01 12:54) Rows: 834372 seconds: 0:03:33.985948
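
To narrow down which rows differ, the two result sets could be compared by primary key. The following is only a minimal sketch: `compare_keys` and the `id` column are hypothetical names (the primary key of `f0101` is not shown in this issue), and rows are assumed to be dict-like, for example from `df.to_dict("records")`.

```python
from collections import Counter


def compare_keys(cassandra_rows, elastic_rows, key_fields):
    """Return (missing, duplicated) key sets for the Elasticsearch result.

    missing:    keys present in the plain CQL result but absent from the
                es_query result.
    duplicated: keys that appear more than once in the es_query result.
    """
    def key_of(row):
        # Build a hashable key from the (assumed) primary key columns.
        return tuple(row[field] for field in key_fields)

    cassandra_keys = {key_of(row) for row in cassandra_rows}
    elastic_counts = Counter(key_of(row) for row in elastic_rows)

    missing = cassandra_keys - set(elastic_counts)
    duplicated = {key for key, count in elastic_counts.items() if count > 1}
    return missing, duplicated


# Tiny illustration with made-up rows and a hypothetical "id" key column.
cql_rows = [{"id": 1}, {"id": 2}, {"id": 3}]
es_rows = [{"id": 1}, {"id": 1}, {"id": 3}]
missing, duplicated = compare_keys(cql_rows, es_rows, ["id"])
```

If the missing keys change from run to run, that would suggest the paged search itself is unstable; if they stay the same, it would suggest those rows are not in the index.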

serversteam commented 1 year ago

Which version of Elassandra are you using?