Closed fcosta-td closed 3 years ago
Please let me know if you need additional information. Thanks
/cc @iravid
@fcosta-td could you share the schema? It looks like we're missing an encoder for some type.
ok, I think this looks similar to https://github.com/scylladb/scylla-migrator/pull/16
@fcosta-td can you patch your clone with the above PR and retry, please?
CREATE TABLE apps_subscriptions (
app_id text,
app_version_id text,
active boolean,
created timestamp,
destination text,
events set<text>,
modified timestamp,
shared_secret_id text,
PRIMARY KEY (app_id, app_version_id)
) WITH CLUSTERING ORDER BY (app_version_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
AND comment = ''
AND compaction = {'class': 'SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
Hi @tarzanek, I've already tried that and it did not work. That function (explodeRow) was moved from Migrator.scala to Cassandra.scala, but when i change it to match that PR, it fails to compile.
ok, let me try to update the patch
@fcosta-td can you check https://github.com/tarzanek/scylla-migrator/tree/null-regular-columns-rebase, build it, and try to use it? Worst case, ping me on Slack and I can give you a build of the above if needed.
Hi @tarzanek, I've just tested it and it still fails with the same error.
Any update? We are getting the same issue with null columns:
org.apache.spark.sql.Row, true]).isNullAt) null else mapobjects(MapObjects_loopValue0, MapObjects_loopIsNull0, ObjectType(class java.lang.Object), staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(lambdavariable(MapObjects_loopValue0, MapObjects_loopIsNull0, ObjectType(class java.lang.Object), true), StringType), true, false), validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, exclusions), ArrayType(StringType,true)), None) AS exclusions#2
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:292)
at org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:594)
at org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:594)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:105)
at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.
Hi folks, sorry for the problems you're running into here. This looks familiar to me, so it must be a regression from the recent driver upgrade. I will take a stab at this later today.
Thanks @iravid. Seeing the same issue here.
@iravid This is our setup that causes the problem.
CREATE TABLE qa2.exclusions (
sku text,
exclusion_type text,
exclusions list<text>,
PRIMARY KEY (sku, exclusion_type)
) WITH CLUSTERING ORDER BY (exclusion_type ASC)
AND bloom_filter_fp_chance = 0.1
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.DeflateCompressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: org.apache.spark.unsafe.types.UTF8String is not a valid external type for schema of string
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, sku), StringType), true, false) AS sku#0
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, exclusion_type), StringType), true, false) AS exclusion_type#1
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else mapobjects(MapObjects_loopValue0, MapObjects_loopIsNull0, ObjectType(class java.lang.Object), staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(lambdavariable(MapObjects_loopValue0, MapObjects_loopIsNull0, ObjectType(class java.lang.Object), true), StringType), true, false), validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, exclusions), ArrayType(StringType,true)), None) AS exclusions#2
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:292)
at org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:594)
at org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:594)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:105)
at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:30)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at com.datastax.spark.connector.writer.GroupingBatchBuilder.foreach(GroupingBatchBuilder.scala:30)
at com.datastax.spark.connector.writer.TableWriter$$anonfun$writeInternal$2.apply(TableWriter.scala:241)
at com.datastax.spark.connector.writer.TableWriter$$anonfun$writeInternal$2.apply(TableWriter.scala:210)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:112)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:111)
at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:129)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:210)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:188)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:175)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:38)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:38)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.spark.unsafe.types.UTF8String is not a valid external type for schema of string
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:289)
... 30 more
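For anyone puzzled by the message: Spark's row encoder validates that each value carries the expected "external" Java type for its schema (a plain String for a text column), and collection values are validated element by element. A rough, Spark-free sketch of that check in Python (the UTF8String class and validate_external_string function here are illustrative stand-ins, not Spark's actual code):

```python
class UTF8String:
    """Stand-in for Spark's internal org.apache.spark.unsafe.types.UTF8String."""
    def __init__(self, raw: bytes):
        self.raw = raw

def validate_external_string(value):
    """Rough analogue of Catalyst's validateexternaltype for StringType:
    the external value must be a plain string, not the internal UTF8String."""
    if isinstance(value, str):
        return value
    raise RuntimeError(
        f"{type(value).__name__} is not a valid external type for schema of string")

# Top-level string columns (sku, exclusion_type) pass validation...
validate_external_string("1000001")

# ...but elements of a list<text> column that still carry the internal
# representation fail, which matches the error seen in the trace above.
try:
    for element in [UTF8String(b"5157"), UTF8String(b"1371")]:
        validate_external_string(element)
except RuntimeError as e:
    print(e)  # UTF8String is not a valid external type for schema of string
```

This is why the failure only shows up on tables with collection columns of strings: the top-level columns are converted correctly, while the nested elements are not.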
Tried this with a small table with known data and ensured there were no nulls in the 'exclusions' column collections. It still failed with the same exception described above.
cassuser@cqlsh> select * from qa2.exclusions_2;
sku | exclusion_type | exclusions
---------+----------------+------------------
1000000 | x | ['159']
1000001 | x | ['5157', '1371']
1000002 | x | ['5788']
1000003 | x | ['6397']
1000004 | x | ['4031']
1000005 | x | ['1231']
1000006 | x | ['7128']
1000007 | x | ['907']
1000008 | x | ['3192', '7327']
1000009 | x | ['9529']
(10 rows)
Thanks for the info everyone! The problem is with composite data types (lists, sets, maps, tuples, UDTs) that contain strings. Will have a fix soon.
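To illustrate the shape of such a fix (a hedged sketch, not the actual patch that landed in master): before rows are handed back to the encoder, string values nested inside lists, sets, and maps have to be converted from the internal representation to plain strings. The UTF8String class and externalize helper below are hypothetical names used only for this illustration:

```python
class UTF8String:
    """Stand-in for Spark's internal string type (illustrative only)."""
    def __init__(self, raw: bytes):
        self.raw = raw
    def __str__(self):
        return self.raw.decode("utf-8")

def externalize(value):
    """Recursively replace internal string values with plain str inside
    collection-typed column values, leaving all other values untouched."""
    if isinstance(value, UTF8String):
        return str(value)
    if isinstance(value, list):
        return [externalize(v) for v in value]
    if isinstance(value, set):
        return {externalize(v) for v in value}
    if isinstance(value, dict):
        return {externalize(k): externalize(v) for k, v in value.items()}
    return value

# A row like the ones in the qa2.exclusions table above:
row = {"sku": "1000001", "exclusion_type": "x",
       "exclusions": [UTF8String(b"5157"), UTF8String(b"1371")]}
print({k: externalize(v) for k, v in row.items()})
# {'sku': '1000001', 'exclusion_type': 'x', 'exclusions': ['5157', '1371']}
```

The recursion matters because the same mismatch can hide at any nesting depth (e.g. a map whose values are lists of text).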
@iravid thanks for looking into it. Looking forward to the fix.
Latest master should fix this 👍 Thank you for your patience!
When running scylla-migrator, I am getting the following error on a table with 200 records that allows null values: