scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra
http://scylladb.com
GNU Affero General Public License v3.0

Add support for Cassandra NB format (Cassandra 4.0) #8583

Open slivne opened 3 years ago

slivne commented 3 years ago

Cassandra 4 introduced the NA sstable format, which comes with a set of changes, listed below.

One of the changes we have introduced is the ability to stream data internally in Scylla on nodetool refresh instead of using sstableloader. As such, it would be appropriate for Scylla to support the NA format natively, even if only for reads.

        // na (4.0.0): uncompressed chunks, pending repair session, isTransient, checksummed sstable metadata file, new Bloomfilter format

CASSANDRA-9425 Make node-local schema fully immutable
CASSANDRA-9143 Fix consistency of incrementally repaired data across replicas
CASSANDRA-10520 Compressed writer and reader should support non-compressed data.
CASSANDRA-13420 Pending repair info was added in 4.0
CASSANDRA-13321 Add a checksum component for the sstable metadata (-Statistics.db) file
CASSANDRA-9067 BloomFilter serialization format should not change byte ordering
CASSANDRA-14404 Transient Replication & Cheap Quorums: Decouple storage requirements from consensus group size using incremental repair

slivne commented 3 years ago

@avikivity - need your input on this

I want to break it into three parts

fee-mendes commented 2 years ago

@slivne FYI -- the NA format is the C* 4 format only up to the release candidates. Starting with C* 4.0 GA, the new format is NB. See here.

 // na (4.0-rc1): uncompressed chunks, pending repair session, isTransient, checksummed sstable metadata file, new Bloomfilter format
 // nb (4.0.0): originating host id
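To make the relationship between the two versions concrete, here is a minimal sketch of a cumulative feature lookup. The feature lists are copied from the comment above; the `INTRODUCED` table, the `features_for` helper, and the assumption that version strings order lexicographically are illustrative only and are not taken from the Cassandra or Scylla code.

```python
# Features introduced by each version, copied from the comment quoted above.
INTRODUCED = {
    "na": [
        "uncompressed chunks",
        "pending repair session",
        "isTransient",
        "checksummed sstable metadata file",
        "new Bloomfilter format",
    ],
    "nb": ["originating host id"],
}


def features_for(version):
    """Return every listed feature present in `version`.

    Assumption: versions sort lexicographically ("na" < "nb"), so a version
    carries the features of every listed version that sorts at or before it.
    """
    features = []
    for introduced_in in sorted(INTRODUCED):
        if introduced_in <= version:
            features.extend(INTRODUCED[introduced_in])
    return features


print(features_for("nb"))
```

Under these assumptions, a reader that handles NB has to handle everything NA added plus the originating host id.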

Currently, both sstableloader and nodetool refresh fail badly with the NA format:

# nodetool refresh ks tbl
nodetool: Scylla API server HTTP POST to URL '/storage_service/sstables/ks' failed: Failed to load new sstables: sstables::malformed_sstable_exception (invalid version for file na-1-big-CompressionInfo.db. Name doesn't match any known version.)
# sstableloader -d 127.0.0.1 ks/tbl-b7dc81e0306311ec9779b50119569f33/
===== Using optimized driver!!! =====
java.lang.ArrayIndexOutOfBoundsException: Index 640228101 out of bounds for length 4
    at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:114)
    at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:93)
    at org.apache.cassandra.io.sstable.format.SSTableReader.openForBatch(SSTableReader.java:416)
    at com.scylladb.tools.BulkLoader.openFile(BulkLoader.java:1525)
    at com.scylladb.tools.BulkLoader.process(BulkLoader.java:1570)
    at com.scylladb.tools.BulkLoader.lambda$main$1(BulkLoader.java:1367)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
  0% done.        0 statements sent (in        0 batches,        0 failed).
       0 statements generated.
       0 cql rows processed in        0 partitions.
       0 cql rows and        0 partitions deleted.
       0 local and        0 remote counter shards where skipped.
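
The nodetool error above is a version check on the file name. A rough sketch of that kind of check follows, assuming the `<version>-<generation>-<format>-<component>` layout seen in `na-1-big-CompressionInfo.db`. The `KNOWN_VERSIONS` set, the regular expression, and the exception class are hypothetical and only illustrate the failure mode; they are not Scylla's actual implementation.

```python
import re

# Assumed set of versions the reader understands; illustrative only.
KNOWN_VERSIONS = {"mc", "md"}

# <version>-<generation>-<format>-<component>, e.g. na-1-big-CompressionInfo.db
_NAME_RE = re.compile(
    r"^(?P<version>[a-z]{2})-(?P<generation>\d+)-(?P<format>\w+)-(?P<component>[\w.]+)$"
)


class MalformedSSTableName(ValueError):
    """Raised when a file name cannot be mapped to a known sstable version."""


def parse_sstable_name(name, known_versions=KNOWN_VERSIONS):
    """Split an sstable component file name and reject unknown versions."""
    match = _NAME_RE.match(name)
    if match is None:
        raise MalformedSSTableName(f"unrecognized file name: {name}")
    version = match.group("version")
    if version not in known_versions:
        raise MalformedSSTableName(f"invalid version for file {name}")
    return (
        version,
        int(match.group("generation")),
        match.group("format"),
        match.group("component"),
    )
```

In this sketch, accepting `na` or `nb` names is only a matter of extending the known set; actually reading the files still requires handling the format changes listed earlier in the thread.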
avikivity commented 2 years ago

The changes look fairly small, we could incorporate them into Scylla.

re sstableloader, perhaps we should write it in C++ using our own infrastructure (based on @denesb's sstable-tools). The main blocker is that we have no Seastar CQL driver.

denesb commented 2 years ago

re sstableloader, perhaps we should write it in C++ using our own infrastructure (based on @denesb's sstable-tools). The main blocker is that we have no Seastar CQL driver.

IIRC @asias suggested a load-and-stream operation which could replace sstableloader, no? If we really want an external tool, we can still make one that connects to the Scylla cluster via RPC and streams the data using the streaming API.

avikivity commented 2 years ago

It's not a good idea to expose the internal RPC to clients. It's not authenticated and it has a separate deprecation schedule.

dorlaor commented 2 years ago

@asias @denesb so what's your call regarding this format?

asias commented 2 years ago

It's not a good idea to expose the internal RPC to clients. It's not authenticated and it has a separate deprecation schedule.

The load-and-stream feature can load any sstables Scylla supports. We could have an HTTP API to stream files to Scylla instead of using scp to copy them to a node, and then trigger load and stream to distribute the data to the Scylla nodes.

That way we could have a lightweight external tool, or even a simple script, to load sstables into Scylla.
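
As a rough illustration of the "simple script" idea, the sketch below only builds the URL for the trigger step. The `/storage_service/sstables/<keyspace>` path is the one shown in the nodetool error earlier in this thread; the `cf` and `load_and_stream` query parameters, the port 10000 default, and the `refresh_url` helper are assumptions that should be checked against the Scylla REST API documentation. The file-upload HTTP endpoint proposed here does not appear in the thread and is not sketched.

```python
from urllib.parse import quote, urlencode


def refresh_url(keyspace, table, host="127.0.0.1", port=10000, load_and_stream=True):
    """Build the URL a script would POST to in order to trigger a refresh.

    Assumptions: the query parameter names and the default port are not
    confirmed by this thread; only the path prefix is.
    """
    query = urlencode({"cf": table, "load_and_stream": str(load_and_stream).lower()})
    return f"http://{host}:{port}/storage_service/sstables/{quote(keyspace, safe='')}?{query}"


print(refresh_url("ks", "tbl"))
# prints http://127.0.0.1:10000/storage_service/sstables/ks?cf=tbl&load_and_stream=true
```

A script would first place the sstable files where the node can see them (today via scp, as noted above) and then issue an HTTP POST to this URL.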

I think we should support the 4.0 format at least for reads; otherwise it's hard to migrate 4.0 clusters.

dorlaor commented 2 years ago

Ok, load and stream is indeed a simple way to load data into Scylla. What's special about the NA format? Is there anything in the file we need to add?


asias commented 2 years ago

Ok, load and stream is indeed a simple way to load data into Scylla. What's special about the NA format? Is there anything in the file we need to add?

Avi mentioned here that the NA change is small. No change is needed on the load-and-stream side.

dorlaor commented 2 years ago

Well, I just want to assign it to someone (Botond or Asias or another volunteer) to get it done

asias commented 2 years ago

Well, I just want to assign it to someone (Botond or Asias or another volunteer) to get it done

It is more efficient to assign it to someone who is familiar with the sstable formats. So I am not volunteering here.

denesb commented 2 years ago

I think we should ask @bhalevy to volunteer someone from our team.

bhalevy commented 2 years ago

my plan is for @cmm to work on this after #9869

nyh commented 2 years ago

As noted by @fee-mendes above, the NA format is not interesting: it was only used in release candidates, and 4.0 GA came out with the NB format. That is the format we need to support, so I changed the issue title accordingly.

As @slivne noted above, this is really two issues, which possibly should even be done by different people:

  1. We need to support the NB format in the Java tools (especially sstableloader). I guess this will require merging code from upstream Cassandra into our Java tools.
  2. We need to support the NB format in the Scylla sstable read code, and probably (but not necessarily!) write code as well.

denesb commented 2 years ago

We already need to update our Java tools for the ME format (https://github.com/scylladb/scylla-tools-java/issues/291), so we might as well update them from a C* version which also supports NB, to kill two birds with one stone.

tzach commented 2 years ago

Adding the Doc label as this feature impacts the docs.

roydahan commented 8 months ago

Are we still not supporting the NB format?

denesb commented 8 months ago

Are we still not supporting the NB format?

Yes, still not supported.