Closed dioptre closed 2 years ago
@dioptre
Wonderful, I created the diff of both SFPLA keyspaces and started looking into the differences, let's discuss them here:
--> Don't see any problem here.
--> @dioptre : Please clarify: regarding the tables that we seem to have in HD keyspace instead. Will they be moved? Do we need to upgrade the HD table instead? Will they co-exist? Is there any case where a row in an old existing table A may rely on an row in a new table B (that we do not have, yet)?
You mentioned that change, literally all UUIDv4 were replaced by UUIDv1, they are both 128 Bits, but still quite different so how will we do this? Are we just going to keep the V4 uuids in the V1 timeuuid columns? Wouldn't that break? Trying to read the timestamp or the mac from a V4 will just give you garbage back.. so we could never fully use the timeuuid data because everything old would not be a timeuuid? Also the uuid got the version encoded (I made it bold below), so anything can (and may) figure out that these are not timeuuids.
V1: a9db38e2-e051-11ea-87d0-0242ac130003
V4: 61bab3cb-2a62-4f1c-b2de-587b4ddeeb64
Several existing tables had their PK extended by an additional hhash value (e.g. tables ips, browsers, ...). How are we going to migrate these? When we try to insert our old records into these new tables, we will have to generate this hhash value for each exported row before we can import it into new schema
It might be easier to simply setup a new sfpla2 keyspace and switch to it and leave the old sfpla untouched? It's not perfectly nice with regards to superset etc. but it is a lot easier to setup a second keyspace in superset/jupyter etc. rather than to migrate this super huge old and quite incompatible sfpla data?
Responses mate:
2) Nothing new needs to be ported to sfpla from hd. We should probably check that all the existing calls from hd code don't break in the new schema (I think everything should generally be additive and should be ok fingers crossed but lets try and break it/find any issues).
3) Everything that I setup should be using v1s (the only exception to this was the admin user who had 00000-0000-0000-00000 etc.) which I think we can ignore. If anyone has made this v4 tho now we are in trouble (easy to test though). Plan to check this is to nix every table except for events and do a migration for it and it alone (@hilsbos ?).
4) Should solve this in 3 (combined key is important now as we can measure related assets and control usage moving forward).
5) Yeah we could do a full migration, but I don't think we need to?!
Nothing new needs to be ported to sfpla from hd. We should probably check that all the existing calls from hd code don't break in the new schema (I think everything should generally be additive and should be ok fingers crossed but lets try and break it/find any issues).
Ok. But I was asking: Why do we have some of the new sfpla tables in our hd keyspace instead (like cohorts, messages_ext, ..)? Do we have to create them in sfpla? Keep the hd ones? Delete them? Or replace the hd ones instead?
Everything that I setup should be using v1s (the only exception to this was the admin user who had 00000-0000-0000-00000 etc.) which I think we can ignore. If anyone has made this v4 tho now we are in trouble (easy to test though). Plan to check this is to nix every table except for events and do a migration for it and it alone (@hilsbos ?).
Ok. I just checked a few rows in sfpla.events, and yes, these have already V1 timeuuid stored in the V4 uuid columns. I really did not expect this, so I'm not sure if we did this everywhere that way (because the current column type uuid would actually be v4 not v1.. see here).
I'm going to consider the indexes in events table over the weekend. How you feeling @cyberjunk ?
@cyberjunk showing the differences between the old and current schema
1c1
< CREATE KEYSPACE sfpla WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '1'} AND durable_writes = true;
---
> CREATE KEYSPACE sfpla3 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '1', 'DC2': '1'} AND durable_writes = true;
3c3
< CREATE TYPE sfpla.geo_point (
---
> CREATE TYPE sfpla3.geo_point (
8c8
< CREATE TYPE sfpla.geo_pol (
---
> CREATE TYPE sfpla3.geo_pol (
10d9
< rcode text,
13,14c12
< city text,
< zip text
---
> city text
17c15
< CREATE TYPE sfpla.payment (
---
> CREATE TYPE sfpla3.payment (
45c43
< CREATE TYPE sfpla.viewport (
---
> CREATE TYPE sfpla3.viewport (
50c48
< CREATE TABLE sfpla.redirects (
---
> CREATE TABLE sfpla3.redirects (
71c69
< CREATE TABLE sfpla.logs (
---
> CREATE TABLE sfpla3.logs (
99a98
> CREATE CUSTOM INDEX elastic_logs_idx ON sfpla3.logs () USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
101c100
< CREATE TABLE sfpla.events_recent (
---
> CREATE TABLE sfpla3.events_recent (
139,140c138
< xid text,
< zip text
---
> xid text
149c147
< AND gc_grace_seconds = 14400
---
> AND gc_grace_seconds = 864000
154a153
> CREATE CUSTOM INDEX elastic_events_recent_idx ON sfpla3.events_recent () USING 'org.elassandra.index.ExtendedElasticSecondaryIndex';
156c155
< CREATE TABLE sfpla.dailies (
---
> CREATE TABLE sfpla3.dailies (
177c176
< CREATE TABLE sfpla.agreed (
---
> CREATE TABLE sfpla3.agreed (
233c232
< CREATE TABLE sfpla.locations (
---
> CREATE TABLE sfpla3.locations (
255c254
< CREATE TABLE sfpla.actions (
---
> CREATE TABLE sfpla3.actions (
282c281
< CREATE TABLE sfpla.outcomes (
---
> CREATE TABLE sfpla3.outcomes (
306c305
< CREATE TABLE sfpla.ips (
---
> CREATE TABLE sfpla3.ips (
326c325
< CREATE TABLE sfpla.browsers (
---
> CREATE TABLE sfpla3.browsers (
348c347
< CREATE TABLE sfpla.accounts (
---
> CREATE TABLE sfpla3.accounts (
372c371
< CREATE TABLE sfpla.redirect_history (
---
> CREATE TABLE sfpla3.redirect_history (
398c397
< CREATE INDEX hostto_redirect_history_idx ON sfpla.redirect_history (hostto);
---
> CREATE INDEX hostto_redirect_history_idx ON sfpla3.redirect_history (hostto);
400c399
< CREATE TABLE sfpla.sequences (
---
> CREATE TABLE sfpla3.sequences (
418c417
< CREATE TABLE sfpla.referrers (
---
> CREATE TABLE sfpla3.referrers (
438c437
< CREATE TABLE sfpla.aliases (
---
> CREATE TABLE sfpla3.aliases (
460c459
< CREATE TABLE sfpla.usernames (
---
> CREATE TABLE sfpla3.usernames (
482c481
< CREATE TABLE sfpla.jurisdictions (
---
> CREATE TABLE sfpla3.jurisdictions (
505c504
< CREATE INDEX locs_jurisdictions_idx ON sfpla.jurisdictions (full(locs));
---
> CREATE INDEX locs_jurisdictions_idx ON sfpla3.jurisdictions (full(locs));
507c506
< CREATE TABLE sfpla.action_names (
---
> CREATE TABLE sfpla3.action_names (
524c523
< CREATE TABLE sfpla.agreements (
---
> CREATE TABLE sfpla3.agreements (
578c577
< CREATE TABLE sfpla.actions_ext (
---
> CREATE TABLE sfpla3.actions_ext (
603c602
< CREATE TABLE sfpla.nodes (
---
> CREATE TABLE sfpla3.nodes (
627c626
< CREATE TABLE sfpla.events (
---
> CREATE TABLE sfpla3.events (
665,666c664
< xid text,
< zip text
---
> xid text
682c680
< CREATE TABLE sfpla.counters (
---
> CREATE TABLE sfpla3.counters (
700c698
< CREATE TABLE sfpla.tlvv (
---
> CREATE TABLE sfpla3.tlvv (
730c728
< CREATE TABLE sfpla.cohorts (
---
> CREATE TABLE sfpla3.cohorts (
753c751
< CREATE TABLE sfpla.tlvu (
---
> CREATE TABLE sfpla3.tlvu (
783c781
< CREATE TABLE sfpla.users (
---
> CREATE TABLE sfpla3.users (
805c803
< CREATE TABLE sfpla.sessions (
---
> CREATE TABLE sfpla3.sessions (
853d850
< zip text,
871,932c868
< CREATE TABLE sfpla.zips (
< country text,
< zip text,
< asian double,
< black double,
< carpool double,
< childpoverty double,
< city text,
< construction double,
< county text,
< created timestamp,
< culture text,
< drive double,
< employed int,
< familywork double,
< hispanic double,
< income double,
< incomeerr double,
< incomepercap double,
< incomepercaperr double,
< latlon frozen<geo_point>,
< loc frozen<geo_pol>,
< meancommute double,
< men int,
< native double,
< office double,
< othertransport double,
< pacific double,
< population int,
< poverty double,
< privatework double,
< production double,
< professional double,
< publicwork double,
< rcode text,
< region text,
< selfemployed double,
< service double,
< transit double,
< unemployment double,
< voters int,
< walk double,
< white double,
< women int,
< workathome double,
< PRIMARY KEY ((country, zip))
< ) WITH bloom_filter_fp_chance = 0.01
< AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
< AND comment = ''
< AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
< AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
< AND crc_check_chance = 1.0
< AND dclocal_read_repair_chance = 0.1
< AND default_time_to_live = 0
< AND gc_grace_seconds = 60
< AND max_index_interval = 2048
< AND memtable_flush_period_in_ms = 0
< AND min_index_interval = 128
< AND read_repair_chance = 0.0
< AND speculative_retry = '99PERCENTILE';
<
< CREATE TABLE sfpla.affiliates (
---
> CREATE TABLE sfpla3.affiliates (
952c888
< CREATE TABLE sfpla.visitors (
---
> CREATE TABLE sfpla3.visitors (
998,999c934
< xid text,
< zip text
---
> xid text
1015c950
< CREATE TABLE sfpla.updates (
---
> CREATE TABLE sfpla3.updates (
1034c969
< CREATE TABLE sfpla.services (
---
> CREATE TABLE sfpla3.services (
1058c993
< CREATE TABLE sfpla.reqs (
---
> CREATE TABLE sfpla3.reqs (
1078c1013
< CREATE TABLE sfpla.emails (
---
> CREATE TABLE sfpla3.emails (
1099c1034
< CREATE TABLE sfpla.hits (
---
> CREATE TABLE sfpla3.hits (
1119c1054
< CREATE TABLE sfpla.referrals (
---
> CREATE TABLE sfpla3.referrals (
1140c1075
< CREATE TABLE sfpla.tlv (
---
> CREATE TABLE sfpla3.tlv (
1168c1103
< CREATE TABLE sfpla.countries (
---
> CREATE TABLE sfpla3.countries (
1187c1122
< CREATE TABLE sfpla.cells (
---
> CREATE TABLE sfpla3.cells (
1208c1143
< CREATE TABLE sfpla.messages (
---
> CREATE TABLE sfpla3.messages (
1255c1190
< CREATE TABLE sfpla.routed (
---
> CREATE TABLE sfpla3.routed (
1275c1210
< CREATE TABLE sfpla.hosts (
---
> CREATE TABLE sfpla3.hosts (
1295c1230
< CREATE TABLE sfpla.geo_ip (
---
> CREATE TABLE sfpla3.geo_ip (
1325c1260
< CREATE TABLE sfpla.queues (
---
> CREATE TABLE sfpla3.queues (
1353,1355c1288,1290
< CREATE INDEX queues_started_idx ON sfpla.queues (started);
< CREATE INDEX queues_type_idx ON sfpla.queues (src);
< CREATE INDEX queues_completed_idx ON sfpla.queues (completed);
---
> CREATE INDEX queues_started_idx ON sfpla3.queues (started);
> CREATE INDEX queues_type_idx ON sfpla3.queues (src);
> CREATE INDEX queues_completed_idx ON sfpla3.queues (completed);
1357c1292
< CREATE TABLE sfpla.referred (
---
> CREATE TABLE sfpla3.referred (
1377d1311
<
@cyberjunk here is the new schema:
The old schema:
Please take a look, and let's chat when you have time to work out next steps.