Open parselife opened 2 years ago
I found this in the Cassandra table definition:
**primary key (partition, adapter_id, sort, data_id, vis, nano_time, field_mask, value, num_duplicates)**
Is there any way to customize this?
Not sure exactly why you'd want to customize that primary key. You can give it a data ID that is unique, and other things like the sort and partition keys come from the index (which, again, you could customize but probably don't want to).
The issue is most likely that you are inserting rows into the index with the same adapter ID and data ID but different sort keys. This would happen, for example, if you were using a spatial index and the rows had different geometries (or, similarly, a temporal index with different dates/times). In these rare cases you would want to delete the row prior to ingesting.
The num_duplicates identifier that we tack onto the primary key is a hint that we are intentionally storing duplicates. This can happen in rare circumstances, such as when you store a time range (consider a track that has a start time and an end time) and that time range crosses a periodicity boundary on a temporal index. Because time is unbounded, we place it on the space-filling curve by applying a periodicity, such as a year, which is our default but can be configured. With a year periodicity, if the track started on Dec. 31 and ended on Jan. 1, for example, we have to insert 2 rows, one on each side of the boundary, and we maintain that with the num_duplicates hint.
Hopefully that adds some clarity to your situation. As mentioned, most likely you are inserting a data ID multiple times with different sort keys, such as different geometries within a spatial index, which in that case requires deleting the previous row prior to insertion.
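To make the periodicity case concrete, here is a minimal Python sketch of splitting a time range at year boundaries. It is not GeoWave's implementation: the function names `split_by_year` and `rows_for_track`, the `bin` field, and the reading of num_duplicates as "number of extra rows beyond the first" are all assumptions made for illustration.

```python
from datetime import datetime


def split_by_year(start, end):
    """Split [start, end] into one segment per calendar year it touches."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    segments = []
    cursor = start
    # Cut the range at every Jan 1 that lies between start and end.
    while cursor.year < end.year:
        boundary = datetime(cursor.year + 1, 1, 1)
        segments.append((cursor, boundary))
        cursor = boundary
    segments.append((cursor, end))
    return segments


def rows_for_track(data_id, start, end):
    """Build one illustrative row per segment (hypothetical row layout)."""
    segments = split_by_year(start, end)
    # Assumption: num_duplicates counts the extra rows beyond the first.
    num_duplicates = len(segments) - 1
    return [
        {"data_id": data_id, "bin": seg_start.year, "num_duplicates": num_duplicates}
        for seg_start, _ in segments
    ]


# A track that starts on Dec. 31 and ends on Jan. 1 crosses one boundary.
crossing = rows_for_track("track-1", datetime(2019, 12, 31, 23), datetime(2020, 1, 1, 1))
# A track that stays inside a single year is stored once.
single = rows_for_track("track-2", datetime(2020, 3, 1), datetime(2020, 3, 2))
```

In this sketch the crossing track yields two rows, one per year, while the track inside a single year yields one row with no duplicates.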
Thanks for your reply. Where can I find the sort keys? In my situation, the data written twice is exactly the same.
Do you have a "ROUND_ROBIN" partition strategy on your index (such as described in this add index help output, https://locationtech.github.io/geowave/latest/userguide.html#help-command)? This partition strategy would by design add random partition keys even to identical rows and explain this behavior you're seeing.
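As a rough illustration of why a varying partition key produces two rows for identical data, the Python sketch below models a table as a dictionary keyed by a subset of the primary key columns listed above. A write with an existing key overwrites the row (as a Cassandra write does), while a write with a new key adds a row. The `write_twice` helper, the sample record, and the use of a cycling iterator to stand in for the partition strategy are assumptions for illustration, not GeoWave code.

```python
from itertools import cycle

# Subset of the primary key columns from the table definition above.
KEY_COLUMNS = ("partition", "adapter_id", "sort", "data_id")


def write_twice(record, partition_keys):
    """Upsert the same record twice, drawing a partition key for each write."""
    table = {}
    for _ in range(2):
        row = dict(record, partition=next(partition_keys))
        # Same key tuple: the row is overwritten. New key tuple: a new row.
        table[tuple(row[c] for c in KEY_COLUMNS)] = row
    return table


# Hypothetical record; the values are placeholders.
record = {"adapter_id": 1, "sort": b"\x00\x01", "data_id": b"feature-42"}

# Constant partition key: the second write replaces the first.
fixed = write_twice(record, cycle([0]))
# Partition key changes between writes: both rows survive.
round_robin = write_twice(record, cycle([0, 1, 2]))
```

With a constant partition key the table ends up with one row. When the partition key differs between the two writes, it ends up with two rows even though adapter_id, sort, and data_id are identical.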
The documentation says:
According to that, the adapter ID and data ID together define a unique identifier, so how can I ingest data without allowing duplicates?
Now, my index looks like this:
Why did this happen?
The values of adapter_id and data_id in these two records are the same. I want a single record without a duplicate. How can I do that?