Closed commercial-hippie closed 5 years ago
Supporting it will be straightforward, although we don't have it right now.
This may require inserting some extra metadata (into a verdict-managed table) to record the sizes of the original tables. To see why, consider two tables A and B. Suppose we sample 100 of A's 1000 tuples and 100 of B's 2000 tuples. A is then sampled with a higher probability (100/1000) than B (100/2000), so we need to correct for this bias.
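To make the bias concrete, here is a minimal sketch in Python. It is not how Verdict implements the correction; it only illustrates the idea of weighting each sampled tuple by the inverse of its sampling probability, which is why the original table sizes have to be recorded. The `SampleInfo` class and the `matching_rows` counts (30 and 10) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SampleInfo:
    """Sizes recorded for one sampled table (hypothetical structure)."""
    original_rows: int   # size of the original table
    sampled_rows: int    # number of tuples kept in the sample
    matching_rows: int   # sampled tuples satisfying some predicate (made-up value)

    @property
    def weight(self) -> float:
        # Inverse of the sampling probability: each sampled tuple
        # stands in for this many original tuples.
        return self.original_rows / self.sampled_rows


def naive_fraction(samples: list[SampleInfo]) -> float:
    # Pools the samples as if every tuple had the same sampling probability.
    matched = sum(s.matching_rows for s in samples)
    total = sum(s.sampled_rows for s in samples)
    return matched / total


def corrected_fraction(samples: list[SampleInfo]) -> float:
    # Scales each sample back up to its original table size before pooling.
    matched = sum(s.matching_rows * s.weight for s in samples)
    total = sum(s.sampled_rows * s.weight for s in samples)
    return matched / total


a = SampleInfo(original_rows=1000, sampled_rows=100, matching_rows=30)
b = SampleInfo(original_rows=2000, sampled_rows=100, matching_rows=10)

naive = naive_fraction([a, b])          # 40 / 200
corrected = corrected_fraction([a, b])  # (30*10 + 10*20) / (1000 + 2000)
```

With these numbers the naive estimate over-represents A, because A's tuples were twice as likely to be sampled as B's. The corrected estimate needs `original_rows` for each table, which is the extra metadata mentioned above.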
Please let us know if you have decided to use daily insertions (e.g., as a new partition or so). I tentatively label this issue as 'feature request'.
@pyongjoo we will definitely use daily insertions when it becomes an available feature. I might look into doing this manually in about 2 weeks.
I was thinking of just cloning the original scramble insert query (create table as select from), and doing something like:
INSERT INTO verdictdb_scrambles.table_name SELECT FROM (copied from original insert query) WHERE data > date_since_last_update
.verdictdbmeta table. Would that work?
Our schemas or scramble sizes won't be changing, so doing it manually for now is not a problem for me. :smile:
That will certainly work, but I don't think implementing the same logic inside Verdict will be difficult either.
Let me have some discussions with @dongyoungy about its implementation plan.
Will there be a way to append to a scramble in the future?
i.e., if we have a table which has data added on a daily basis, do we need to drop the scramble and re-create it?
I was thinking we might be able to just copy the query used to create the scramble manually (select from) and do an
INSERT (SELECT FROM WHERE 'new data conditions')
Could that work, or would it mess with the calculations Verdict does?
Thanks! Mike