world-federation-of-advertisers / any-sketch-java

Apache License 2.0

Usage examples #10 (label: question)

Status: Open. DimaMakarevich opened this issue 2 years ago.

DimaMakarevich commented 2 years ago

Hello everyone, I understand that you are not obliged to answer questions, but answers to a couple of them would be very useful for the many people trying to figure out how to generate sketches.

  1. Why do you insert data with the value `1L`?

     ```java
     anySketch.insert(123456, ImmutableMap.of("frequency", 1L));
     anySketch.insert(999999, ImmutableMap.of("frequency", 1L));
     ```

     And why does another example use a different approach?

     ```java
     anySketch.insert("person one", ImmutableMap.of("frequency", 1L));
     anySketch.insert("persona dos", ImmutableMap.of("frequency", 2L));
     anySketch.insert("personne trois", ImmutableMap.of("frequency", 3L));
     anySketch.insert("qof afar", ImmutableMap.of("frequency", 4L));
     ```

  2. For example, suppose we have a table like this:

     | user_id | favorite_game | age |
     |---------|---------------|-----|
     | sean    | halo          | 36  |
     | powers  | smash         | 34  |
     | cohen   | smash         | 33  |

     What would generating a Liquid Legions sketch over the user_id column look like in this example?

  3. It would also be useful to know: how do you save sketches to disk?

  4. Is it possible at all to use this repository outside the context of the cross-media-measurement project (https://github.com/world-federation-of-advertisers/cross-media-measurement)?

SanjayVas commented 2 years ago

Tagging @EvgSkv to answer the details about sketches in general.

> It would also be useful to know how you save sketches to disk.

You can just save the serialized Sketch protobuf message to a file. This is generally true of any protocol buffer. See https://developers.google.com/protocol-buffers for more information on protocol buffer serialization.
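A minimal, JDK-only sketch of that suggestion follows. The helper class `SketchFiles` is hypothetical. With protobuf-java, every generated message class provides `toByteArray()` and a static `parseFrom(byte[])`, so in real code the bytes passed to `save` would be `sketch.toByteArray()` and the bytes returned by `load` would go to `Sketch.parseFrom(...)`, assuming `Sketch` is the generated Java class for the any-sketch `Sketch` message (its Java package is not given in this thread). A stand-in byte array is used here so the example runs without protobuf on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: persists the serialized bytes of one sketch per file. */
final class SketchFiles {
  private SketchFiles() {}

  /** Writes the bytes, creating the file or truncating an existing one. */
  static void save(Path file, byte[] serializedSketch) throws IOException {
    Files.write(file, serializedSketch);
  }

  /** Reads back exactly the bytes written by save(). */
  static byte[] load(Path file) throws IOException {
    return Files.readAllBytes(file);
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for sketch.toByteArray(); real code would pass the protobuf bytes.
    byte[] serialized = {0x0a, 0x02, 0x08, 0x01};
    Path file = Files.createTempFile("sketch", ".pb");
    try {
      save(file, serialized);
      byte[] restored = load(file);
      // Real code would now call Sketch.parseFrom(restored).
      System.out.println(Arrays.equals(serialized, restored));
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

One message per file keeps things simple. If you want several sketches in one file, protobuf-java also offers `writeDelimitedTo(OutputStream)` on messages and a generated static `parseDelimitedFrom(InputStream)` for reading them back one at a time.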

> Is it possible at all to use this repository outside the context of the cross-media-measurement project (https://github.com/world-federation-of-advertisers/cross-media-measurement)?

This is meant to be a general-purpose library for sketch generation, but cross-media-measurement is the only use case the engineers working on it are currently focused on.

DimaMakarevich commented 2 years ago

Hi @EvgSkv, I apologize in advance for distracting you from more important matters, but could you clarify the points I outlined above?

EvgSkv commented 2 years ago

@DimaMakarevich thank you for the questions. Here are the answers:

  1. The difference between these two approaches is just the type of the first argument. An integer first argument is converted to its string form.

  2. It depends on how you generate the sketch. If you insert all of the user_ids into one sketch, the sketch represents the set {sean, powers, cohen}.

  3. See Sanjay's answer above.

  4. Yes, it's a generic data structure, but it is only useful in the context of secure computation, and the encryption and differential-privacy functionality is not included in this repo.
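To make answers 1 and 2 more concrete, here is a toy, JDK-only Java sketch. Everything in it is hypothetical and much simpler than the real library: the class `ToyLiquidLegions`, its constructor parameters, the SHA-256 hashing, and the collision handling are assumptions loosely modeled on the Liquid Legions idea (each key is hashed to a register drawn from an exponentially decaying distribution, and a register keeps a fingerprint plus a summed "frequency"). It shows an `insert(long, ...)` overload that simply delegates to `insert(String, ...)`, and what inserting the three user_ids from the table might look like. Note that `1L` is just a Java `long` literal; in this toy it is the amount added to the key's frequency.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy sketch loosely modeled on Liquid Legions; NOT the any-sketch-java code. */
final class ToyLiquidLegions {
  /** Fingerprint marking a register that two different keys landed in. */
  static final long DESTROYED = -1L;

  private final int numRegisters;
  private final double decayRate;
  /** Register index -> {fingerprint, frequency}; a missing index is an empty register. */
  private final TreeMap<Integer, long[]> registers = new TreeMap<>();

  ToyLiquidLegions(int numRegisters, double decayRate) {
    if (numRegisters <= 0 || decayRate <= 0) {
      throw new IllegalArgumentException("numRegisters and decayRate must be positive");
    }
    this.numRegisters = numRegisters;
    this.decayRate = decayRate;
  }

  /** Numeric keys are converted to their decimal string, then inserted as strings. */
  void insert(long key, Map<String, Long> values) {
    insert(String.valueOf(key), values);
  }

  /** Adds the key's "frequency" value to the register chosen by hashing the key. */
  void insert(String key, Map<String, Long> values) {
    ByteBuffer digest = ByteBuffer.wrap(sha256(key));
    // Top 53 bits of the first 8 bytes -> uniform double in [0, 1).
    double u = (digest.getLong(0) >>> 11) * 0x1.0p-53;
    // Next 8 bytes -> non-negative fingerprint, so it can never equal DESTROYED.
    long fingerprint = digest.getLong(8) & Long.MAX_VALUE;
    long frequency = values.getOrDefault("frequency", 0L);

    long[] reg = registers.computeIfAbsent(registerIndex(u), i -> new long[] {fingerprint, 0L});
    if (reg[0] == fingerprint) {
      reg[1] += frequency; // same key seen again: accumulate its frequency
    } else {
      reg[0] = DESTROYED; // two different keys collided in this register
      reg[1] = 0L;
    }
  }

  /** Inverse-CDF sample from an exponential distribution truncated to [0, 1). */
  private int registerIndex(double u) {
    double x = -Math.log(1 - u * (1 - Math.exp(-decayRate))) / decayRate;
    return Math.min((int) (x * numRegisters), numRegisters - 1);
  }

  int nonEmptyRegisters() {
    return registers.size();
  }

  /** Sum of frequencies over registers that still hold exactly one key. */
  long totalFrequency() {
    long total = 0;
    for (long[] reg : registers.values()) {
      if (reg[0] != DESTROYED) {
        total += reg[1];
      }
    }
    return total;
  }

  /** Deterministic text dump of all registers, handy for comparing sketches. */
  String dump() {
    StringBuilder sb = new StringBuilder();
    registers.forEach((i, reg) -> sb.append(i).append('=').append(reg[0]).append(':').append(reg[1]).append(';'));
    return sb.toString();
  }

  private static byte[] sha256(String s) {
    try {
      return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is required on every Java platform", e);
    }
  }

  public static void main(String[] args) {
    ToyLiquidLegions sketch = new ToyLiquidLegions(1 << 16, 10.0);
    // One insert per user_id row of the table in the question.
    for (String userId : new String[] {"sean", "powers", "cohen"}) {
      sketch.insert(userId, Map.of("frequency", 1L));
    }
    System.out.println(sketch.dump());
  }
}
```

In this toy, inserting `sean` twice with `1L` leaves one register with frequency 2, while two different keys landing in the same register mark it as destroyed. The real library decides how values are combined from its own configuration, which this toy does not attempt to reproduce.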

DimaMakarevich commented 2 years ago

Hi @EvgSkv, I wanted to ask whether a dedicated field is actually required in the Liquid Legions configuration. If yes, what should the relationship be between the value of num_values in SamplingIndicator and num_values in Index?