venantius / droplet

Droplet is a Python library for sampling, sketching, and summarizing data from massive data streams.

What is the principle of sparse recovery in l0-sampling? #1

Open asynchronoust opened 8 years ago

asynchronoust commented 8 years ago

I have read "A Unifying Framework for l0 Sampling", but I do not understand s-sparse recovery. It outputs x' = x with high probability, but how does one get a sample from x'?

We take a subvector a_i from the vector a; if |a_i| < m, we run sparse recovery on a_i to obtain a'_i, and at the end choose an index i from a'_i. Why is that final i the sample we want?

venantius commented 8 years ago

To be honest with you, I read the paper three years ago and can't remember now 😝

juanplopes commented 6 years ago

@thao6626 Imagine you have an arbitrary vector a = (a_1, ..., a_n). The goal of l0-sampling is to uniformly partition a into subvectors a(j) where, for each subvector, only some of the entries are the same as in the original a, and the others are zeros. Each a(j) keeps a different number of original entries: a(1) keeps 1/2 of them, a(2) keeps 1/4, a(3) keeps 1/8, and so on. With high probability, some of those subvectors will be s-sparse (for some s chosen as a function of the desired failure probability), and you'll be able to perform a recovery on them. By performing this recovery on the first a(j) where it is possible, and choosing the index with minimum h(i) [where h is the same hash function you used to partition the vector in the first place], you're actually sampling from supp(a) [there is some error, but it is negligible].
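
To make that concrete, here is a minimal Python sketch of the level-partition idea (Python, since that's droplet's language). It uses 1-sparse recovery per level (i.e. s = 1) with a fingerprint test instead of general s-sparse recovery, and every name here is my own illustration, not droplet's API:

```python
import hashlib
import random


class L0Sampler:
    """Minimal l0-sampler sketch using per-level 1-sparse recovery.

    Illustrative only: the framework in the paper uses s-sparse
    recovery per level; here s = 1 keeps the code short.
    """

    def __init__(self, n_levels=32, seed=0):
        self.n_levels = n_levels
        self.seed = seed
        # Per level j: [sum of values, sum of index*value, fingerprint].
        self.levels = [[0, 0, 0] for _ in range(n_levels)]
        self.p = (1 << 61) - 1  # prime modulus for the fingerprint test
        self.r = random.Random(seed).randrange(2, self.p)  # fingerprint base

    def _h(self, i):
        """Hash index i to a pseudo-random value in [0, 1)."""
        d = hashlib.blake2b(f"{self.seed}:{i}".encode(), digest_size=8).digest()
        return int.from_bytes(d, "big") / 2**64

    def update(self, i, delta):
        """Apply the stream update a_i += delta (delta may be negative)."""
        u = self._h(i)
        for j in range(self.n_levels):
            if u >= 2.0 ** -j:
                break  # index i is not kept at this level or any deeper one
            lvl = self.levels[j]
            lvl[0] += delta
            lvl[1] += i * delta
            lvl[2] = (lvl[2] + delta * pow(self.r, i, self.p)) % self.p

    def sample(self):
        """Return some i in supp(a), or None if every level fails recovery."""
        for total, weighted, fp in self.levels:
            if total <= 0 or weighted % total != 0:
                continue
            i = weighted // total
            # Level is 1-sparse iff fingerprint equals total * r^i (whp).
            if fp == (total * pow(self.r, i, self.p)) % self.p:
                return i
        return None
```

Scanning levels from the densest one downward means the first level that is 1-sparse contains exactly the surviving index with minimum h(i), which is the sampling rule described above.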

juanplopes commented 6 years ago

@venantius I just don't get why you specifically state that the updates must be non-negative. The Cormode paper instead says that each a_i must be non-negative in the final vector (if you don't use the prime-based recovery). That is: there can be negative updates, but the final vector must have no negative values. This is a much weaker (and more useful) restriction.
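
For instance, with the hypothetical L0Sampler sketched above, a stream with deletions is fine as long as no a_i ends up negative:

```python
s = L0Sampler(seed=7)
s.update(5, 3)
s.update(5, -1)  # negative update: allowed
s.update(9, 2)
# Final vector: a_5 = 2, a_9 = 2 -- non-negative, so recovery is valid.
print(s.sample())  # 5 or 9 (whp; None if both indices drop at the same level)
```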