k2view-academy / K2View-Academy

Catalog - how is the data sample selected from the source? #1093

Closed avvakandr closed 3 months ago

avvakandr commented 3 months ago

Hello, we've got a question from our customer about how the data sample is selected for the Catalog Discovery job.

For example, if we have this in plugins.discovery:

"sample_size": {
    "percentage": 10,
    "min_size": 10,
    "max_size": 100
},

How are the sampled rows selected? Are they the first rows, random rows, does it differ per interface type, etc.?

Also, if we have min_size: 10 and only 5 rows in the DB, are we going to use those 5 rows?


Best regards, Andrey

natalylic commented 3 months ago

The data snapshot is taken from the first rows of the table; they are not selected at random. If min_size is 10 and the table has only 5 rows, those 5 rows will be used.
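
To make the behavior described above concrete, here is a minimal Python sketch. It is not K2View's implementation: the function names `sample_size` and `take_first_rows` are hypothetical, and the way `percentage`, `min_size`, and `max_size` combine (percentage of the table, clamped between the two bounds) is an assumption that this thread does not confirm. The only parts taken from the answer are that the first rows are used and that a table smaller than `min_size` contributes all of its rows.

```python
import math
from itertools import islice


def sample_size(total_rows, percentage, min_size, max_size):
    """Return how many rows to sample (hypothetical helper).

    Assumption: the target is `percentage` percent of the table,
    clamped to the range [min_size, max_size].
    """
    target = math.ceil(total_rows * percentage / 100)
    target = max(min_size, min(target, max_size))
    # From the thread: if the table has fewer rows than min_size,
    # the rows that exist are used.
    return min(target, total_rows)


def take_first_rows(rows, n):
    """Take the first n rows in source order, with no random selection."""
    return list(islice(rows, n))


if __name__ == "__main__":
    config = {"percentage": 10, "min_size": 10, "max_size": 100}
    for total in (5, 500, 10_000):
        print(total, "rows in table ->", sample_size(total, **config), "rows sampled")
```

Under these assumptions, a 5-row table yields all 5 rows, which matches the answer above. The larger cases depend entirely on the assumed clamping rule.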