k2v-academy / K2View-Academy

Catalog - how is the data sample selected from the source? #1093

Closed yBqdo2VLaCdftea1MqgSdtEhrPZtV5oJRr4eIUo closed 2 months ago

yBqdo2VLaCdftea1MqgSdtEhrPZtV5oJRr4eIUo commented 2 months ago

Hello, we've got a question from our customer: how is the data sample selected for the Catalog Discovery job?

For example, if we have this in plugins.discovery:

"sample_size": {
    "percentage": 10,
    "min_size": 10,
    "max_size": 100
},

How are the sampled rows selected? Are they the first rows, a random selection, different per interface type, etc.?

Also, if we have min_size: 10 and only 5 rows in the DB, are we going to use those 5 rows?


Best regards, Andrey

tZajFGR0CidT8AVERBHw8puD36HY6oWViykmIIb commented 2 months ago

The data snapshot is taken from the first rows of the table. If min_size is 10 and the table has only 5 rows, those 5 rows are used.
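
To make the two answers concrete, here is a minimal Python sketch. It is not K2View code. It assumes, and this thread does not confirm, that the sample size is `percentage` of the table's row count, clamped between `min_size` and `max_size`, and then capped by the number of rows actually present. The function names `sample_size` and `take_sample` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_size(total_rows: int, percentage: int, min_size: int, max_size: int) -> int:
    """Return how many rows to sample (ASSUMED rule, not confirmed by the thread).

    The percentage of the table is clamped to [min_size, max_size],
    then capped by the number of rows that actually exist.
    """
    target = total_rows * percentage // 100
    target = max(min_size, min(target, max_size))
    # A table smaller than min_size simply contributes all of its rows.
    return min(target, total_rows)


def take_sample(rows: Sequence[T], percentage: int, min_size: int, max_size: int) -> List[T]:
    """Take the FIRST rows (no random selection), as described in the answer above."""
    n = sample_size(len(rows), percentage, min_size, max_size)
    return list(rows[:n])


# Settings from the question: percentage 10, min_size 10, max_size 100.
small_table = ["r1", "r2", "r3", "r4", "r5"]
print(take_sample(small_table, 10, 10, 100))  # all 5 rows are returned
```

Under these assumptions a 5-row table yields all 5 rows, which matches the answer above, and a 10,000-row table yields its first 100 rows because `max_size` caps the 10% target.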