mentat-is / gulp

g(ULP) - graphical universal log processor
https://gulp.sh

mapping collisions / opensearch dynamic vs fixed mapping issues #22

Closed valerino closed 1 month ago

valerino commented 1 month ago

the issue

during data ingestion (in both ingestion and query plugins), we currently instruct the underlying opensearch to use dynamic mapping: this lets index mappings be added on the fly as data comes in.

this works flawlessly as long as the ingested data stays consistent. when ingesting data from multiple sources (plugins), we may end up with fields that have the same name BUT a different type: this results in opensearch refusing the ingestion.
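to make the clash concrete, here is a minimal sketch in plain python (no opensearch involved). it is a toy model, not how opensearch works internally: the first value seen for a field fixes its type, and any later value of a different type is reported. real opensearch is more lenient in some cases (it can coerce certain values), so treat this as an illustration only. `infer_type` and `find_collisions` are hypothetical helper names, not part of gulp.

```python
def infer_type(value):
    """Very rough stand-in for dynamic type detection (assumption: toy model)."""
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    return "text"


def find_collisions(docs):
    """Return (doc_index, field, known_type, new_type) for every type clash."""
    mapping = {}      # field name -> type fixed by the first document seen
    collisions = []
    for i, doc in enumerate(docs):
        for field, value in doc.items():
            new_type = infer_type(value)
            known_type = mapping.setdefault(field, new_type)
            if known_type != new_type:
                collisions.append((i, field, known_type, new_type))
    return collisions


# two hypothetical plugins emitting the same field name with different types
docs = [
    {"event.code": 4624},     # first plugin: numeric
    {"event.code": "logon"},  # second plugin: string
]
print(find_collisions(docs))
```

in this toy model the second document is flagged because `event.code` was first seen as a number and then arrives as a string.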

current approach

we already handle this by enforcing ECS in each ingestion plugin, and by prefixing with "gulp.unmapped" the non-critical fields (or, simply, those for which there is no proper ECS name): this solves the issue and avoids mapping collisions (at least in ingestion plugins).

but ....

in issue #1 we would like to remove sigma plugins (unless we decide to keep them as a "paid pluggable feature", which is unlikely imho): currently, we are forced to have specific code that converts sigma rules to use ECS fields so they can be applied correctly.

ingesting events as-is instead (keeping only a few fixed fields in GulpDocument, which we map to ECS anyway), we wouldn't need to convert sigmas.... but we would run into the above-mentioned issue of name/type clashing :(

here is where we share ideas about the issue!

valerino commented 1 month ago

use multiple indexes (one plugin -> n indexes):

from the opensearch perspective, passing multiple indexes to the API has 0 cost (just pass a csv index1,index2,index3,... instead of index).
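as a rough sketch of what "just pass a csv" means on the client side: the index names below are made up for illustration, and `index_target` / `search_path` are hypothetical helpers, not gulp functions. the only opensearch behaviour assumed is that the search endpoint accepts a comma-separated list of index names in the path.

```python
def index_target(indexes):
    """Join index names into the comma-separated form used in the request path."""
    cleaned = [name.strip() for name in indexes if name and name.strip()]
    if not cleaned:
        raise ValueError("at least one index name is required")
    # dict.fromkeys drops duplicates while preserving the original order
    return ",".join(dict.fromkeys(cleaned))


def search_path(indexes):
    """Build the path of a search request spanning all the given indexes."""
    return f"/{index_target(indexes)}/_search"


# hypothetical per-plugin indexes
print(search_path(["gulp-win_evtx", "gulp-apache", "gulp-win_evtx"]))
```

the single-index case goes through the same code path, so callers would not need to special-case it.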

for gulp, this would affect:

i expect this would also affect the UI in some way, @pinkrab @Mireg-V.

... but i think this would be the cleanest solution.

valerino commented 1 month ago

another approach would be, in query plugins, to manually map only the interesting fields to ECS and keep the others as gulp.unmapped.xxx strings, as we do during ingestion.

this would be the easiest solution (no global/api intervention needed), but it would require knowing the data format of the external sources in order to define the mappings to apply.
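a minimal sketch of this idea, not the actual gulp implementation: the mapping table and the source field names are invented for illustration, and `to_gulp_fields` is a hypothetical helper. mapped fields keep their value under the ECS name, everything else is stored under `gulp.unmapped.` and forced to a string so its type can never clash.

```python
# hypothetical per-source mapping: external field name -> ECS field name
ECS_MAPPING = {
    "src_ip": "source.ip",
    "user": "user.name",
}


def to_gulp_fields(record, mapping=ECS_MAPPING):
    """Rename known fields to ECS, keep the rest as gulp.unmapped.* strings."""
    out = {}
    for key, value in record.items():
        if key in mapping:
            out[mapping[key]] = value
        else:
            # stringify so that every unmapped field always has the same type
            out[f"gulp.unmapped.{key}"] = str(value)
    return out


print(to_gulp_fields({"src_ip": "10.0.0.1", "user": "alice", "pid": 42}))
```

the cost is the one noted above: someone has to write `ECS_MAPPING` for each external source, which means knowing its data format in advance.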

valerino commented 1 month ago

implemented in https://github.com/mentat-is/gulp/commit/147bb1044a29b105a6939e1bf937d31aa2b6ad87