dhuang opened this issue 4 years ago
Hi @dhuang,
That's a really good idea to explore, and I have to admit we always took it for granted that all loaders should use the same Analytics SDK transformation. Still, I see several problems with this approach (although I haven't looked into it deeply yet):
At the same time, I also see two big benefits:
Snowflake has some capabilities for transforming data during a load (a COPY INTO whose source is a SELECT over the staged files). From my very basic understanding of what the transformer does, it seems like much of its logic could be replaced with such a transformation on load, which means we could also load data directly from the enriched files in S3.
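To make concrete what a transform-on-load would have to reproduce, here is a minimal Python sketch of one piece of the Analytics SDK transformation: turning the self-describing JSON in the enriched event's unstruct_event and contexts fields into columns named after their Iglu schema (vendor, snake_cased name, model). The naming rule below is my understanding of the SDK's convention, not its actual code, and the function names and example payloads are hypothetical.

```python
import json
import re


def column_name(prefix: str, schema_uri: str) -> str:
    """Build e.g. 'unstruct_event_com_acme_link_click_1' from an Iglu URI
    such as 'iglu:com.acme/linkClick/jsonschema/1-0-0'.
    Assumed to mirror the Analytics SDK naming convention (hypothetical helper)."""
    match = re.fullmatch(r"iglu:([^/]+)/([^/]+)/[^/]+/(\d+)-\d+-\d+", schema_uri)
    if match is None:
        raise ValueError(f"not an Iglu URI: {schema_uri}")
    vendor, name, model = match.groups()
    vendor = vendor.replace(".", "_").lower()
    # camelCase -> snake_case, e.g. linkClick -> link_click
    name = re.sub(r"([^A-Z_])([A-Z])", r"\1_\2", name).lower()
    # Only the MODEL part of MODEL-REVISION-ADDITION goes into the column name
    return f"{prefix}_{vendor}_{name}_{model}"


def shred_unstruct_event(field: str) -> dict:
    """unstruct_event is a self-describing wrapper around another
    self-describing JSON; return {column: inner data}, or {} if empty."""
    if not field:
        return {}
    inner = json.loads(field)["data"]
    return {column_name("unstruct_event", inner["schema"]): inner["data"]}


def shred_contexts(field: str) -> dict:
    """contexts wraps a list of entities; group them by column name,
    keeping a list because several entities can share one schema."""
    if not field:
        return {}
    columns: dict = {}
    for entity in json.loads(field)["data"]:
        key = column_name("contexts", entity["schema"])
        columns.setdefault(key, []).append(entity["data"])
    return columns


if __name__ == "__main__":
    # Illustrative payload, not taken from a real enriched file
    example = json.dumps({
        "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
        "data": {
            "schema": "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1",
            "data": {"targetUrl": "https://example.com"},
        },
    })
    print(shred_unstruct_event(example))
    # {'unstruct_event_com_snowplowanalytics_snowplow_link_click_1': {'targetUrl': 'https://example.com'}}
```

The column name depends on the schema inside each row, so doing this purely in a Snowflake load-time SELECT would mean either enumerating the expected event types up front or keeping the raw JSON in a single column.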
Example test file in the enriched file format:
I was able to load this into Snowflake with each type of unstructured event in the right format.
Open questions
If this is indeed possible, could there be just a single loader step, with no transformer Spark job at all?