gdrapp closed this issue 1 year ago
Hey Greg, thanks for opening an issue. I am looking into this, will try and recreate the issue locally and see where the schema is mismatching.
This error is caused by a bug in the way Matano handles the float (float32) data type. In VRL a float is a double (float64), so a value whose type is defined as a float (float32) is automatically upcast during the transformation step, which causes a schema mismatch.
I have a fix locally, which I have verified fixes your example above, and I will push it out tomorrow. In the meantime, you should be able to unblock yourself by typing this field as a double instead of a float. Thanks
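To illustrate the mismatch described above, here is a minimal Python sketch. It is not Matano's or Avro's actual code: `to_float32` and `find_union_variant` are hypothetical helpers, and the union is modeled as a plain list of Avro-style type names ("float" for 32-bit, "double" for 64-bit). It only shows why a 64-bit value does not fit a union that offers just a 32-bit float, and why typing the field as a double sidesteps the problem.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def find_union_variant(union: list, value_type: str) -> int:
    """Hypothetical strict resolver: return the index of the exactly matching variant."""
    for index, variant in enumerate(union):
        if variant == value_type:
            return index
    raise ValueError(f"no variant for {value_type!r} in union {union!r}")


# A float64 generally changes when squeezed into a float32, so the two
# types are not interchangeable at the schema level.
print(to_float32(0.1) == 0.1)  # False

# The transform emits a float64 ("double"), so a field typed as double resolves.
print(find_union_variant(["null", "double"], "double"))  # 1

# The same value against a field typed as float (float32) finds no variant.
try:
    find_union_variant(["null", "float"], "double")
except ValueError as error:
    print(f"schema mismatch: {error}")
```

Under these assumptions, changing the field type from float to double makes the emitted value and the schema agree, which matches the workaround suggested above.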
Appreciate you taking a look. I’m on vacation this week but will pull down your fix when I return and give it a whirl. Thanks!
https://github.com/apache/avro/commit/6eb72341912ac3858f56531e457de77213eafd02
My fix has been merged, and a fix has been merged in the official Avro upstream as well. Deleting and redeploying the DPMainStack using the latest release should fix any problems with float/double. Let me know if this resolves your issue, thanks!
Just attempted to test using today's (December 12th) nightly build but the DPMainStack won't deploy. Looks like this is some sort of bug because I was able to deploy on an older version. Tried deleting both the stacks and starting fresh but it didn't help, still getting same error. Not sure how to further debug this?
[100%] fail: The XML you provided was not well-formed or did not validate against our published schema
❌ DPMainStack (MatanoDPMainStack) failed: Error: Failed to publish one or more assets. See the error messages above for more information.
at publishAssets (/snapshot/node_modules/aws-cdk/lib/util/asset-publishing.ts:60:11)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at CloudFormationDeployments.publishStackAssets (/snapshot/node_modules/aws-cdk/lib/api/cloudformation-deployments.ts:572:7)
at CloudFormationDeployments.deployStack (/snapshot/node_modules/aws-cdk/lib/api/cloudformation-deployments.ts:419:7)
at deployStack2 (/snapshot/node_modules/aws-cdk/lib/cdk-toolkit.ts:265:24)
at /snapshot/node_modules/aws-cdk/lib/deploy.ts:39:11
at run (/snapshot/node_modules/p-queue/dist/index.js:163:29)
❌ Deployment failed: Error: Stack Deployments Failed: Error: Failed to publish one or more assets. See the error messages above for more information.
at deployStacks (/snapshot/node_modules/aws-cdk/lib/deploy.ts:61:11)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at CdkToolkit.deploy (/snapshot/node_modules/aws-cdk/lib/cdk-toolkit.ts:339:7)
at initCommandLine (/snapshot/node_modules/aws-cdk/lib/cli.ts:374:12)
Stack Deployments Failed: Error: Failed to publish one or more assets. See the error messages above for more information.
Hm, thanks for the report. This seems to be caused by an obscure compatibility issue between a specific Node 16 version and AWS CDK: https://github.com/aws/aws-cdk/issues/19287. We bundle Node as part of the matano CLI using vercel/pkg. I'm going to look into recreating this issue and pinning Node to a specific minor version as a potential fix. Will update here.
I just found a post on SO that links to the same aws-cdk issue. They agreed rolling back the node version seems to fix it.
I also looked back at my debug deploy logs and I do see a multipart upload that has two null ETags, so definitely seems similar. I'll keep an eye out for a new Matano nightly. Thanks!
Can you run matano and copy the version it's outputting? It should look like:
VERSION
matano/0.0.0 linux-x64 node-v16.16.0
Also, are you using macOS or Linux? Thx.
matano/0.0.0 darwin-x64 node-v16.16.0
macOS 12.6.1
I wasn't able to reproduce the error. I tried it on both Linux and macOS and was able to deploy successfully.
I've gone ahead and updated the embedded Node version to 18.5.0. Could you try again and see if it works?
Not seeing any nightlies available for download right now. Can you check the build?
I've retried the build; the nightlies are generated now.
Same issue using matano/0.0.0 darwin-x64 node-v18.5.0. Can we try downgrading node to an earlier 16.x version?
Sure, let me try to publish with 16.3.0. The linked issue mentions that changing to that version fixed their issue.
I wasn't able to publish a binary with v16.3.0 for now, but I've published a release that uses v14.18.1, which the linked post mentioned was confirmed to work. Can you try again and let me know?
The stack deploy debug output no longer showed null ETags for the multipart uploads, which is good, but it still failed to deploy the first two times I tried, with this error:
⠹ Deploying Matano...[91%] fail: One or more of the specified parts could not be found. The part may not have been uploaded, or the specified entity tag may not match the part's entity tag.
The third time I ran it, the deployment was successful.
It seems like it's having trouble with the multipart uploads, but if you run it a few times CDK eventually gets everything uploaded and it's happy. In September/October I was experimenting with Matano and didn't have these issues, so it's strange that this just started happening with later builds (I didn't have time to play with it in November).
Matano version - matano/0.0.0 darwin-x64 node-v14.18.1
Someone in the AWS CDK issue linked earlier mentioned they were running 16.7 and didn't see any issues, so it might be worth trying that version if 16.3 is giving you problems.
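As a stopgap for the intermittent multipart upload failures described above, the "run it a few times" approach can be sketched as a small retry helper. This is only an illustration in Python: `retry` is a hypothetical helper, not part of Matano or CDK, and the commented usage assumes the deploy is started with `matano deploy`.

```python
import time


def retry(action, attempts=3, delay_seconds=5.0):
    """Call action() until it succeeds or attempts run out, then re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)  # brief pause before the next try
    raise last_error


# Possible usage (assumption: the deploy command is `matano deploy`):
#   import subprocess
#   retry(lambda: subprocess.run(["matano", "deploy"], check=True))
```

Retrying only hides the symptom; it does not explain why the uploads fail in the first place.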
Good to see it works. Sure, I'll check 16.7.
Btw, are you on a stable internet connection? I would often get the new error you posted when I was on a bad internet connection, which makes sense, as it's preventing a corrupt upload, especially for the larger assets.
Yeah, my internet is stable. I’ll continue to troubleshoot the CDK issue on my own and open another issue if necessary. I’ll close this issue because I was able to get the Okta float data through the Matano data pipeline, so I think we’re good.
I’d be happy to contribute my Okta work to the Matano project if you’re interested in making it a managed source, just let me know.
Thanks for your help and responsiveness solving this!
Definitely would appreciate the contribution of Okta as a managed log source. Feel free to create a PR, and thank you!
I'm creating a log source for Okta logs and am struggling to transform log data to the ECS fields client.geo.location.lat and client.geo.location.lon. With the VRL below, I consistently get the error "USER_ERROR: Failed at FindUnionVariant, likely schema issue." in the transformer Lambda. I have pretty much every other Okta log field working.
Looking at the ECS schema JSON, both lat and lon are defined as floats, so this should work.
Relevant VRL transform:
.client.geo.location.lat = to_float(del(.json.client.geographicalContext.geolocation.lat)) ?? null
.client.geo.location.lon = to_float(del(.json.client.geographicalContext.geolocation.lon)) ?? null
Relevant log data:
Any assistance identifying the issue or bug would be appreciated.
Thanks.