opensearch-project / opensearch-catalog

The OpenSearch Catalog is designed to make it easier for developers and the community to contribute, search, and install artifacts such as plugins, visualization dashboards, and ingestion-to-visualization content packs (data pipeline configurations, normalization, ingestion, dashboards).

[BUG] Protocol column in VPC flow log parquet file is INT32, but Spark tried to read it as BIGINT #167

Open · YANG-DB opened this issue 3 days ago

YANG-DB commented 3 days ago

What is the bug? The protocol column in the VPC flow log parquet file is INT32, but Spark tried to read it as BIGINT, which caused the streaming job to fail.

org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file s3://kmf-zero-etl-demo/AWSLogs/aws-account-id=****/aws-service=vpcflowlogs/aws-region=us-east-2/year=2024/month=05/day=25/hour=05/****_vpcflowlogs_us-east-2_fl-*****.log.parquet. Column: [protocol], Expected: bigint, Found: INT32
    at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedSchemaColumnConvertError(QueryExecutionErrors.scala:724)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:397)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:227)
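A minimal workaround sketch until the table definition is fixed, assuming downstream consumers still expect BIGINT: let Spark read the parquet with its native schema (INT32 arrives as IntegerType) and widen the column explicitly, instead of letting a BIGINT table definition force the read type. The path, app name, and object name here are illustrative placeholders, not the integration's actual code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ProtocolCastWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("vpc-flow-log-protocol-cast")
      .getOrCreate()

    // Illustrative path; the real S3 prefix is redacted in the trace above.
    val path = "s3://my-bucket/AWSLogs/"

    // Read with the file's own schema, then widen protocol from INT to BIGINT
    // so code written against the current (BIGINT) definition keeps working.
    val flowLogs = spark.read.parquet(path)
      .withColumn("protocol", col("protocol").cast("bigint"))

    flowLogs.printSchema()
    spark.stop()
  }
}
```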

What is the expected behavior? The VPC SQL table definition should match the original VPC flow log specification:

The protocol column is INT32 in the VPC flow logs documentation, but the Athena CREATE TABLE example uses BIGINT, which I believe our integration is based on. A sketch of what a corrected definition might look like is shown below.
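For illustration only, a hypothetical corrected Spark SQL definition with protocol declared as INT, so the expected type matches the INT32 physical type in the parquet files. The table name, location, and the trimmed column list are placeholders, not the integration's actual statement (a real table over the AWSLogs layout would also need partition columns).

```scala
// Hypothetical corrected DDL, run through the same Spark session as above.
// Only a few representative VPC flow log fields are shown; protocol is the
// one that matters here: INT instead of BIGINT, matching the parquet INT32.
spark.sql("""
  CREATE TABLE IF NOT EXISTS vpc_flow_logs (
    version  INT,
    srcaddr  STRING,
    dstaddr  STRING,
    srcport  INT,
    dstport  INT,
    protocol INT,
    packets  BIGINT,
    bytes    BIGINT,
    action   STRING
  )
  USING parquet
  LOCATION 's3://my-bucket/AWSLogs/'
""")
```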
