usdot-fhwa-stol / carma-analytics-fotda

CARMA Analytics is a secure, scalable, and cloud-agnostic data architecture that enables research analysis on big data sets by defining repeatable, scalable processes for ingesting data and creating a data repository to fuse, process, and perform quality assurance on it.
Apache License 2.0

Add kafka log parse utility classes/functions and write time sync kafka log parser #20

Closed: paulbourelly999 closed this pull request 6 months ago

paulbourelly999 commented 7 months ago

PR Details

Description

Added Python scripts to parse Kafka log data for SPAT and Time Sync messages into CSV format.
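The actual parsing scripts are not shown in this thread; the sketch below illustrates the general shape of such a parser under stated assumptions. The log line format and the message fields (`seq`, `timestep`) are hypothetical placeholders, not the actual CARMA Streets schema.

```python
import csv
import json
import re

# Match a JSON object embedded in a log line. Assumes at most one
# payload per line, as in a Kafka console-consumer style log.
JSON_PAYLOAD = re.compile(r"\{.*\}")


def parse_time_sync_log(log_path: str, csv_path: str) -> int:
    """Extract Time Sync payloads from a Kafka log into a CSV file.

    The field names "seq" and "timestep" are illustrative assumptions.
    Returns the number of data rows written.
    """
    rows = []
    with open(log_path) as log_file:
        for line in log_file:
            match = JSON_PAYLOAD.search(line)
            if not match:
                continue
            try:
                message = json.loads(match.group(0))
            except json.JSONDecodeError:
                continue  # skip lines whose payload is not valid JSON
            if "seq" in message and "timestep" in message:
                rows.append((message["seq"], message["timestep"]))
    with open(csv_path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["seq", "timestep"])
        writer.writerows(rows)
    return len(rows)
```

Filtering and CSV writing are kept in one function here for brevity; the review discussion below is largely about how such pieces should be factored.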

Related GitHub Issue

Related Jira Key

CDAR-535

Motivation and Context

Data Analysis

How Has This Been Tested?

Local Testing

Types of changes

Checklist:

paulbourelly999 commented 6 months ago
  • Splitting the functions and classes into separate files is unnecessary. They are all required for data parsing and are used only for this purpose (i.e., they are not library files).
  • Following the above point, the code can be combined into a single script, possibly called extract_kafka_msgs_from_logs.sh. It would look something like this:

Separating these improves the modularity and reusability of the parsing logic. My vision here is to have a Python module that is responsible for parsing all CARMA Streets Kafka data regardless of use case, and then possibly a dependent Python module that can generate many of the common data analysis plots. Obscuring our implementation for data analysis presents very little value to me here, especially since this is not some script to be copied into 16 different repos but more of a Python package that will evolve with every data analysis.
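The split being proposed could be sketched as two layers: a use-case-agnostic parsing module and an analysis module that depends on it. All names below are illustrative, not the actual carma-analytics-fotda layout, and the helpers are minimal stand-ins for the real parsing logic.

```python
from typing import Iterable, List

# --- kafka_log_parsing.py (reusable, use-case agnostic) ---
def extract_payloads(lines: Iterable[str], marker: str) -> List[str]:
    """Return the text after `marker` on each log line that contains it.

    A stand-in for the shared parsing layer: it knows about log
    structure, not about any particular analysis.
    """
    return [line.split(marker, 1)[1].strip() for line in lines if marker in line]


# --- time_sync_analysis.py (depends on the parsing module) ---
def message_intervals(timestamps_ms: List[int]) -> List[int]:
    """Compute inter-message intervals, a typical input for analysis plots."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
```

Keeping the parsing layer free of plotting dependencies is what lets it be reused across analyses without modification.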

adamlm commented 6 months ago

Separating these improves the modularity and reusability of the parsing logic.

Combining the functions and classes into a single Python module does not reduce modularity. They are all related to extracting Kafka messages from logs, so it makes more sense to keep them together in one module.

My vision here is to have a python module that is responsible for parsing all carma streets kafka data regardless of use case. Then possibly have a dependent python module that is able to generate a lot of the common use case data analysis plots.

What you are describing here is similar to the analysis scripts for the Eclipse MOSAIC Real-Time Factor (see #19).

Obscuring our implementation for data analysis presents very little value to me here, especially since this is not some script to be copied into 16 different repos but more of a Python package that will evolve with every data analysis.

Why would we have to copy this to several repos?

paulbourelly999 commented 6 months ago

@adamlm Please resolve any concerns that you feel have been addressed. I am unsure which of these comments are still relevant.