We need to develop a create_sample function that can generate sample data for testing, experimentation, or demonstration purposes. This function will generate synthetic data that closely mimics real data for various use cases.
AC:
Create a create_sample function for both the Avro and Parquet util classes that generates synthetic data based on a specified data schema or structure.
Interface:
def create_sample(schema_path: Path, n: int):
    ...
Implement options to specify the number of records or rows to generate in the sample data.
Ensure that the generated sample data adheres to the provided data schema or structure.
Provide flexibility to generate both structured and semi-structured data, including support for nested structures if applicable.
The function should support the existing sample data files in the tests/data directory.
Write unit and integration tests to validate the correctness of the function.
Integrate the function into the interface.
Ensure the function is easy to use with a clear and well-documented API.
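As a starting point for discussion, here is a minimal sketch of the Avro side, assuming the schema is a JSON .avsc file and that returning the records as a list of dicts is acceptable (the ticket does not specify a return type). The helper _generate, the _PRIMITIVES table, and all value ranges are hypothetical choices made for illustration. Named-type references, logical types, and writing the output to Avro or Parquet files are not covered here.

```python
import json
import random
import string
from pathlib import Path
from typing import Any

# Generators for Avro primitive types; the value ranges are arbitrary choices.
_PRIMITIVES = {
    "null": lambda: None,
    "boolean": lambda: random.choice([True, False]),
    "int": lambda: random.randint(-(2**31), 2**31 - 1),
    "long": lambda: random.randint(-(2**63), 2**63 - 1),
    "float": lambda: random.uniform(-1e3, 1e3),
    "double": lambda: random.uniform(-1e6, 1e6),
    "string": lambda: "".join(random.choices(string.ascii_lowercase, k=8)),
    "bytes": lambda: bytes(random.getrandbits(8) for _ in range(8)),
}


def _generate(schema: Any) -> Any:
    """Return one random value matching an Avro schema node (hypothetical helper)."""
    if isinstance(schema, list):  # union: pick one branch at random
        return _generate(random.choice(schema))
    if isinstance(schema, str):  # primitive type named by a bare string
        return _PRIMITIVES[schema]()
    kind = schema["type"]
    if kind == "record":  # nested records recurse field by field
        return {f["name"]: _generate(f["type"]) for f in schema["fields"]}
    if kind == "array":
        return [_generate(schema["items"]) for _ in range(random.randint(0, 3))]
    if kind == "map":
        size = random.randint(0, 3)
        return {f"key{i}": _generate(schema["values"]) for i in range(size)}
    if kind == "enum":
        return random.choice(schema["symbols"])
    if kind == "fixed":
        return bytes(random.getrandbits(8) for _ in range(schema["size"]))
    # e.g. {"type": "string"} or a type object wrapped in another object
    return _generate(kind)


def create_sample(schema_path: Path, n: int) -> list[dict[str, Any]]:
    """Generate n synthetic records that follow the Avro schema at schema_path."""
    schema = json.loads(Path(schema_path).read_text())
    return [_generate(schema) for _ in range(n)]
```

A call such as create_sample(Path("user.avsc"), 10), where user.avsc is a hypothetical schema file, would return ten dicts whose keys match the record's fields. The Parquet variant could follow the same shape, but it would need to read the schema through a Parquet library rather than from JSON.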