Open gregpawin opened 6 months ago
Using a small sample, along with $order and ticket_number, I implemented a function that integrates new ticket data with existing records. It first identifies the most recent ticket number in the current sample, then fetches only the tickets issued after that one, so there is no overlap.
The integration uses a predefined schema to standardize incoming data so that new entries match the structure of the existing dataset. The schema specifies the expected fields and sets default values for any missing data. Once the new data is fetched and normalized against the schema, it is merged into the existing dataset, adding a specified number of new records (for example, the next five tickets).
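For anyone following along, here is a rough sketch of the merge flow described above. It is not the code from the notebook. The field names other than ticket_number, the default values, the `fetch_after` callable, and the assumption that ticket numbers are numeric are all hypothetical and only there to make the example runnable.

```python
from typing import Callable, Iterable

# Hypothetical schema: field name -> default used when the field is missing.
SCHEMA = {
    "ticket_number": None,
    "issue_date": None,
    "violation_code": "UNKNOWN",
    "fine_amount": 0.0,
}


def normalize(record: dict) -> dict:
    """Keep only the schema fields, filling in defaults for missing ones."""
    return {field: record.get(field, default) for field, default in SCHEMA.items()}


def merge_new_tickets(
    existing: list,
    fetch_after: Callable[[int], Iterable[dict]],
    limit: int = 5,
) -> list:
    """Return existing records plus up to `limit` new, non-duplicate tickets."""
    # Assumes ticket numbers can be compared as integers.
    seen = {int(r["ticket_number"]) for r in existing}
    last_ticket = max(seen, default=0)

    merged = list(existing)
    added = 0
    # fetch_after is a stand-in for the API call that returns newer tickets.
    for raw in fetch_after(last_ticket):
        if added >= limit:
            break
        if raw.get("ticket_number") is None:
            continue  # cannot de-duplicate a record without a ticket number
        record = normalize(raw)
        number = int(record["ticket_number"])
        if number in seen:
            continue  # skip overlaps with what we already have
        seen.add(number)
        merged.append(record)
        added += 1
    return merged


# Made-up data standing in for the existing sample and an API response.
existing = [
    {"ticket_number": 100, "issue_date": "2024-01-01",
     "violation_code": "A1", "fine_amount": 50.0},
]


def fake_fetch(last_ticket: int) -> list:
    return [
        {"ticket_number": 100},                        # overlap, skipped
        {"ticket_number": 101, "fine_amount": 63.0},   # missing fields get defaults
        {"ticket_number": 102, "issue_date": "2024-01-03"},
    ]


merged = merge_new_tickets(existing, fake_fetch, limit=5)
```

Passing the fetch step in as a callable keeps the merge logic testable without hitting the API.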
I'll try this function with an existing database this coming week.
Using the function, I increased the sample size. No duplicates so far. I began transferring the initial tickets into DuckDB and testing incremental loading of data into an in-memory DuckDB database. I will do EDA on the data, check for duplicates, and give an update next week.
I updated the merge function so that it adds more distinct parking tickets to an existing dataset. https://github.com/parcheesime/parking-tickets-app/blob/main/test_merge.ipynb @gregpawin please code review
Create a function that uses the city data API to update a local database of parking citations. This will be used to update the citation database on a daily basis instead of downloading the whole database every time.
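As a possible starting point, the sketch below builds the query parameters for fetching only citations newer than the last one stored locally. It assumes the city API accepts SoQL-style parameters ($where, $order, $limit), which the use of $order in this thread suggests but does not confirm, and that ticket_number can be compared as a number. The function name is hypothetical, and no endpoint URL is included because none is given here.

```python
from urllib.parse import parse_qs, urlencode


def build_incremental_query(last_ticket_number: int, limit: int = 5) -> str:
    """Build a query string asking for tickets after last_ticket_number."""
    params = {
        # Assumption: ticket_number is stored as a number in the dataset.
        "$where": f"ticket_number > {int(last_ticket_number)}",
        # Oldest-first so the next `limit` tickets follow the last stored one.
        "$order": "ticket_number ASC",
        "$limit": int(limit),
    }
    return urlencode(params)


# Example with a made-up ticket number; append the result to the dataset URL.
query = build_incremental_query(4500000000, limit=5)
decoded = parse_qs(query)  # round-trip to inspect the parameters
```

Running this once a day with the highest ticket_number already in the local database would pull only the new citations rather than the full dataset.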