The goal is to read the extensive dataset into a Pandas dataframe that we can then pickle and stage for the transformations and joins required in Sprint 2. There is also scope to explore switching to Polars and lazy loading. A sketch of both approaches follows the checklist below.
[x] Downloaded all data as PSV files
[x] Summary statistics recorded for each dataframe
[x] Selected the PSVs that likely hold the housing data (in this case DEFAULT GEOCODES)
[x] Read them into a dataframe (~16,000,000 rows)
[x] Added a state tag to cross-check the validity of the SA1 data mapping later on
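A minimal sketch of the Pandas path, assuming the DEFAULT GEOCODE PSVs are split by state and that the state can be inferred from the file name; the directory layout, glob pattern, and naming convention here are illustrative, not the actual repo structure.

```python
from pathlib import Path

import pandas as pd

# Assumed locations for illustration only.
PSV_DIR = Path("data/psv")
OUT_PICKLE = Path("data/staged/default_geocode.pkl")

frames = []
for psv_path in sorted(PSV_DIR.glob("*DEFAULT_GEOCODE*.psv")):
    # PSV = pipe-separated values, so sep="|".
    df = pd.read_csv(psv_path, sep="|", low_memory=False)
    # Tag each row with the state inferred from the file name (assumed
    # naming convention) so the SA1 mapping can be cross-checked later.
    df["STATE"] = psv_path.stem.split("_")[0]
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)  # ~16,000,000 rows in total
print(combined.describe(include="all"))          # per-column summary statistics

OUT_PICKLE.parent.mkdir(parents=True, exist_ok=True)
combined.to_pickle(OUT_PICKLE)                   # staged for the Sprint 2 joins
```

Pickling keeps the dtypes intact between sprints, at the cost of a file that is only readable from Python.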
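If memory becomes a constraint, the Polars lazy-loading option mentioned above could look like the sketch below. It assumes the same file layout; the filter column is a placeholder to show predicate pushdown, not a known field in the data.

```python
import polars as pl

# scan_csv builds a LazyFrame: nothing is read until .collect(), so filters
# and column selections can be pushed down instead of materialising all
# ~16M rows up front.
lazy = (
    pl.scan_csv("data/psv/*DEFAULT_GEOCODE*.psv", separator="|")
      .filter(pl.col("GEOCODE_TYPE_CODE").is_not_null())  # placeholder predicate
)

df = lazy.collect()  # materialise only when needed
df.write_parquet("data/staged/default_geocode.parquet")  # Parquet instead of pickle
```

Writing Parquet rather than pickle would also keep the staged data readable from either Pandas or Polars if the switch goes ahead.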