zzeppozz closed this issue 5 months ago
Start by comparing species and occurrence counts between all datasets. For each species in a dataset,
Create a dataframe/table with species rows, dataset columns, and occ_count values. Build it with pandas and save it in S3 (too big for Redshift).
Use a combination of pandas for stacked data and scipy.sparse for sparse aggregate data; download/upload to S3, and read and write locally.
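A minimal sketch of how stacked records might be turned into a sparse species-by-dataset matrix. The column names (`species`, `dataset`, `occ_count`) and the sample values are assumptions for illustration, not the project's actual schema:

```python
import pandas as pd
from scipy import sparse

# Hypothetical stacked records: one row per (species, dataset) pair.
stacked = pd.DataFrame({
    "species": ["Poa annua", "Poa annua", "Quercus alba", "Acer rubrum"],
    "dataset": ["ds_a", "ds_b", "ds_a", "ds_c"],
    "occ_count": [12, 3, 7, 1],
})

# Categorical codes give each species and dataset a stable integer index
# (categories are sorted, so the index order is reproducible).
species_cat = stacked["species"].astype("category")
dataset_cat = stacked["dataset"].astype("category")
species_labels = list(species_cat.cat.categories)
dataset_labels = list(dataset_cat.cat.categories)

# Build in COO form from (value, (row, col)) triples, then convert to CSR
# for efficient row slicing and arithmetic.
matrix = sparse.coo_matrix(
    (
        stacked["occ_count"].to_numpy(),
        (species_cat.cat.codes.to_numpy(), dataset_cat.cat.codes.to_numpy()),
    ),
    shape=(len(species_labels), len(dataset_labels)),
).tocsr()
```

Only the non-zero cells are stored, which is what keeps a species-by-dataset table manageable when most species occur in few datasets. The matrix could then be written locally with `scipy.sparse.save_npz` before uploading to S3; the row and column labels would need to be saved alongside it, since the matrix itself carries only integer indices.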
Added queries and tests to compare the stacked data to the sparse matrix.
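One way such a comparison could look, as a sketch only: check that per-species and per-dataset occurrence totals agree between the two representations. The function name `compare_stacked_to_sparse`, the column names, and the sample data are hypothetical and are not taken from the actual tests:

```python
import numpy as np
import pandas as pd
from scipy import sparse


def compare_stacked_to_sparse(stacked, matrix, species_labels, dataset_labels):
    """Return a list of mismatch descriptions; an empty list means agreement."""
    problems = []
    csr = sparse.csr_matrix(matrix)

    # Totals from the sparse matrix, flattened to 1-D arrays.
    row_totals = np.asarray(csr.sum(axis=1)).ravel()
    col_totals = np.asarray(csr.sum(axis=0)).ravel()

    # Matching totals from the stacked data, aligned to the matrix labels.
    expected_rows = (
        stacked.groupby("species")["occ_count"].sum()
        .reindex(species_labels, fill_value=0).to_numpy()
    )
    expected_cols = (
        stacked.groupby("dataset")["occ_count"].sum()
        .reindex(dataset_labels, fill_value=0).to_numpy()
    )

    if not np.array_equal(row_totals, expected_rows):
        problems.append("per-species occurrence totals differ")
    if not np.array_equal(col_totals, expected_cols):
        problems.append("per-dataset occurrence totals differ")
    return problems


# Hypothetical sample data in both representations.
stacked = pd.DataFrame({
    "species": ["Poa annua", "Poa annua", "Quercus alba", "Acer rubrum"],
    "dataset": ["ds_a", "ds_b", "ds_a", "ds_c"],
    "occ_count": [12, 3, 7, 1],
})
species_labels = ["Acer rubrum", "Poa annua", "Quercus alba"]
dataset_labels = ["ds_a", "ds_b", "ds_c"]
matrix = sparse.csr_matrix(np.array([[0, 0, 1], [12, 3, 0], [7, 0, 0]]))

result = compare_stacked_to_sparse(stacked, matrix, species_labels, dataset_labels)
```

Comparing marginal totals is cheaper than a cell-by-cell check and catches most aggregation errors, though two matrices with identical row and column totals can still differ in individual cells.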