specifysystems / sp_network

Server-side code for syftorium
GNU General Public License v3.0

Create table(s) allowing query for uniqueness attributes for each dataset #85

Closed: zzeppozz closed this issue 5 months ago

zzeppozz commented 7 months ago

Start by comparing species and occurrence counts between all datasets: for each species in each dataset, record its occurrence count.

Create a dataframe/table with species as rows, datasets as columns, and occ_count as the values. Build it with pandas and save it in S3 (it is too big for Redshift).
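A minimal sketch of the species-by-dataset table, assuming the stacked input has one row per (species, dataset) pair with hypothetical column names `species`, `dataset_key` and `occ_count`. The sample values are made up. `pivot_table` fills species that are missing from a dataset with 0:

```python
import pandas as pd

# Hypothetical stacked (long-format) input: one row per (species, dataset) pair.
# Column names and values are assumptions for illustration only.
stacked = pd.DataFrame(
    {
        "species": ["Puma concolor", "Puma concolor", "Lynx rufus", "Ursus arctos"],
        "dataset_key": ["ds_a", "ds_b", "ds_a", "ds_c"],
        "occ_count": [12, 3, 7, 5],
    }
)

# Pivot to a wide table: species rows, dataset columns, occurrence counts as values.
# Species absent from a dataset get 0 instead of NaN.
wide = stacked.pivot_table(
    index="species",
    columns="dataset_key",
    values="occ_count",
    aggfunc="sum",
    fill_value=0,
)
print(wide)
```

Uploading the table to S3 is omitted here so the sketch runs locally. With many species and datasets most cells are 0, which is one reason a dense table like this grows large.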

zzeppozz commented 7 months ago

Use a combination of pandas for the stacked data and scipy.sparse for the sparse aggregate data. Download from and upload to S3, and read and write locally.
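For the sparse aggregate, one possible approach (a sketch, not necessarily what the project does) encodes species and datasets as integer codes with pandas categoricals, builds a `scipy.sparse` CSR matrix, and round-trips it through a local `.npz` file with `save_npz`/`load_npz`. The data and column names are hypothetical, and the S3 download/upload step is omitted:

```python
import os
import tempfile

import pandas as pd
from scipy import sparse

# Hypothetical stacked input; column names and values are assumptions.
stacked = pd.DataFrame(
    {
        "species": ["Puma concolor", "Puma concolor", "Lynx rufus", "Ursus arctos"],
        "dataset_key": ["ds_a", "ds_b", "ds_a", "ds_c"],
        "occ_count": [12, 3, 7, 5],
    }
)

# Encode species and datasets as integer codes; the categories keep the labels.
species_cat = stacked["species"].astype("category")
dataset_cat = stacked["dataset_key"].astype("category")
row_labels = list(species_cat.cat.categories)
col_labels = list(dataset_cat.cat.categories)

# Build a sparse species x dataset matrix; only non-zero counts are stored.
# Duplicate (row, col) pairs would be summed by tocsr().
matrix = sparse.coo_matrix(
    (
        stacked["occ_count"].to_numpy(),
        (species_cat.cat.codes.to_numpy(), dataset_cat.cat.codes.to_numpy()),
    ),
    shape=(len(row_labels), len(col_labels)),
).tocsr()

# Write locally and read back; the .npz file is what would be transferred to S3.
with tempfile.TemporaryDirectory() as out_dir:
    npz_path = os.path.join(out_dir, "species_dataset.npz")
    sparse.save_npz(npz_path, matrix)
    reloaded = sparse.load_npz(npz_path)
```

Because the `.npz` file stores only the matrix, the row and column labels need to be saved separately so the matrix can be mapped back to species and datasets.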

zzeppozz commented 5 months ago

Added queries and tests to compare the stacked data to the sparse matrix.
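The comparison tests and a uniqueness query might look roughly like this. It is an illustrative sketch with hypothetical helper names (`to_sparse`, `check_sparse_matches_stacked`, `species_unique_to_one_dataset`) and made-up data. The cell-by-cell check assumes each (species, dataset) pair appears only once in the stacked data:

```python
import pandas as pd
from scipy import sparse

# Hypothetical stacked input; column names and values are assumptions.
stacked = pd.DataFrame(
    {
        "species": ["Puma concolor", "Puma concolor", "Lynx rufus", "Ursus arctos"],
        "dataset_key": ["ds_a", "ds_b", "ds_a", "ds_c"],
        "occ_count": [12, 3, 7, 5],
    }
)


def to_sparse(df):
    """Build a CSR species x dataset matrix plus its row and column labels."""
    species_cat = df["species"].astype("category")
    dataset_cat = df["dataset_key"].astype("category")
    matrix = sparse.coo_matrix(
        (
            df["occ_count"].to_numpy(),
            (species_cat.cat.codes.to_numpy(), dataset_cat.cat.codes.to_numpy()),
        ),
        shape=(len(species_cat.cat.categories), len(dataset_cat.cat.categories)),
    ).tocsr()
    return matrix, list(species_cat.cat.categories), list(dataset_cat.cat.categories)


def check_sparse_matches_stacked(df, matrix, row_labels, col_labels):
    """Assert each stacked count equals its sparse cell and that totals agree."""
    row_idx = {label: i for i, label in enumerate(row_labels)}
    col_idx = {label: j for j, label in enumerate(col_labels)}
    for sp, ds, count in df[["species", "dataset_key", "occ_count"]].itertuples(index=False):
        assert matrix[row_idx[sp], col_idx[ds]] == count
    assert matrix.sum() == df["occ_count"].sum()


def species_unique_to_one_dataset(matrix, row_labels):
    """Query: species with a non-zero count in exactly one dataset."""
    datasets_per_species = (matrix > 0).sum(axis=1).A1
    return [label for label, n in zip(row_labels, datasets_per_species) if n == 1]


matrix, row_labels, col_labels = to_sparse(stacked)
check_sparse_matches_stacked(stacked, matrix, row_labels, col_labels)
print(species_unique_to_one_dataset(matrix, row_labels))
```

Comparing both the individual cells and the grand total catches dropped rows as well as misaligned labels.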