stanford-crfm / ecosystem-graphs


GitHub action for collating all assets into a single CSV file #153

Closed: buhrmann closed this 6 months ago

buhrmann commented 7 months ago

Does some preprocessing so that the values in each column are type-consistent (no mixing of strings, floats or objects). It also homogenises how missing data is represented.
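A minimal sketch of this kind of column normalisation, assuming the assets are loaded as a list of Python dicts (one per asset). The names `normalise_columns`, `is_missing` and `MISSING_TOKENS` are hypothetical, and the set of strings treated as missing is an assumption; the action itself may do this differently (e.g. with pandas).

```python
import json
import math
from typing import Dict, List

Record = Dict[str, object]

# Assumption: spellings treated as "missing". Deliberately excludes values
# like "unknown" or "none", which may carry meaning in the asset data.
MISSING_TOKENS = {"", "nan", "null", "n/a"}


def is_missing(value: object) -> bool:
    """True for None, float NaN, or a string in MISSING_TOKENS."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip().lower() in MISSING_TOKENS


def _to_text(value: object) -> str:
    # Nested objects become JSON text; scalars use their plain string form.
    if isinstance(value, (list, dict)):
        return json.dumps(value)
    return str(value)


def normalise_columns(records: List[Record]) -> List[Record]:
    """Return copies of `records` in which every column holds one type and
    every missing value (including absent keys) is represented as None."""
    # Union of all keys, in first-seen order.
    columns = list(dict.fromkeys(k for r in records for k in r))
    out = [dict.fromkeys(columns) for _ in records]  # all values start as None
    for col in columns:
        present = [r[col] for r in records if col in r and not is_missing(r[col])]
        types = {type(v) for v in present}
        # Mixed types, or nested objects, force the whole column to strings.
        as_text = len(types) > 1 or bool(types & {list, dict})
        for new, old in zip(out, records):
            value = old.get(col)
            if is_missing(value):
                new[col] = None
            else:
                new[col] = _to_text(value) if as_text else value
    return out
```

Since `csv.DictWriter` writes `None` as an empty cell, every kind of missing value ends up looking the same in the collated CSV.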

It can easily be extended in the future to write other output formats (JSON, Parquet...) and to add other kinds of preprocessing (e.g. a proper numeric representation of model sizes).
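One way to keep the output side extensible, sketched under assumptions: a hypothetical `WRITERS` registry maps a format name to a function that writes records to a text stream, so supporting a new format means registering one more function. Parquet is left out because it would need a third-party library such as pyarrow. `parse_model_size` is likewise a hypothetical example of an extra preprocessing step, and it only recognises simple strings such as `175B` or `1.3B`.

```python
import csv
import json
import re
from typing import Callable, Dict, List, Optional, TextIO

Record = Dict[str, object]
Writer = Callable[[List[Record], TextIO], None]

# Hypothetical registry: output format name -> writer function.
WRITERS: Dict[str, Writer] = {}


def register_writer(fmt: str) -> Callable[[Writer], Writer]:
    def decorator(fn: Writer) -> Writer:
        WRITERS[fmt] = fn
        return fn
    return decorator


@register_writer("csv")
def write_csv(records: List[Record], out: TextIO) -> None:
    # Header is the union of all keys; absent values become empty cells.
    fieldnames = list(dict.fromkeys(k for r in records for k in r))
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)


@register_writer("json")
def write_json(records: List[Record], out: TextIO) -> None:
    json.dump(records, out, indent=2)


# Hypothetical preprocessing step: "175B" -> 175_000_000_000.
_SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([KMBT]?)\s*", re.IGNORECASE)
_MULTIPLIERS = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def parse_model_size(value: object) -> Optional[int]:
    """Return a parameter count, or None if the whole value isn't recognised."""
    match = _SIZE_RE.fullmatch(str(value))
    if match is None:
        return None
    number, suffix = match.groups()
    return round(float(number) * _MULTIPLIERS[suffix.upper()])
```

For example, `WRITERS["json"](records, f)` writes the same records as JSON without touching the CSV path.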

The collated CSV file will be committed automatically to resources/all_assets.csv.

Feel free to squash the commits when merging, as I needed some intermediate commits to fix my initial attempts at the GitHub action.