openspending / cubepress

Flat-file generator for aggregated spending reports
MIT License
5 stars 2 forks source link

Documentation points to mention #7

Open stevage opened 8 years ago

stevage commented 8 years ago

Please don't take this as criticism :) But these are answers I was trying to find in the brief README that could be made explicit:

  1. Are the references in aggregates.json to fields in datapackage.json, or to the raw CSV?
  2. Hence, what is the value of "file": the name of the CSV file (minus extension), or the value of the resources/name element of the datapackage.json? (Or is it the resources/path?)
  3. Why does format need to be specified, if it's already in the datapackage.json?
  4. What is table, in the sample SQL query? Is it always the word table, or what does it refer to exactly?
  5. What do the column names (admin, executed) refer to exactly - resources/schema/fields/name?
  6. Do aggregates really not have names? I was expecting a structure like [byAdmin: { ... }, ... ] perhaps?
  7. What dialect of SQL is this? In particular, how should field names with spaces etc be referred to: "like this", [like this] ... ?
danfowler commented 8 years ago

@stevage thanks for the feedback! These are great questions and will help us come up with the next iteration. We're actually rethinking how aggregates.json would work. To your questions:

  1. Currently, the references in aggregates.json point to fields in the raw CSV. This may change as work evolves on the measures and dimensions mapping in the spec.
  2. file refers to the resulting output file for the computed aggregate CSV.
  3. format right now is useless as we're currently only generating CSVs, but that may change in the future if we decide to generate, say, JSON aggregates
  4. table is an implementation detail from when we were roughly experimenting with raw SQL, and will probably go away in favor of more structured queries (e.g. Sequelize-style queries)
  5. admin (example administrative category), executed (example stage in budget cycle), etc. are indeed column names in an example source CSV. They are entirely dependent on the type of data you have and are indeed described by the resources/schema/fields/name
  6. As of now, the name is more or less the file name, but your suggestion makes sense.
  7. This first iteration used Python's SQLAlchemy. Field names with spaces could be referred to using square brackets. We may be moving to Sequelize-style queries in which a field name with spaces can just be a normal JSON string in quotes.

I hope this makes sense :).