signal-noise / datasupply

Data Supply is an opinionated data processing, provision and presentation library of components for Javascript
MIT License
0 stars 0 forks source link

Refactor the existing dsv->json code and add CLI #18

Closed tomgp closed 3 years ago

tomgp commented 3 years ago

Based on feedback

Revisions:

Deferred:

Headlines

To see it working

Because of the change to the JSON output format this PR is a breaking change

The detail

This PR has a number of changes from the current version. Mostly under the hood to do with refactoring but also adding the supply command and adding a means to add metadata

marcelkornblum commented 3 years ago

I was on Node 12 without realising and got


> supply
(node:4709) ExperimentalWarning: The ESM module loader is experimental.
file:///home/marcel/projects/datasupply/src/supply-cmd.js:8
import { Command } from 'commander';
         ^^^^^^^
SyntaxError: The requested module 'commander' is expected to be of type CommonJS, which does not support named exports. CommonJS modules can be imported by importing the default export.
For example:
import pkg from 'commander';
const { Command } = pkg;
    at ModuleJob._instantiate (internal/modules/esm/module_job.js:97:21)
    at async ModuleJob.run (internal/modules/esm/module_job.js:136:20)
    at async Loader.import (internal/modules/esm/loader.js:179:24)```
marcelkornblum commented 3 years ago

Firstly, with now I'm running v14, it seems to work really nicely. Could we please find a way to test for the v12 error I found so we don't get passing tests for that situation?

I've got a few syntactical / naming type thoughts... :)

  1. I'm not a huge fan of supply-config.json - it feels more idiomatic to either use .datasupplyrc (my preference) or datasupply.config.json. Maybe we could allow a couple of different ones
  2. is the targetDirectories config key the relative location of the source data files? If so it feels misnamed from the point of view of the user
  3. IMO excludeDirectories would be better served with a distinct .datasupplyignore file (again idiomatically), perhaps optionally
  4. Although I understand the reasoning, I personally find YAML very easy to make errors in, and I wonder if we ought not to start with all config in JSON rather than using different formats for different types of config? I suggest it's called e.g. polling-data.metadata.json
  5. I think the data-specific config could do with a little more structure - it would be more flexible and future-proof if e.g. the fields were inside a fields object, with the title on a higher level object.
  6. Specifically for that Date format in the data config, the way it's a second leaf on the branch for that field looks odd to my structural eye :)
tomgp commented 3 years ago

Firstly, with now I'm running v14, it seems to work really nicely. Could we please find a way to test for the v12 error I found so we don't get passing tests for that situation?

Think I should be able to fix that (I thought I had )

I've got a few syntactical / naming type thoughts... :)

  1. I'm not a huge fan of supply-config.json - it feels more idiomatic to either use .datasupplyrc (my preference) or datasupply.config.json. Maybe we could allow a couple of different ones

Agree .datasupplyrc as the default is nice

  1. is the targetDirectories config key the relative location of the source data files? If so it feels misnamed from the point of view of the user

relative to the the location of the configuration file or in absence of that relative to the place where the command is invoked (I think, though let me check :) )

  1. IMO excludeDirectories would be better served with a distinct .datasupplyignore file (again idiomatically), perhaps optionally

I'm not sure about this, I like the idea of being able to have 2 different configs and of having as much config as possible in one place. we could also have an ignore file but then we'd need to to think about which took precedence

  1. Although I understand the reasoning, I personally find YAML very easy to make errors in, and I wonder if we ought not to start with all config in JSON rather than using different formats for different types of config? I suggest it's called e.g. polling-data.metadata.json

Agree, I've done it with YAML in the past because it works well when using things like Jekyll, with our typical stack json makes more sense

  1. I think the data-specific config could do with a little more structure - it would be more flexible and future-proof if e.g. the fields were inside a fields object, with the title on a higher level object.

this should be easier to do with JSON, I think there's a discussion to be had here about how prescriptive we want to be

  1. Specifically for that Date format in the data config, the way it's a second leaf on the branch for that field looks odd to my structural eye :)
marcelkornblum commented 3 years ago

Think I should be able to fix that (I thought I had )

I'm surprised the tests passed though - do you think they are missing something or was my setup at fault?