psv-format / psv.c

This is a reference implementation of a Markdown to JSON converter, designed specifically for parsing Markdown tables into JSON objects. It allows for easy conversion of Markdown documents containing tables into structured JSON data. https://psv-format.github.io/

column attributes #5

Open mofosyne opened 4 months ago

mofosyne commented 4 months ago
{#dog}
| Name    | Age  {#age int} | City         |
| ------- | --- | ------------ |
| Alice   | 25  | New York     |
| Bob     | 32  | San Francisco|
| Bob     | 32  | Melbourne    |
| Charlie | 19  | London       |

Consider adding a feature to annotate each column with an id and a datatype. This would also allow for a keyMapping mode:

{
  "headers": ["Name", "Age", "City", "Datetime"],
  "keyMappings": {
    "Name": "Name",
    "Age": "Age",
    "City": "City",
    "Datetime": "Time of Creation"
  },
  "rows": [
    {"Name": "Alice", "Age": "25", "City": "New York", "Time of Creation": "2024-04-26T12:00:00"},
    {"Name": "Bob", "Age": "32.4", "City": "San Francisco", "Time of Creation": "2024-04-25T10:30:00"},
    {"Name": "Bob", "Age": "32", "City": "", "Time of Creation": "2024-04-24T15:45:00"},
    {"Name": "Charlie", "Age": "19", "City": "London", "Time of Creation": "2024-04-23T08:15:00"}
  ]
}
Crissov commented 4 months ago

Prior art has stuff like that in the “column setup row”:

| Name    | Age | City         |
|:-------:| ---:|:------------ |
| Alice   | 25  | New York     |
mofosyne commented 4 months ago

Ah yeah, but that's typically for styling the left/right alignment. I'm thinking more of resolving ambiguities when converting PSV tables to JSON (or to CBOR, with its semantic tagging feature).

E.g. how do you know if 1714031846 is an integer or a string? And how do you know whether it's UNIX time or not? One solution is to just assume everything is a string and tell users to deal with it later (and to figure out the semantics on their end).

> Implicit typing causes surprise type changes. (e.g. put 3 where you previously had a string and it will magically turn into an int).

Source: https://stackoverflow.com/questions/65283208/toml-vs-yaml-vs-strictyaml
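To make the trade-off concrete, here is a minimal sketch of explicit typing in C: a cell becomes a bare JSON number only when its column was declared as an integer, and stays a quoted string otherwise. The names `col_type` and `cell_to_json` are hypothetical and not taken from psv.c. The sketch only escapes quotes and backslashes, and it silently truncates output that does not fit the buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical column types; psv.c may model this differently. */
typedef enum { COL_STR, COL_INT } col_type;

/* Write one cell as a JSON value into `out` (capacity `cap`).
 * A bare number is emitted only when the column is declared COL_INT and the
 * whole cell parses as a base-10 integer; anything else stays a string.
 * Returns the number of characters written (snprintf semantics for numbers). */
static int cell_to_json(col_type type, const char *text, char *out, size_t cap)
{
    assert(text != NULL && out != NULL && cap > 0);

    if (type == COL_INT) {
        char *end = NULL;
        errno = 0;
        long long value = strtoll(text, &end, 10);
        /* Accept only if something was parsed, nothing is left over,
         * and the value did not overflow. */
        if (errno == 0 && end != text && *end == '\0')
            return snprintf(out, cap, "%lld", value);
    }

    /* Fallback: quoted string. Only '"' and '\\' are escaped here;
     * control characters are left as-is (a simplification). */
    size_t n = 0;
    if (n + 1 < cap) out[n++] = '"';
    for (const char *p = text; *p != '\0'; ++p) {
        if ((*p == '"' || *p == '\\') && n + 1 < cap)
            out[n++] = '\\';
        if (n + 1 < cap)
            out[n++] = *p;
    }
    if (n + 1 < cap) out[n++] = '"';
    out[n] = '\0';
    return (int)n;
}
```

With this approach `1714031846` in a string column is emitted as `"1714031846"`, and a malformed value such as `32.4` in an integer column falls back to a string instead of being silently truncated to `32`.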

Anyway, I hope you like what I've got so far with psv.c. The exercise has been quite illuminating on what we should consider when designing this format.


Another potential method, perhaps, is to rely on 'well known' field names: e.g. a column is expected to hold a UTC datetime only if its header ends in 'date', like 'creation date'. That would require a lookup table.
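A rough sketch of what such a lookup table might look like in C is below. The table contents, the `known_suffix` struct and the `infer_semantic` function are all illustrative assumptions, not part of psv.c. The header is assumed to be already trimmed, and the suffix must start on a word boundary so that a header like "Candidate" is not mistaken for a date column.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of "well known" header endings and the semantic
 * each one implies. Only "date" comes from the discussion; "count" is
 * an illustrative extra entry. */
typedef struct {
    const char *suffix;
    const char *semantic;
} known_suffix;

static const known_suffix KNOWN_SUFFIXES[] = {
    { "date",  "datetime" },
    { "count", "int"      },
};

/* True if `s` ends with the word `suffix`, ignoring case. The suffix must
 * either be the whole string or be preceded by a non-alphanumeric char. */
static int ends_with_word(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    if (lx == 0 || lx > ls)
        return 0;
    const char *tail = s + (ls - lx);
    for (size_t i = 0; i < lx; ++i) {
        if (tolower((unsigned char)tail[i]) != tolower((unsigned char)suffix[i]))
            return 0;
    }
    return ls == lx || !isalnum((unsigned char)tail[-1]);
}

/* Return the implied semantic for a header, or NULL if none is known. */
static const char *infer_semantic(const char *header)
{
    assert(header != NULL);
    size_t count = sizeof KNOWN_SUFFIXES / sizeof KNOWN_SUFFIXES[0];
    for (size_t i = 0; i < count; ++i) {
        if (ends_with_word(header, KNOWN_SUFFIXES[i].suffix))
            return KNOWN_SUFFIXES[i].semantic;
    }
    return NULL;
}
```

Here `infer_semantic("Creation Date")` yields `"datetime"`, while `"Candidate"` and `"City"` yield `NULL`. The word-boundary check shows how fragile name-based inference can be, which is an argument for the explicit annotations discussed in this thread.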

mofosyne commented 4 months ago

On further thought, since [] and {} don't typically appear in headers, we could leverage them to annotate datatype and attributes, as long as we mandate that markdown links are not allowed in headers. So [<datatype>] would annotate whether a field is to be read as an int, string, etc., while {} would be the consistent attribute syntax, where .<class> would also be leveraged as a semantic annotation.

{#person}
| Name    | Age [int] {#age .age}  | City  [str] {#city .city} | Date [str] {#date .datetime} |
|:-------:| --------:|:------------ |:----------------------- |
| Alice   | 25       | New York     | 2022-04-20  |
| Bob     | 30       | Los Angeles  | 2022-04-21  |
| Eve     | 28       | Chicago      | 2022-04-22  |

On GitHub it would fall back to rendering like so:

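{#person}

| Name  | Age [int] {#age .age} | City [str] {#city .city} | Date [str] {#date .datetime} |
|:-----:| ---------------------:|:------------------------ |:---------------------------- |
| Alice | 25                    | New York                 | 2022-04-20                   |
| Bob   | 30                    | Los Angeles              | 2022-04-21                   |
| Eve   | 28                    | Chicago                  | 2022-04-22                   |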
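As a sketch of how the proposed header syntax (`[<datatype>]` followed by `{#id .class}`) could be split apart in C, the following parses one header cell into its label, datatype, id and class. The `column_spec` struct, its field sizes and `parse_header_cell` are hypothetical names chosen for illustration. The code assumes at most one `[...]` and one `{...}` per cell, and if several classes are given it keeps only the last one.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parsed form of one header cell; sizes are arbitrary. */
typedef struct {
    char label[64]; /* visible column name, e.g. "Age"      */
    char type[16];  /* from [<datatype>], e.g. "int"        */
    char id[32];    /* from {#id}, e.g. "age"               */
    char cls[32];   /* from {.class}, e.g. "datetime"       */
} column_spec;

/* Copy [start, end) into dst, trimming blanks and truncating to cap - 1. */
static void copy_trimmed(char *dst, size_t cap, const char *start, const char *end)
{
    while (start < end && (*start == ' ' || *start == '\t')) ++start;
    while (end > start && (end[-1] == ' ' || end[-1] == '\t')) --end;
    size_t n = (size_t)(end - start);
    if (n >= cap) n = cap - 1;
    memcpy(dst, start, n);
    dst[n] = '\0';
}

/* Split a header cell such as "Age [int] {#age .age}" into its parts.
 * Missing annotations leave the corresponding field empty. */
static void parse_header_cell(const char *cell, column_spec *out)
{
    assert(cell != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    const char *lb = strchr(cell, '[');
    const char *lc = strchr(cell, '{');

    /* The label runs up to the first annotation opener, if any. */
    const char *label_end = cell + strlen(cell);
    if (lb != NULL && lb < label_end) label_end = lb;
    if (lc != NULL && lc < label_end) label_end = lc;
    copy_trimmed(out->label, sizeof out->label, cell, label_end);

    if (lb != NULL) {
        const char *rb = strchr(lb, ']');
        if (rb != NULL)
            copy_trimmed(out->type, sizeof out->type, lb + 1, rb);
    }

    if (lc != NULL) {
        const char *rc = strchr(lc, '}');
        const char *p = lc + 1;
        /* Walk the space-separated tokens inside the braces. */
        while (rc != NULL && p < rc) {
            while (p < rc && *p == ' ') ++p;
            const char *tok = p;
            while (p < rc && *p != ' ') ++p;
            if (p > tok && *tok == '#')
                copy_trimmed(out->id, sizeof out->id, tok + 1, p);
            else if (p > tok && *tok == '.')
                copy_trimmed(out->cls, sizeof out->cls, tok + 1, p);
        }
    }
}
```

For the cell `Date [str] {#date .datetime}` this gives label `Date`, type `str`, id `date` and class `datetime`. A plain `Name` cell gives only a label, so unannotated tables would keep working.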