psv-format / psv.c

This is a reference implementation of a Markdown to JSON converter, designed specifically for parsing Markdown tables into JSON objects. It allows for easy conversion of Markdown documents containing tables into structured JSON data. https://psv-format.github.io/

Should json keys be verbatim from table header? #6

Open mofosyne opened 4 months ago

mofosyne commented 4 months ago

After playing around with the table conversion, I'm now thinking it might not be a good idea to copy the table headers over verbatim as JSON keys (e.g. `ISBN (International Standard Book Number)` becoming `{"ISBN (International Standard Book Number)": 2342323}`), because of potential issues between JSON encoders and decoders.
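To make the encoder/decoder concern concrete: a verbatim header must be emitted as an escaped JSON string, and consumers lose plain property access (`obj["Book Title"]` instead of `obj.book_title`). The sketch below is a hypothetical helper (`json_quote_key` is not part of psv.c) that quotes a header according to the JSON string rules in RFC 8259, escaping `"`, `\` and control characters.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not part of psv.c): write `key` into `out` as a quoted
// JSON string, escaping what RFC 8259 requires: '"', '\\' and control characters.
// Returns the length written (excluding the NUL), or -1 if `out` is too small.
int json_quote_key(const char *key, char *out, size_t out_size) {
    size_t n = 0;

    if (out_size < 3)  // room for "" plus the terminating NUL
        return -1;
    out[n++] = '"';

    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        char esc[7];  // longest escape is \u00XX (6 chars) plus NUL
        size_t len;

        if (*p == '"' || *p == '\\') {
            esc[0] = '\\';
            esc[1] = (char)*p;
            len = 2;
        } else if (*p < 0x20) {
            // Control characters U+0000..U+001F must be escaped
            snprintf(esc, sizeof esc, "\\u%04x", (unsigned)*p);
            len = 6;
        } else {
            esc[0] = (char)*p;  // bytes >= 0x20 (including UTF-8) pass through
            len = 1;
        }

        if (n + len + 2 > out_size)  // keep room for the closing quote and NUL
            return -1;
        memcpy(out + n, esc, len);
        n += len;
    }

    out[n++] = '"';
    out[n] = '\0';
    return (int)n;
}
```

For example, the header `Price (in "USD")` comes out as `"Price (in \"USD\")"`; a normalized key such as `price` needs no escaping at all.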

Perhaps we want a function that converts "English" headers into more computer-friendly JSON keys?

| Example Header | JSON Key |
|----------------|----------|
| Book Title | book_title |
| Author | author |
| Year of Publication | year_of_publication |
| Genre | genre |
| Number of Pages | number_of_pages |
| ISBN (International Standard Book Number) | isbn |
| Publisher Name | publisher_name |

Also refer to https://google.github.io/styleguide/jsoncstyleguide.xml?showone=Property_Name_Format#Property_Name_Format for how Google names its keys.
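As far as I can tell, the Google guide linked above asks for camel-cased ASCII property names (e.g. `bookTitle`) rather than the snake_case keys in the table. If that convention were preferred, the same normalization could emit camelCase instead. A minimal sketch under that assumption; `generateCamelKey` is a hypothetical name, and, like the C version later in this thread, it drops everything from the first `(`, `[` or `{` onward.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Hypothetical camelCase variant of generateJSONKey (not part of psv.c),
// aiming at the Google JSON Style Guide's camel-cased ASCII property names.
// Writes at most out_size bytes (including the NUL) into out and returns out.
char *generateCamelKey(const char *header, char *out, size_t out_size) {
    size_t n = 0;
    int upper_next = 0;  // capitalize the next letter (set after a word separator)

    if (out_size == 0)
        return out;

    for (const unsigned char *p = (const unsigned char *)header;
         *p != '\0' && n + 1 < out_size; p++) {
        // Stop at the first parenthesis, bracket, or brace
        if (strchr("([{", *p) != NULL)
            break;

        // Any other non-alphanumeric byte separates words and is not emitted
        if (!isalnum(*p)) {
            upper_next = (n > 0);  // the first word stays lowercase
            continue;
        }

        out[n++] = (char)(upper_next ? toupper(*p) : tolower(*p));
        upper_next = 0;
    }
    out[n] = '\0';
    return out;
}
```

With this, `Number of Pages` becomes `numberOfPages`. One gap remains: the guide also wants the first character to be a letter, `_` or `$`, so a header that starts with a digit (`13th Edition` becomes `13thEdition`) would still need a prefix.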

mofosyne commented 4 months ago

I discussed this with the AI a bit more, and this is the general JavaScript function it came up with. It seems to be robust against most examples:

```javascript
function generateJSONKey(header) {
    // Remove content within parentheses
    let sanitizedHeader = header.replace(/\([^)]*\)/g, '');
    // Convert header to lowercase
    sanitizedHeader = sanitizedHeader.toLowerCase();
    // Replace special characters with underscores
    sanitizedHeader = sanitizedHeader.replace(/[^\w\s]/g, '_');
    // Replace spaces with underscores
    sanitizedHeader = sanitizedHeader.replace(/\s+/g, '_');
    // Remove consecutive underscores
    sanitizedHeader = sanitizedHeader.replace(/_+/g, '_');
    // Trim underscores from start and end of string
    sanitizedHeader = sanitizedHeader.replace(/^_+|_+$/g, '');
    // Ensure the key is not empty
    if (sanitizedHeader === '') {
        sanitizedHeader = 'unnamed';
    }
    return sanitizedHeader;
}

// Example headers
const exampleHeaders = [
    "Book Title",
    "Author (John Doe)",
    '"Publication" Year',
    "ISBN-13",
    "Number of Pages",
    "Publisher; Inc.",
    "Language (English)",
    "Edition #5",
    'Format (e.g., Paperback, "Hardcover")',
    'Price (in "USD")',
    "Average Rating",
    "Total Ratings",
    "Description",
    "Genre(s)",
    "First Name",
    "Last Name",
    "Email Address",
    "Phone Number",
    "Street Address",
    "City; State",
    "State/Province",
    "Postal Code",
    "Country",
    "Date of Birth",
    "Membership Type"
];

// Generate Markdown table
let markdownTable = "| Example Header | JSON Key |\n";
markdownTable += "|----------------|----------|\n";
exampleHeaders.forEach(header => {
    const jsonKey = generateJSONKey(header);
    markdownTable += `| ${header} | ${jsonKey} |\n`;
});

// Output Markdown table
console.log(markdownTable);
```
| Example Header | JSON Key |
|----------------|----------|
| Book Title | book_title |
| Author (John Doe) | author |
| "Publication" Year | publication_year |
| ISBN-13 | isbn_13 |
| Number of Pages | number_of_pages |
| Publisher; Inc. | publisher_inc |
| Language (English) | language |
| Edition #5 | edition_5 |
| Format (e.g., Paperback, "Hardcover") | format |
| Price (in "USD") | price |
| Average Rating | average_rating |
| Total Ratings | total_ratings |
| Description | description |
| Genre(s) | genre |
| First Name | first_name |
| Last Name | last_name |
| Email Address | email_address |
| Phone Number | phone_number |
| Street Address | street_address |
| City; State | city_state |
| State/Province | state_province |
| Postal Code | postal_code |
| Country | country |
| Date of Birth | date_of_birth |
| Membership Type | membership_type |
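Note that this JavaScript pipeline removes each `(...)` span and keeps whatever follows it, so `Price (USD) per Unit` would become `price_per_unit`. A hypothetical single-pass C port of the same steps might look like the sketch below. `generateJSONKeyJSCompat` is an illustrative name, and it assumes ASCII input: JavaScript's `toLowerCase` and `\s` are Unicode-aware, so some non-ASCII headers could come out differently.

```c
#include <ctype.h>
#include <string.h>

// Hypothetical C port of the JavaScript generateJSONKey above (ASCII only).
// Drops each "(...)" span but keeps the text after it; falls back to "unnamed".
char *generateJSONKeyJSCompat(const char *header, char *out, size_t out_size) {
    size_t n = 0;

    if (out_size == 0)
        return out;

    for (const char *p = header; *p != '\0' && n + 1 < out_size; p++) {
        unsigned char c = (unsigned char)*p;

        // Mirror /\([^)]*\)/g: skip "(" up to the first ")", but only if one exists
        if (c == '(') {
            const char *close = strchr(p, ')');
            if (close != NULL) {
                p = close;  // the loop's p++ then moves past the ')'
                continue;
            }
        }

        // Letters, digits and '_' are kept (lowercased); everything else becomes '_'
        char out_c = (isalnum(c) || c == '_') ? (char)tolower(c) : '_';

        // Collapse runs of '_' and never start with one (JS trims leading '_')
        if (out_c == '_' && (n == 0 || out[n - 1] == '_'))
            continue;
        out[n++] = out_c;
    }

    // Trim a trailing '_' (JS trims trailing underscores)
    if (n > 0 && out[n - 1] == '_')
        n--;
    out[n] = '\0';

    // JS falls back to "unnamed" for an empty key; do the same if it fits
    if (n == 0 && out_size > strlen("unnamed"))
        strcpy(out, "unnamed");
    return out;
}
```

For every header in the table above this should give the same key as the JavaScript version; the difference from truncating at the first bracket only shows up when text follows the parentheses.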
mofosyne commented 4 months ago
```c
#include <stdio.h>
#include <ctype.h>

// Generate a JSON key from a table header.
// Writes at most bufferSize bytes (including the terminating NUL) into jsonKeyBuffer and returns it.
char* generateJSONKey(const char* header, char* jsonKeyBuffer, size_t bufferSize) {
    char* writePtr = jsonKeyBuffer;

    // A zero-length buffer cannot even hold the terminating NUL
    if (bufferSize == 0)
        return jsonKeyBuffer;

    // Iterate through the header, always keeping one byte free for the NUL
    for (size_t i = 0; header[i] != '\0' && bufferSize > 1; i++) {
        // ctype.h functions require a value representable as unsigned char
        unsigned char c = (unsigned char)header[i];

        // Stop at the first parenthesis, bracket, or brace; everything after it is dropped
        if (c == '(' || c == '[' || c == '{')
            break;

        // Convert to lowercase
        c = (unsigned char)tolower(c);

        // Replace special characters and spaces with underscores
        if (!isalnum(c) && c != '_')
            c = '_';

        // Never start with an underscore and never emit two in a row.
        // The start-of-buffer test comes first so writePtr[-1] is only read when it exists.
        if (c == '_' && (writePtr == jsonKeyBuffer || writePtr[-1] == '_'))
            continue;

        *writePtr++ = (char)c;
        bufferSize--;
    }

    // Drop a trailing underscore, if any, then terminate the string
    if (writePtr > jsonKeyBuffer && writePtr[-1] == '_')
        writePtr--;
    *writePtr = '\0';

    return jsonKeyBuffer;
}

int main(void) {
    // Example headers
    const char* exampleHeaders[] = {
        "Book Title",
        "Author (John Doe)",
        "\"Publication\" Year",
        "ISBN-13",
        "Number of Pages",
        "Publisher; Inc.",
        "Language (English)",
        "Edition #5",
        "Format (e.g., Paperback, \"Hardcover\")",
        "Price (in \"USD\")",
        "Average Rating",
        "Total Ratings",
        "Description",
        "Genre(s)",
        "First Name",
        "Last Name",
        "Email Address",
        "Phone Number",
        "Street Address",
        "City; State",
        "State/Province",
        "Postal Code",
        "Country",
        "Date of Birth",
        "Membership Type"
    };

    // Output Markdown table
    char jsonKeyBuffer[256];
    printf("| Example Header                         | JSON Key           |\n");
    printf("|----------------------------------------|--------------------|\n");
    for (size_t i = 0; i < sizeof(exampleHeaders) / sizeof(exampleHeaders[0]); i++) {
        generateJSONKey(exampleHeaders[i], jsonKeyBuffer, sizeof(jsonKeyBuffer));
        printf("| %-38s | %-18s |\n", exampleHeaders[i], jsonKeyBuffer);
    }

    return 0;
}
```
| Example Header | JSON Key |
|----------------|----------|
| Book Title | book_title |
| Author (John Doe) | author |
| "Publication" Year | publication_year |
| ISBN-13 | isbn_13 |
| Number of Pages | number_of_pages |
| Publisher; Inc. | publisher_inc |
| Language (English) | language |
| Edition #5 | edition_5 |
| Format (e.g., Paperback, "Hardcover") | format |
| Price (in "USD") | price |
| Average Rating | average_rating |
| Total Ratings | total_ratings |
| Description | description |
| Genre(s) | genre |
| First Name | first_name |
| Last Name | last_name |
| Email Address | email_address |
| Phone Number | phone_number |
| Street Address | street_address |
| City; State | city_state |
| State/Province | state_province |
| Postal Code | postal_code |
| Country | country |
| Date of Birth | date_of_birth |
| Membership Type | membership_type |
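One side effect of normalizing that neither version handles: different headers can collapse to the same key (`Genre` and `Genre(s)` both become `genre`; `City; State` and `City State` both become `city_state`). RFC 8259 only says object names SHOULD be unique and that the behaviour of software receiving duplicates is unpredictable, which is the same encoder/decoder concern raised at the top of this issue. One possible fix is to suffix later duplicates. The sketch below is a hypothetical post-processing step; `make_keys_unique` and the fixed `KEY_SIZE` are assumptions, not psv.c code.

```c
#include <stdio.h>
#include <string.h>

#define KEY_SIZE 64  // assumed per-key buffer size for this sketch

// Returns 1 if `key` equals any of the first `count` entries of `keys`
static int key_taken(char keys[][KEY_SIZE], size_t count, const char *key) {
    for (size_t i = 0; i < count; i++)
        if (strcmp(keys[i], key) == 0)
            return 1;
    return 0;
}

// Hypothetical post-processing step: rename later duplicates to "key_2", "key_3", ...
// Each key is made unique against all earlier keys, so the final set is unique.
// Returns 0 on success, -1 if a suffixed key would not fit in KEY_SIZE bytes.
int make_keys_unique(char keys[][KEY_SIZE], size_t count) {
    for (size_t i = 1; i < count; i++) {
        if (!key_taken(keys, i, keys[i]))
            continue;

        char candidate[KEY_SIZE];
        for (unsigned suffix = 2; ; suffix++) {
            int len = snprintf(candidate, sizeof candidate, "%s_%u", keys[i], suffix);
            if (len < 0 || (size_t)len >= sizeof candidate)
                return -1;
            if (!key_taken(keys, i, candidate))
                break;
        }
        strcpy(keys[i], candidate);
    }
    return 0;
}
```

Keeping the first occurrence unsuffixed means a table without collisions keeps exactly the keys shown above.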