madmurphy / libconfini

Yet another INI parser
https://madmurphy.github.io/libconfini/
GNU General Public License v3.0
164 stars 23 forks source link

Standalone manager program? #12

Open mcondarelli opened 4 years ago

mcondarelli commented 4 years ago

Is there, or is it planned somehow, a standalone program to handle ini-files from the shell? I am looking for something like crudini but without the python overhead.

If this is not available/planned I could volunteer to write such a thing, if there's interest.

madmurphy commented 4 years ago

Hi Mauro!

It is not planned, but I have definitely thought more than once about how it could look. First, I think that the best would be if it emulates the syntax of dconf/gsettings. So where we currently have

gsettings set org.gnome.desktop.interface clock-show-date true

we would have

ourprogram somfile.conf set path.to.a.section key value

There would be a dictionary of formats for known INI files. In case a format is not known it is possible to exploit libconfini's IniFormatNum data type and explicitly give a format number after a colon:

ourprogram somfile.conf:56637 set path.to.a.section key value

where 56637 is the format number.

I think internally our program would map the entire INI file in a tree of linked lists, so that it would be very simple to add/remove any value from any point.

It would be fantastic if you started to work on such a program. Currently libconfini does not have any function for writing to an INI file, but that might appear in the future. However for now you would be on your own in the process of saving the file.

Let me know how your plans evolve!

--madmurphy

mcondarelli commented 4 years ago

Uhm... I was thinking about something different, mainly to be used in a shell script. My current thoughts are along the lines:

inimanage --ini=whatever.ini --set foo:bar=baz --set foo:baz=bar
mybar=$(inimanage --ini=whatever.ini --get foo:bar)
mybaz=$(inimanage --ini=whatever.ini --get foo:baz)

or even:

eval  $(inimanage --ini=whatever.ini --get foo:bar --get foo:baz)

I didn't realize libconfini has (currently) no way of writing out whatever is in memory. I'll have to see if this can be implemented easily

madmurphy commented 4 years ago

Exporting what is in memory to a file is way simpler than the other way around. It is a planned feature. However it will not appear with the next release (the next release is planned to implement scientific notation while parsing numbers and a couple of new features), but hopefully it will appear within the next months.

A program for importing an INI file to the shell would be quite simple to implement. This could be a starting scheme:

#include <stdio.h>
#include <confini.h>

static int callback (IniDispatch * disp, void * v_other) {

  if (disp->type != INI_KEY) {

    return 0;

  }

  ini_string_parse(disp->value, disp->format);
  ini_string_parse(disp->data, disp->format);

  /*  You would need to sanitize `disp->data` as a valid shell variable name here...  */

  printf("export %s='%s'\n", disp->data, disp->value);

  return 0;

}

int main (int argc, char *argv[]) {

  if (argc < 2) {

    printf("Usage: %s FILE\n", argv[0]);
    return 0;

  }

  if (load_ini_path(argv[1], INI_DEFAULT_FORMAT, NULL, callback, NULL)) {

    return 1;

  }

  return 0;

}

As for the command line syntax, those were just my thoughts. Feel free to program as you like.

--madmurphy

mcondarelli commented 4 years ago

Thanks for the fast answer. Is the exporting to file available in git? I'm not afraid of using "unreleased" software ;)

madmurphy commented 4 years ago

No, it is not there yet, sorry. For now you will simply have to do this:

FILE *my_file = fopen("my_file.conf", "w+");
fprintf(my_file, "%s = %s\n", "key1", "value1");
fprintf(my_file, "%s = %s\n", "key2", "value2");
/*  etc. etc.  */
fclose(my_file);

which of course lacks quote support and sanitization. But it is something that can be used while waiting for the official function for exporting to a file. I can tell you already now that the future function will export a single node at a time. So it will look like this (I used a fantasy name for it):

FILE *my_file = fopen("my_file.conf", "w+");
write_with_bells_and_whistles(my_file, INI_SECTION, "some_section", NULL);
write_with_bells_and_whistles(my_file, INI_KEY, "key1", "value1");
write_with_bells_and_whistles(my_file, INI_DISABLED_KEY, "key2", "value2");
/*  etc. etc.  */
mcondarelli commented 4 years ago

FYI: I started coding the ini_tree backend. It is far from complete, but it can load and save some .ini files apparently without problems. Any comment welcome.

UPDATE: code deleted because new version available below.

madmurphy commented 4 years ago

Great Mauro! I will have a look at it as soon as I have a bit of time. From a fast check a problem is here:

        case INI_SECTION:
            ini_unquote(disp->data, disp->format);
            fprintf(stderr, "section [%s]\n", disp->data);
            break;

In order to work in all kinds of environments (including embedded systems), libconfini never allocates new memory (besides the first time, if ever), so all the strings dispatched are tokens from the same buffer. Editing the IniDispatch::data of a section is problematic, since that is the only case where a buffer is dispatched more than once (it will appear in all its children's ::append_to field) – there is no problem in editing the IniDispatch::data of a key or anything else instead.

The code above will work 99% of the times. However, imagine the following INI file:

["this"."is"."a"."quoted"."section"]
foo = bar

[.subsection]
hi = you

When foo is dispatched, the token "this"."is"."a"."quoted"."section" will be referenced in its IniDispatch::append_to, while when hi is dispatched, the string .subsection will be concatenated to "this"."is"."a"."quoted"."section" in order to form "this"."is"."a"."quoted"."section".subsection to be referenced in the latter's ::append_to.

If you unquote "this"."is"."a"."quoted"."section", you will transform the string to this.is.a.quoted.section\0\0\0\0\0\0\0\0\0\0. Of course this will not be a problem for you, since you will receive the new shorter length of the string. But libconfini does not know about your edit, neither wastes time to launch strlen() for each dispatch, since it already knows all the lengths, so it will concatenate this.is.a.quoted.section\0\0\0\0\0\0\0\0\0\0.subsection, making a truncated section path.

All this for saying that section paths need to be unquoted on a cloned buffer and you should not touch the buffer dispatched (in theory you could do transformations that do not change the length, like converting to upper / lower case, etc.).

Feel free to unquote the IniDispatch::data of a key instead, since that buffer is dispatched only once and gets lost immediately afterwards.

P.S. I have thought of the possibility of unquoting automatically before dispatching the section paths. But if I do that there will be no way of distinguishing between quoted dots (normal characters) and dots out of quotes (metacharacters). And the utility of using quotes in a section name – if it has any utility at all – is basically that of writing dots that are not metacharacters (think of INI files where IP addresses are used as section names, for example). So my solution has been instead that of establishing that the functions in libconfini that walk through delimited strings (e.g.: ini_array_foreach()) do not consider quoted delimiters as delimiters. So problem solved, but you will have to launch strndup() when you deal with INI_SECTION dispatches.

madmurphy commented 4 years ago

Where is the list.h header from? If it is from glibc it is missing on Arch Linux, therefore I cannot compile your program.

mcondarelli commented 4 years ago

Sorry, my bad.

List.h was lifted from Linux kernel itself with very minor tweaks needed to have it work in a userland program. You can find a pre-massaged copy (exactly the same thing I'm using) here.

Note: While I fully understand and appreciate your attempt to be as general as possible, including all corner cases, I feel the vast majority of potential users are interested in a (small) subset of available capabilities; wouldn't it be a bit easier/safer/faster to change "format"s to be a compile-time setting?

madmurphy commented 4 years ago

Perfect, I will try the list.h you posted.

If I move the format to a compile-time setting it will not be possible to use libconfini as a shared library (which compile-time setting would be the right one that will make everyone happy for being installed on a machine?), and having a shared library for parsing INI files is among the constituent goals of libconfini. If you look at the C libraries that use compile time settings heavily (inih, minini, etc.), you will find that they are rarely used as shared libraries, most of the times are used as static libraries.

But all parsing functions are consequent, so no matter what format you choose your program will do exactly the same steps, the IniFormat does not add complexity to the final user.

What is exactly that made you think to a compile-time format as a better option?

madmurphy commented 4 years ago

Ok, just tried it. Your draft seems working :+1:

I see that there is an unused function (new_dispatch()), I suppose for cloning a dispatch? There is an example under examples/utilities/clone_ini_dispatch.h for cloning an IniDispatch. I have updated the example in the meanwhile but I have not published it yet, so you could have a look at this new version if you want (you will not find it on GitHub) [EDIT: The GitHub version is the latest version, so you can take it directly from the package]:

/*  examples/utilities/clone_ini_dispatch.h  */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <confini.h>

/**

  @brief    Make a newly allocated hard copy of an `IniDispatch`
  @param    source          The `IniDispatch` to copy
  @param    nonconst_parent A placeholder for a non-`const` version of the
                            cloned `IniDispatch::append_to` pointer (it can be
                            `NULL`)
  @return   A hard copy of the `IniDispatch` passed as argument

  Example:

  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.c}
  char * nonconst_parent;

  IniDispatch * my_clone = clone_ini_dispatch(
    dispatch,
    &nonconst_parent
  );

  if (!my_clone) {
    fprintf(stderr, "malloc() failed");
    exit(1);
  }

  printf("DATA: `%s`\n", my_clone->data);
  printf("VALUE: `%s`\n", my_clone->value);
  printf("PARENT: `%s`\n", nonconst_parent);
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  The address returned (and **only that**) must be freed afterwards.

**/
IniDispatch * clone_ini_dispatch (
  const IniDispatch * const source,
  char ** const nonconst_parent
) {

  IniDispatch * dest = malloc(sizeof(IniDispatch) + source->d_len + source->v_len + source->at_len + 3);

  if (dest) {

    char * _append_to_;
    memcpy(dest, source, sizeof(IniDispatch));
    _append_to_ = memcpy((char *) dest + sizeof(IniDispatch), source->append_to, source->at_len + 1);
    dest->append_to = _append_to_;
    dest->data = memcpy(_append_to_ + dest->at_len + 1, source->data, source->d_len + 1);
    dest->value = dest->data ? memcpy(dest->data + dest->d_len + 1, source->value, source->v_len) : dest->data + dest->d_len + 1;
    dest->value[dest->v_len] = '\0';

    if (nonconst_parent) {
      *nonconst_parent = _append_to_;
    }

  }

  return dest;

}
mcondarelli commented 4 years ago

I am now adding infrastructure to programmatically add nodes. I would like a comment on the tree structure as it's unclear to me what may be needed and changing it afterwards may be "inconvenient". Specifically I have only a foggy idea about how comments are handled. I'll need to handle them (now they are completely ignored). Also preserving key/comments order when updating might be not starightforward.

madmurphy commented 4 years ago

Ok, here my thoughts. If you want to preserve comments you need to append them to the tree too. libconfini has the peculiar ability to recognize disabled entries, and this can be exploited too (see below).

There are two types of comments dispatched by libconfini:

  1. INI_COMMENT
  2. INI_INLINE_COMMENT

Normally an INI_COMMENT refers to the next key/section that will be dispatched, while an INI_INLINE_COMMENT always refers to the previous key/section:

# This is an INI file

[global]
# any value is allowed here <-- this comment refers to the next node (`version`)
version = 1.2.1
foo = bar   # do not change this value <-- this comment refers to the previous node (`foo`)

# add whatever you want to this section
[optional]
#disabled_key = value
...

You have two possibilities for appending the comments to your tree:

  1. You append them in the same order you received them, exactly like you do with keys, on the same level (siblings). This will have the inconvenient that # add whatever you want in this section will be appended to [global], although it refers to [optional].
  2. You append block comments to the next node (you will need a queue for that) and inline comments to the previous node as children (but separated from the keys/subsections). This will have the inconvenient that the first comment of the document, which often is a header that refers to the entire document, will be appended to the first key/section (# This is an INI file in my example, which will be appended to [global]). If the document ends with a block comment you will treat the latter as a sort of “footer”.

Disabled nodes in this case come very much in handy, since you can treat them as normal keys/sections that have a disabled flag set to true. You will receive them as INI_DISABLED_KEY and INI_DISABLED_SECTION. Just append them as they come. You should also append as “disabled” possible non-disabled duplicate keys, so that when you save the file only one (the last one) will be saved as the actual key, but you will not loose the other ones, since they will be saved too, but as “commented out”.

Let me know if you have any questions.

mcondarelli commented 4 years ago

I am thinking about a mix:

  1. use strategy (1) (easier to implement).
  2. remember if comment is inline (INI_INLINE_COMMENT it will not have a '\n' prepended).
  3. append any key after other keys, but before any INI_COMMENT in section.
  4. remove all blank lines (I assume there's no way to to get them from the parser).
  5. insert a blank line before sections.
  6. possibly move blank line before INI_COMMENTs trailing previous section (I'm unsure about this).

Question: do you plan to handle in some way "array values"? something like: people = me, you, jane, jim or people = [me you jane jim]

I'll send a new version in the next days (hopefully). You are free to include it in Your code. I do not think starting a new project for this is a good idea.

madmurphy commented 4 years ago

I am not sure I understood your point 3. correctly. From what I understood you would transform the previous INI file into this:

[global]
version = 1.2.1
foo = bar   # do not change this value
# any value is allowed here
# add whatever you want to this section

[optional]
#disabled_key = value

I have just applied your rule “append any key after other keys, but before any INI_COMMENT in section”.

If that's the case you should better leave them in the exact order you received them.

As for the blank lines, there is no way to get them from the parser.

As for the arrays, definitely

people = me, you, jane, jim

Using square brackets would introduce syntax typing, and that's something that should absolutely be avoided in a configuration format. For more info about this, see here.

madmurphy commented 4 years ago

I think it will helpful if you had a look at this small program. It is still a work in progress to be put I just published it under examples/cplusplus. I think it can give you a clue about my approach to arrays.

mcondarelli commented 4 years ago

Here it comes current version of my program. It essentially has two distinct parts:

DOM doesn't handle arrays (yet). Of course load/dump cycle will incur in some reformatting, but I'm trying to keep those to a minimum, possibly with some meaningful default. A few questions for You:

`ini_tree.h`:

```c // // Created by mcon on 10/23/20. // #ifndef INIMANAGER_INI_TREE_H #define INIMANAGER_INI_TREE_H #include // list.h extensions START /** * list_next_entry - next entry list of given type * @pos: the type * of the current entry. * @member: the name of the list_struct within the struct. */ #define list_next_entry(pos, member) \ list_entry((pos)->member.next, typeof(*(pos)), member) /** * list_prev_entry - get previous entry in list of given type * @pos: the type * of the current entry. * @member: the name of the list_struct within the struct. */ #define list_prev_entry(pos, member) \ list_entry((pos)->member.prev, typeof(*(pos)), member) // list.h extensions END #define countof(array) (sizeof(array)/sizeof((array)[0])) #include "confini.h" #define INI_MAX_VALUE 1024 typedef enum ini_magic { DEAD = 0xdeadbeef, HEAD = 0xbeadfeed, NODE = 0xcafefeed, LEAF = 0xc0c0feed } ini_magic_t; typedef struct ini_node { enum ini_magic magic; IniDispatch node; struct list_head next; struct list_head subs; } ini_node_t; typedef enum { START, } print_status_t; typedef struct ini_path { uint64_t magic; struct list_head head; char *path; int dirty; } ini_path_t; int ini_set_format(const char *name); void *ini_parse_file(FILE *file); void *ini_parse_path(const char *path); void ini_trysave(void *opaque); void ini_saveto(void *opaque, const char *path); void ini_destroy(void *opaque); void ini_close(void *opaque); ini_node_t *add_key(void *opaque, IniDispatch * disp); ini_node_t *find_sect(void *opaque, const char *where); int ini_dump_file(void *opaque, FILE *file); int ini_dump_path(void *opaque, const char *path); int ini_dump(void *opaque); const char *ini_find(void *opaque, const char *what); int ini_set(void *opaque, const char *what, const char *value); //#define print_list(head) do{struct list_head *t; list_for_each(t,head){\ // ini_node_t *e=list_entry(t, ini_node_t, next);\ // fprintf(stderr,"***%s:%s***\n",__func__,e->node.data);\ // }}while(0) #endif //INIMANAGER_INI_TREE_H ```

`ini_tree.c`:

```c // // Created by mcon on 10/23/20. // #include #include #include #include #include "ini_tree.h" //#define DEBUG 1 #define TEST_MAGIC(pos, kind) if ((pos)->magic != (kind)) {\ errno = EINVAL;\ perror("bad magic");\ return -errno;\ } // known ini formats typedef struct { const char *name; IniFormat format; } format_t; static format_t known_formats[] = { {"default", { .delimiter_symbol = INI_EQUALS, .case_sensitive = false, .semicolon_marker = INI_IGNORE, .hash_marker = INI_IGNORE, .section_paths = INI_ABSOLUTE_AND_RELATIVE, .multiline_nodes = INI_MULTILINE_EVERYWHERE, .no_single_quotes = false, .no_double_quotes = false, .no_spaces_in_names = false, .implicit_is_not_empty = false, .do_not_collapse_values = false, .preserve_empty_quotes = false, .disabled_after_space = false, .disabled_can_be_implicit = false }}, {"unix", { .delimiter_symbol = INI_ANY_SPACE, .case_sensitive = false, .semicolon_marker = INI_IGNORE, .hash_marker = INI_IGNORE, .section_paths = INI_ABSOLUTE_AND_RELATIVE, .multiline_nodes = INI_MULTILINE_EVERYWHERE, .no_single_quotes = false, .no_double_quotes = false, .no_spaces_in_names = false, .implicit_is_not_empty = false, .do_not_collapse_values = false, .preserve_empty_quotes = false, .disabled_after_space = false, .disabled_can_be_implicit = false }}, {"fast", { .delimiter_symbol = INI_EQUALS, .case_sensitive = false, .semicolon_marker = INI_IGNORE, .hash_marker = INI_IGNORE, .section_paths = INI_ABSOLUTE_AND_RELATIVE, .multiline_nodes = INI_NO_MULTILINE, .no_single_quotes = false, .no_double_quotes = false, .no_spaces_in_names = false, .implicit_is_not_empty = false, .do_not_collapse_values = true, .preserve_empty_quotes = true, .disabled_after_space = false, .disabled_can_be_implicit = false }}, {"pacman", { .delimiter_symbol = INI_EQUALS, .case_sensitive = true, /* not sure! */ .semicolon_marker = INI_IS_NOT_A_MARKER, .hash_marker = INI_DISABLED_OR_COMMENT, .multiline_nodes = INI_NO_MULTILINE, .section_paths = INI_ABSOLUTE_ONLY, .no_single_quotes = true, .no_double_quotes = true, .no_spaces_in_names = false, .implicit_is_not_empty = true, .do_not_collapse_values = false, .preserve_empty_quotes = false, .disabled_after_space = false, .disabled_can_be_implicit = true }}, {"mkinitcpio", { .delimiter_symbol = INI_EQUALS, .case_sensitive = true, /* not sure! */ .semicolon_marker = INI_IS_NOT_A_MARKER, .hash_marker = INI_DISABLED_OR_COMMENT, .multiline_nodes = INI_NO_MULTILINE, .section_paths = INI_NO_SECTIONS, .no_single_quotes = true, .no_double_quotes = false, .no_spaces_in_names = true, .implicit_is_not_empty = false, .do_not_collapse_values = false, .preserve_empty_quotes = false, .disabled_after_space = false, .disabled_can_be_implicit = false }}, {"samba", { .delimiter_symbol = INI_EQUALS, .case_sensitive = false, /* not sure! */ .semicolon_marker = INI_DISABLED_OR_COMMENT, .hash_marker = INI_ONLY_COMMENT, .multiline_nodes = INI_NO_MULTILINE, .section_paths = INI_ABSOLUTE_ONLY, .no_single_quotes = true, .no_double_quotes = true, .no_spaces_in_names = false, .implicit_is_not_empty = false, .do_not_collapse_values = false, .preserve_empty_quotes = false, .disabled_after_space = true, .disabled_can_be_implicit = false }}, {"nsswitch", { .delimiter_symbol = INI_COLON, .case_sensitive = true, /* not sure! */ .semicolon_marker = INI_IS_NOT_A_MARKER, .hash_marker = INI_DISABLED_OR_COMMENT, .multiline_nodes = INI_NO_MULTILINE, .section_paths = INI_NO_SECTIONS, .no_single_quotes = true, .no_double_quotes = true, .no_spaces_in_names = true, .implicit_is_not_empty = false, .do_not_collapse_values = false, .preserve_empty_quotes = false, .disabled_after_space = false, .disabled_can_be_implicit = false }}, {"windows", { .delimiter_symbol = INI_EQUALS, .case_sensitive = false, .semicolon_marker = INI_ONLY_COMMENT, .hash_marker = INI_IS_NOT_A_MARKER, .section_paths = INI_ABSOLUTE_ONLY, .multiline_nodes = INI_NO_MULTILINE, .no_single_quotes = false, .no_double_quotes = false, .no_spaces_in_names = false, .implicit_is_not_empty = false, .do_not_collapse_values = false, .preserve_empty_quotes = false, .disabled_after_space = false, .disabled_can_be_implicit = false }} }; #define dflt(a,b,c,d) d, static IniFormat current_format = {INIFORMAT_TABLE_AS(dflt)}; #undef dflt int ini_set_format(const char *name) { for (int i=0; imagic = DEAD; char *p = (char*)(sect + 1); strcpy(p, name); sect->node.type =INI_SECTION; sect->node.data = p; sect->node.d_len = n; INIT_LIST_HEAD(§->next); INIT_LIST_HEAD(§->subs); sect->magic = NODE; // FIXME: should we zero unused fields? list_add_tail(§->next, head); return sect; } static ini_node_t * new_leaf(struct list_head *head, enum IniNodeType type, const char *name, const char *value) { size_t n = strlen(name); size_t m = strlen(value); ini_node_t *entry = malloc(sizeof(ini_node_t) + n + m + 2); entry->magic = DEAD; entry->node.type = type; char *p = (char*)(entry + 1); strcpy(p, name); entry->node.data = p; entry->node.d_len = n; p += n+1; strcpy(p, value); entry->node.value = p; entry->node.v_len = m; INIT_LIST_HEAD(&entry->next); INIT_LIST_HEAD(&entry->subs); entry->magic = LEAF; // FIXME: should we zero unused fields? ini_node_t *sect = list_entry(head, ini_node_t , subs); if (sect->magic == NODE) { entry->node.append_to = sect->node.append_to; entry->node.at_len = sect->node.at_len; } list_add_tail(&entry->next, head); return entry; } ini_node_t * find_sect(void *opaque, const char *where) { ini_node_t *section = NULL; ini_path_t *ini = opaque; if (ini->magic == HEAD) { char *s = alloca(strlen(where) + 1); strcpy(s, where); struct list_head *head = &ini->head; while (s && *s) { char *t = strchr(s, '.'); if (t) *t++ = '\0'; ini_node_t *entry, *sect = NULL; list_for_each_entry(entry, head, next) { if (ini_string_match_si(s, entry->node.data, entry->node.format)) { sect = entry; break; } } if (!sect) { sect = new_sect(head, s); } head = §->subs; section = sect; s = t; } } return section; } static ini_node_t * dup_dispatch(struct IniDispatch *src) { ini_node_t * new = malloc(sizeof(ini_node_t) + src->d_len + src->v_len + src->at_len + 3); if (new) { memcpy(&new->node, src, sizeof(*src)); new->magic = DEAD; char *p = (char *) (new + 1); if (src->d_len) { new->node.data = memcpy(p, src->data, src->d_len + 1); p += src->d_len + 1; } if (src->at_len) { new->node.append_to = strncpy(p, src->append_to, src->at_len + 1); p += src->at_len + 1; } if (src->v_len) { new->node.value = strncpy(p, src->value, src->v_len + 1); } INIT_LIST_HEAD(&new->next); INIT_LIST_HEAD(&new->subs); new->magic = LEAF; } return new; } ini_node_t * add_key(void *opaque, IniDispatch * disp) { ini_node_t *sect = find_sect(opaque, disp->append_to); ini_node_t *new = dup_dispatch(disp); list_add_tail(&new->next, §->subs); return new; } ini_node_t * add_comment(ini_node_t *where, IniDispatch * disp) { ini_node_t *new = dup_dispatch(disp); list_add_tail(&new->next, &where->subs); return new; } ini_node_t * add_inline_comment(ini_node_t *where, IniDispatch * disp) { ini_node_t *new = dup_dispatch(disp); list_add_tail(&new->next, &where->subs); return new; } static int callback (IniDispatch * disp, void *data) { // static variables needed for comment handling static ini_node_t *current_sect = NULL; static ini_node_t *current_leaf = NULL; switch (disp->type) { case INI_UNKNOWN: fprintf(stderr, "unparsable input '%s' (should never happen)\n", disp->data); break; case INI_VALUE: fprintf(stderr, "value '%s' (should never happen)\n", disp->data); break; case INI_KEY: #ifdef DEBUG fprintf(stderr, "key %s=%s\n", disp->data, disp->value); #endif current_leaf = add_key(data, disp); break; case INI_SECTION: current_sect = find_sect(data, disp->data); #ifdef DEBUG fprintf(stderr, "section [%s]\n", disp->data); #endif break; case INI_COMMENT: add_comment(current_sect, disp); #ifdef DEBUG fprintf(stderr, "comment #%s\n", disp->data); #endif break; case INI_INLINE_COMMENT: add_inline_comment(current_leaf, disp); #ifdef DEBUG fprintf(stderr, "inline comment #%s\n", disp->data); #endif break; case INI_DISABLED_KEY: // TODO: implement #ifdef DEBUG fprintf(stderr, "disabled key #%s=%s\n", disp->data, disp->value); #endif break; case INI_DISABLED_SECTION: // TODO: implement #ifdef DEBUG fprintf(stderr, "disabled section [%s]\n", disp->data); #endif break; default: fprintf(stderr, "unknown type %d (should never happen)\n", disp->type); } return 0; } void * ini_parse_file(FILE *file) { ini_path_t *head = malloc(sizeof(ini_path_t)+1024); head->magic = HEAD; INIT_LIST_HEAD(&head->head); head->path = NULL; head->dirty = 0; load_ini_file(file, current_format, NULL, callback, head); return head; } void * ini_parse_path(const char *path) { FILE *f = fopen(path, "r"); if (f) { ini_path_t *head = ini_parse_file(f); char * p = (char *)(head+1); strcpy(p, path); head->path = p; return head; } return NULL; } void ini_destroy(void *opaque) { ini_path_t *head = opaque; if (head->magic == HEAD) { // TODO: free recursively head->magic = DEAD; free(opaque); } else { errno = EINVAL; perror("ini_destroy()"); } } void ini_close(void *opaque) { if (opaque) { ini_path_t *head = opaque; if (head->magic != HEAD) { errno = EINVAL; perror("ini_destroy()"); return; } ini_destroy(opaque); } } void ini_trysave(void *opaque) { if (opaque) { ini_path_t *head = opaque; if (head->magic != HEAD) { errno = EINVAL; perror("ini_trysave()"); return; } if (head->dirty) { ini_dump(opaque); } } } void ini_saveto(void *opaque, const char *path) { ini_path_t *head = opaque; if (!opaque || (head->magic != HEAD)) { errno = EINVAL; perror("ini_saveto()"); return; } if (!strcmp(path, "stdout")) ini_dump_file(opaque, stdout); else if (!strcmp(path, "stderr")) ini_dump_file(opaque, stderr); else { FILE *f = fopen(path, "w"); if (!f) { perror("ini_saveto()"); return; } ini_dump_file(opaque, f); } } //============================================================================= typedef struct emitter { char *buf; size_t siz; char *dst; } emitter_t; static emitter_t * emitter_init(size_t siz) { emitter_t *e = malloc(sizeof(emitter_t)); e->buf = malloc(siz); e->siz = siz; e->dst = e->buf+1; return e; } static void emitter_emit(emitter_t *e, char c) { if ((e->dst - e->buf) < 1) { char *t = e->buf; size_t n = e->siz + e->siz/2; e->buf = realloc(e->buf, n); if (!e->buf) { abort(); } e->siz = n; e->dst = e->buf + (e->dst - t); } *e->dst++ = c; } static char * emitter_close(emitter_t *e, char quote) { if (quote) { *e->buf = quote; emitter_emit(e, quote); emitter_emit(e, '\0'); } else { emitter_emit(e, '\0'); memmove(e->buf, e->buf +1, e->dst-e->buf); } char *s = e->buf; free(e); return s; } static char * escape_string(const char *src) { emitter_t *e = emitter_init(strlen(src) + 5); int needs_quotes = 0; while (*src) { if (*src == '\\') { emitter_emit(e, '\\'); emitter_emit(e, *src++); needs_quotes = 1; } else if (isprint(*src)) { if (isspace(*src)) needs_quotes = 1; emitter_emit(e, *src++); } else { char t[5]; switch(*src) { case '\n': emitter_emit(e, '\\'); emitter_emit(e, 'n'); break; case '\r': emitter_emit(e, '\\'); emitter_emit(e, 'r'); break; case '\t': emitter_emit(e, '\\'); emitter_emit(e, 't'); break; case '\0': emitter_emit(e, '\\'); emitter_emit(e, '0'); break; default: sprintf(t, "\\x%02x", *src); for (char *p = t; *p; p++) emitter_emit(e, *p); } src++; needs_quotes = 1; } } return emitter_close(e, needs_quotes? '"': 0); } typedef enum { INIT, SECT, KEY, COMMENT } dump_status_t; #pragma clang diagnostic push #pragma ide diagnostic ignored "misc-no-recursion" static int dump_file(FILE *file, struct list_head *head, dump_status_t status, dump_status_t prev) { ini_node_t *pos; dump_status_t curr = INIT; list_for_each_entry(pos, head, next) { switch (pos->node.type) { case INI_KEY: TEST_MAGIC(pos, LEAF) fprintf(file,"\n%s", pos->node.data); if (pos->node.v_len > 0) fprintf(file," = %s", pos->node.value); if (dump_file(file, &pos->subs, KEY, curr)) return -errno; curr = KEY; break; case INI_SECTION: TEST_MAGIC(pos, NODE) if (prev == INIT && status != INIT) fputc('\n', file); fprintf(file,"\n[%s]", pos->node.data); if (dump_file(file, &pos->subs, SECT, curr)) return -errno; curr = SECT; break; case INI_COMMENT: TEST_MAGIC(pos, LEAF) fprintf(file,"\n#%s", pos->node.data); // FIXME: should we recurse somehow? curr = COMMENT; break; case INI_INLINE_COMMENT: TEST_MAGIC(pos, LEAF) fprintf(file,"\t#%s", pos->node.data); curr = COMMENT; break; case INI_DISABLED_KEY: TEST_MAGIC(pos, LEAF) fprintf(file,"\n#%s", pos->node.data); if (pos->node.v_len > 0) fprintf(file," = %s", pos->node.value); if (dump_file(file, &pos->subs, KEY, curr)) return -errno; curr = KEY; break; case INI_DISABLED_SECTION: TEST_MAGIC(pos, NODE) if (prev == INIT && status != INIT) fputc('\n', file); fprintf(file,"\n#[%s]", pos->node.data); if (dump_file(file, &pos->subs, SECT, curr)) return -errno; curr = SECT; break; case INI_UNKNOWN: case INI_VALUE: default: errno = EINVAL; perror("dump_file(UNEXPECTED)"); return -errno; } } if (status == INIT) fputc('\n', file); errno = 0; return 0; } #pragma clang diagnostic pop int ini_dump_file(void *opaque, FILE *file) { ini_path_t *head = opaque; if (head->magic != HEAD) { errno = EINVAL; perror("ini_dump_file()"); return -errno; } return dump_file(file, &head->head, INIT, INIT); } int ini_dump_path(void *opaque, const char *path) { ini_path_t *head = opaque; if (head->magic != HEAD) { errno = EINVAL; perror("ini_dump_path()"); return -errno; } FILE *f = fopen(path, "w"); if (!f) { perror("ini_saveto()"); return -errno; } int ret = dump_file(f, &head->head, INIT, INIT); fclose(f); return ret; } int ini_dump(void *opaque) { ini_path_t *head = opaque; if (head->magic != HEAD) { errno = EINVAL; perror("ini_dump()"); return -1; } if (!*head->path) { errno = ENOENT; perror("ini_dump()"); return -1; } return ini_dump_path(opaque, head->path); } //============================================================================= // TODO: remove duplicated code const char * ini_find(void *opaque, const char *what) { static char buff[INI_MAX_VALUE+1]; // TODO: check buffer overflow strcpy(buff, what); char *key = strrchr(buff, ':'); if (!key) { errno = EINVAL; perror("ini_find(no colon)"); return NULL; } *key++ = '\0'; ini_node_t *sect = find_sect(opaque, buff); if (!sect) { errno = ENOENT; perror("ini_find(section not found)"); return NULL; } ini_node_t *entry; list_for_each_entry(entry, §->subs, next) { if ((entry->node.type == INI_KEY) && ini_string_match_si(key, entry->node.data, entry->node.format)) { memcpy(buff, entry->node.value, entry->node.v_len); buff[entry->node.v_len] = '\0'; ini_string_parse(buff, entry->node.format); errno = 0; return buff; } } errno = ENOENT; perror("ini_find(key not found)"); return NULL; } int ini_set(void *opaque, const char *what, const char *value) { static char buff[INI_MAX_VALUE+1]; // TODO: check buffer overflow strcpy(buff, what); char *key = strrchr(buff, ':'); if (!key) { errno = EINVAL; perror("ini_find(no colon)"); return -errno; } *key++ = '\0'; ini_node_t *sect = find_sect(opaque, buff); if (!sect) { errno = ENOENT; perror("ini_find(section not found)"); return -errno; } ((ini_path_t *)opaque)->dirty = 1; // opaque was tested by find_sect() ini_node_t *entry; list_for_each_entry(entry, §->subs, next) { if ((entry->node.type == INI_KEY) && ini_string_match_si(key, entry->node.data, entry->node.format)) { if (!ini_string_match_si(value, entry->node.value, entry->node.format)) { ini_node_t *next = list_next_entry(entry, next); list_del(&entry->next); free(entry); new_leaf(&next->next, INI_KEY, key, value); } return 0; } } new_leaf(§->subs, INI_KEY, key, value); return 0; } //============================================================================= ```

`main.c`:

```c #include #include #include #include #include "ini_tree.h" #if 0 int main0 (int argc, char *argv[]) { int changed = 0; if (argc < 2) { printf("Usage: %s FILE\n", argv[0]); return 0; } void *opaque = ini_parse_path(argv[1]); if (!opaque) { return 1; } if (changed) ini_dump(opaque); ini_dump_file(opaque, stderr); ini_destroy(opaque); return 0; } #endif int autosave_flag = 0; int verbose_flag = 0; int main (int argc, char **argv) { static struct option long_options[] = { {"autosave" , no_argument, &autosave_flag, 1}, {"verbose" , no_argument, &verbose_flag, 1}, {"iniformat", required_argument, 0, 'F'}, {"ini" , required_argument, 0, 'i'}, {"get" , required_argument, 0, 'g'}, {"set" , required_argument, 0, 's'}, {"write" , optional_argument, 0, 'w'}, {0, 0, 0, 0} }; void *ini = NULL; while (1) { /* getopt_long stores the option index here. */ int option_index = 0; int c = getopt_long (argc, argv, ":i:g:s:w", long_options, &option_index); /* Detect the end of the options. */ if (c == -1) break; switch (c) { case 0: /* If this option set a flag, do nothing else now. */ if (long_options[option_index].flag != 0) break; if (verbose_flag) { printf("option %s", long_options[option_index].name); if (optarg) printf(" with arg %s", optarg); printf("\n"); } break; case 'F': if (verbose_flag) printf ("option -F with value `%s'\n", optarg); ini_set_format(optarg); break; case 'i': if (verbose_flag) printf ("option -i with value `%s'\n", optarg); if (ini) { if (autosave_flag) ini_trysave(ini); ini_close(ini); } ini = ini_parse_path(optarg); if (ini==NULL) { fprintf(stderr, "cannot parse file: %s\n", optarg); return -1 ; } break; case 'g': if (verbose_flag) printf ("option -g with value `%s'\n", optarg); const char *s = ini_find(ini, optarg); printf("%25s: [%s]\n", optarg, s ? s : "UNDEF"); break; case 's': if (verbose_flag) printf ("option -s with value `%s'\n", optarg); char *v = strchr(optarg, '='); if (v) { *v++ = '\0'; ini_set(ini, optarg, v); } break; case 'w': if (verbose_flag) printf ("option -w (%s)\n", optarg? optarg: ""); if (optarg) ini_saveto(ini, optarg); else ini_trysave(ini); break; case '?': /* getopt_long already printed an error message. */ break; default: abort (); } } /* Print any remaining command line arguments (not options). */ if (optind < argc) { printf ("non-option ARGV-elements: "); while (optind < argc) printf ("%s ", argv[optind++]); putchar ('\n'); } if (autosave_flag) ini_trysave(ini); ini_close(ini); exit (0); } //-i /home/mcon/vocore/__V2__/prove/libconfini/examples/ini_files/log.ini --get Settings:AutoConnect --set Settings:AutoFoo=maybe --set Settings:AutoConnect=false --set Settings:AutoFie=true --write ```

madmurphy commented 4 years ago

That is wonderful, Mauro! I thought you had given up!

I am on a tight schedule right now and things are going to get worse in the next days, so it might take a bit before I test your work. As for your questions…

is there any way to detect blank lines in input, so I can preserve them?
Unfortunately not. The library first tokenizes each node by replacing all `/\n\r?[\v\t\f ]*|\r\n?[\v\t\f ]*/` found with `\0`, then dispatches each node. I have decided to do it in two steps so that the number of nodes is known before the dispatching begins (see `IniStatistics::members`). In this way you can always allocate in advance the memory that you need to store all the nodes dispatched in custom containers. Without it you could allocate a buffer in advance but you would have to allocate your custom containers with each dispatch. Also by doing it in two steps I can add further checks to the parsing. However the fact that it is in two steps makes it impossible to count the new lines, since these have already gone by the time you receive a dispatch. And counting how many `NUL` bytes precede a node would also not work, since a line could have contained spaces (only), and a line break could have been expressed by CR+LF, all things that have been transformed to `\0` during the first step. I suggest that your program claims its personality by forcing senseful double `\n` when it feels it's right (for example before a section) and loses the double `\n` in the original file. If you really want the line number there is a costly workaround that requires that I add a `IniStatistics::cache` pointer to my library (no problems with that, I had already thought of adding it) and you use it for allocating and `memcpy`ing the entire buffer (i.e. the entire file). You can then subtract `IniDispatch::data` - `IniStatistics::cache` to get the offset of the node in the buffer and check in your copy how many line breaks precede that offset (for that you could use the `urtrim_s()` private function from `confini.c`, with only a counter added for your purpose, which will be the new returned value). It will add some complication to your program.
I am thinking about binding comment line preceding a section header with the section (and thus move them with it, if the case); would that make sense?
Yes, it does make sense. But remember that you will have the problem of the document's header that will be appended to the first section (`# This is an INI file` in [my previous example][1]). There are no solutions for this, except that of introducing something new in the INI file world: (at least) three asterisks. ``` ini # myconfig.ini # # Please edit this file carefully # # * * * # Global options [global] ... ``` We have just established a way to separate the document's header from the comment that will be appended to `[global]` (i.e. `# Global options`). You will be the first human being to implement this syntax in INI files. Since it concerns only comments it will not be a revolution. Personally I like it.
in case both above makes sense it could be possible to clump comments with following section iff no blank line is in between; does it make sense?
Besides the fact there is no easy way to count the blank lines that precede a node, this would not bring much. It is very rare that a comment does not refer to the next node, with the only exception of the document's header. You can try to invent INI files where this happens, and you will see that it will always be possible to re-arrange the comment according to the “comment-first-then-node” pattern. This is due to the fact that the node order does not matter in INI files.
would it make sense to enforce indentation of keys/subsections?
Personally I find it ugly in INI files, but libconfini deals perfectly fine with it, so do it as you prefer.
what would be the preferred format for subsections? [sect][sect.sub][sect.sub.sub] or [sect][.sub][.sub]
They are really synonyms. You could make that an option settable by the user from the command line.

It might also please you that I have started to sketch my ini_fwrite_node() function. In order to support arrays and section paths containing dots, this new function always takes an argc and an argv as main arguments:

int ini_fwrite_node (
    const size_t argc,
    const char * const * const argv,
    enum IniNodeType type,
    FILE * file,
    IniFormat format,
    ConfiniWOptions options
);

The behavior is the following:

ConfiniWOptions is a new bitfield containing some write options, including the delimiter in case of arrays.

In the end writing to a file will look more or less like this:

FILE * my_file = fopen("my_file.conf" , "wb");

ini_fwrite_node(4, (const char * []) { "section", "subsection", "subsubsection", "etcetc" }, INI_SECTION, my_file, my_fmt, my_opts);
ini_fwrite_node(2, (const char * []) { "some_key", "single array member a.k.a simple string" }, INI_DISABLED_KEY, my_file, my_fmt, my_opts);
ini_fwrite_node(4, (const char * []) { "some_key", "array member #1", "array member #2", "etc. etc." }, INI_KEY, my_file, my_fmt, my_opts);

...

fclose(my_file);

If this approach might seem a bit verbose for most cases (since most of the times all that is needed is to write a key with a simple value which does not contain an array), I have already planned to have a wrapper for this simple case:

int ini_fwrite_keyval (
    const char * const key,
    const char * const value,
    bool is_disabled,
    FILE * file,
    IniFormat format,
    ConfiniWOptions options
) {

    return ini_fwrite_node(2, (const char * []) { key, value }, is_disabled ? INI_DISABLED_KEY : INI_KEY, file, format, options);

}

This wrapper can then be invoked as:

ini_fwrite_keyval("some_key", "some_value", false, my_file, my_fmt, my_opts);

But as I said, I am on a tight schedule and it might really take ages before I publish it. Don't wait for it.

mcondarelli commented 4 years ago

If I read correctly Your code it should be possible to insert a marker (LF? or just a second '\0'? blanks could be skipped with __LSHIFT__++ without actually zeroing chars) when a second LF is found while still skipping whitespace (there is the complication of multiple line-endings across platforms, but that is already handled with the __EOL_N__ ^= 1 trick). This would allow to call the callback function with a INI_DIVIDER IniDispatch.type leaving to the caller option to use it or not. Would You consider a patch for this?

I am also curious about why You try to outsmart compiler about variable allocation in registers.

madmurphy commented 4 years ago

If I read correctly Your code it should be possible to insert a marker

Theoretically speaking it would be possible. The sequence \n\n is made of two characters, so one of them could theoretically host a marker (most likely \n itself) instead of \0 – although I would have to instruct the second step to treat \0\n accordingly.

But for now I tend to disagree with the idea of using \n\n for distinguishing between a comment related to the next node from one that is not (you will have to convince me). Look at the header comments of these configuration files:

  1. https://git.samba.org/samba.git/?p=samba.git;a=blob_plain;f=examples/smb.conf.default;hb=HEAD
  2. https://build.opensuse.org/package/view_file/openSUSE:Factory/fuse/fuse.conf?expand=0
  3. https://github.com/intel/dleyna-server/blob/master/libdleyna/server/dleyna-server-service.conf.in
  4. https://github.com/LibreOffice/core/blob/master/vcl/unx/generic/printer/configuration/psprint.conf
  5. https://fossies.org/linux/sane-backends/backend/dll.conf.in

And in case of very long INI files people might use a “spaced” style:

# Some comment

[some_section]
foo = bar
...

# Some comment

[some_section]
foo = bar
...

# Some comment

[some_section]
foo = bar
...

These are all things about which there is a lot of freedom in the wild, and you should preserve that freedom. Your rule would be a bit arbitrary, and if you look at all these examples they all have one thing in common: comments refer to the next node, with the exception of the header comment.

This would allow to call the callback function with a INI_DIVIDER IniDispatch.type leaving to the caller option to use it or not. Would You consider a patch for this?

There will be several things that will break binary compatibility (for example, the IniFormat bitfield is currently 24bit long, i.e. 3 bytes, adding another option will make it 4 bytes long), therefore such patch can appear in version 2.0.0. But I would still suggest you that you think more in depth about it. It is not the first time that I deal with this question (how blank lines affect comment hierarchy), and I have come to the conclusion that there is only one common element between different INI files: comments refer to the next node, independently of blank lines, with the exception of the header.

I am also curious about why You try to outsmart compiler about variable allocation in registers.

Just pragmatism. I usually try the things, if I see that they improve the performance I keep them, otherwise I don't. So I did for the register keyword. Other things that I have tried but didn't work as expected are:

  1. Convert the access to strings from array to pointer arithmetic – currently most loops use string[idx] … idx++ instead of *string … string++: I thought pointers would be faster, but actually they slowed down the library, so I kept the arrays.
  2. Using the restrict keyword every time it was possible – surprisingly this also slowed down the code, so I didn't use it.
madmurphy commented 4 years ago

Ok, you will have to forgive me if I do this in pieces, but I have few free time at the moment. For now I just compiled your program and run it with your test case. I will do further checks later. First of all, good work Mauro! Besides the different parameter names (I would have chosen get instead of --get, etc.), you have designed it exactly as I would have. I have only few comments for now.

  1. This is of course only a matter of style. I see that you use the syntax section.subsection:key. It is a sort of common idiom to append the key to the section also using a dot, almost as if the key continued the section path (e.g., section.subsection.key). This is also the style I have chosen in examples/miscellanea/glib_hash_table.c and examples/cplusplus/map.cpp. But as I said, this is only a matter of style.
  2. If I launch your program with --get Settings:AutoConnect --get Settings:AutoFie I receive the following output:
         Settings:AutoConnect:   [false]
    Settings:AutoFie: [true]
    In my opinion you should just throw one value per line without any further text, as in:
    false
    true
    This will simplify a lot the program usage within shell scripting.
  3. As an extension to point 2, you could create a simple --print0 parameter that prints \0 instead of \n as delimiter between --get values
  4. For the very future, maybe it can be worth planning a --printf and --exec parameters in the same style of the GNU find program, but this would be a long term goal.
mcondarelli commented 4 years ago

Hi, I'm now on a delivery and thus I won't be able to work on this at least till next weekend (if I'm lucky).

1) I always prefer explicitly stating what I need, but this could be a parameter or a ENVIRONMENT setting (feel free to suggest a name)

2) I am planning a --fmt (or --format) setting to select printout style; I am thinking about (at least):

3) this could be just another --fmt zero

4) this is interesting, but, as you say, probably not for next week ;)

Notice I tried to split code in two distinct parts: 1) a "DOM like" interface to complement the "SAX like" interface You implemented. 2) a "manager" application using (1) to variously manipulating the .ini file.

All above pertains to (2), but I would like to nail down (1) first, in order to build on solid foundation. Please, if You find the time, review (1), with specific attention to interface (innards can always be changed) in order to freeze it in a way that could be useful to a more general audience than just myself and inimanager. You surely have better understanding of what could be useful.

madmurphy commented 4 years ago

Ok, still piece by piece in my spare time…

Under find_sect() you use a normal loop to iterate through section path parts:

while (s && *s) {
    char *t = strchr(s, '.');

    ...

    s = t;
}

This will make a mess in case of quoted section names and there is no need to do that. Section paths are collapsed INI arrays, and libconfini has six fantastic functions to deal with those (ini_array_get_length(), ini_array_foreach(), ini_array_break(), ini_array_release(), ini_array_shift(), ini_array_split()). The library's native functions are designed to deal with quotes and all the INI stuff. Remember that the manual is always your friend.

What you need here is ini_array_release():

ini_node_t *
find_sect(void *opaque, const char *where) {
    ini_node_t *section = NULL;
    ini_path_t *ini = opaque;
    if (ini->magic == HEAD) {
        char * s, * t = alloca(strlen(where) + 1);
        strcpy(t, where);
        struct list_head *head = &ini->head;
        while ((s = ini_array_release(&t, '.', INI_DEFAULT_FORMAT))) {
            ini_node_t *entry, *sect = NULL;
            list_for_each_entry(entry, head, next) {
                if (ini_string_match_si(s, entry->node.data, entry->node.format)) {
                    sect = entry;
                    break;
                }
            }
            if (!sect) {
                sect = new_sect(head, s);
            }
            head = &sect->subs;
            section = sect;
        }
    }
    return section;
}

As you can see I called ini_array_release(&t, '.', INI_DEFAULT_FORMAT). You will need to replace INI_DEFAULT_FORMAT with the current format used. Remember that with libconfini the format needs to be always carried around. But no worries, sizeof(IniFormat) is just 3 – less than an int.

P.S. I would not use the stack (alloca()) in case of user-given input. You never know what kind of garbage a user can pass to your program. I would just call strdup() / strndup() and then free().

Hear you in the next episode…

madmurphy commented 4 years ago

Here for another episode…

I don't know if you want your program to work also on Microsoft Windows, but if you do, you must always add the binary specifier to the FILE handles parsed by libconfini ("rb" instead of just "r"). Windows' idea of using CR+LF as line separators makes ftell() and fseek() behave funny if you use them in text mode (and libconfini uses both functions).

`FILE *f = fopen(path, "rb");`

(this is something you should remember in general when programming for Microsoft Windows)

Going back to the output format of your program… I think that well designing a program is as important as implementing it correctly. If you think about your --fmt=human/bare/eval/etc., they are all specific cases of a --printf argument.

One of the things I love the most about Unix is that it is so old that whatever problem you might face chances are that someone had it before you. If a program like find prints the bare output by default unless you specify a --printf parameter, it is quite likely that it is the correct approach. A simple print_with_format() function is not very hard to implement:

#include <stdio.h>
#include <string.h>

void print_with_format (
    const char * const fmtstr,
    const char * const section_path,
    const char * const key,
    const char * const value
) {
    const char * curr = fmtstr;
    for (const char * next = strchr(fmtstr, '%'); next; next = strchr(next, '%')) {
        if (next > curr) {
            printf("%.*s", (const int) (next - curr), curr);
        }
        switch (*++next) {
            case '%':
                putchar('%');
                break;
            case 'p':
                printf("%s", key);
                break;
            case 'P':
                printf("%s.%s", section_path, key);
                break;
            case 'v':
                printf("%s", value);
                break;

            /*  Add other cases here...  */
        }
        curr = ++next;
    }
    if (*curr) {
        printf("%s", curr);
    }
}

int main () {

    /*  Just any random key to format...  */
    const char * const section_path = "this.is.a.test";
    const char * const key = "foo";
    const char * const value = "bar";

    print_with_format("blablabla %P --> %v", section_path, key, value);

    return 0;

}

I just improvised with an eye on man find. The sequence %P is section+key, %p is the bare key, %v is the bare value, %% is a simple %. The switch case can be extended indefinitely. The function would need to be preceded by an unescape() function in order to deal with \n, \r, etc.

I might have given you something to think about…

madmurphy commented 4 years ago

Nearly forgot about it. Ten days ago I had published an “unescape” function under examples/utilities/ini_string_preparse.h. It was intended to be a “pre-parse” function to be invoked before ini_string_parse() in order to extend the escape sequences supported by the library, therefore it does not unescape a double backslash. But it is very easy to adapt it for our case.

Hence the code from the previous comment becomes:

#include <stdio.h>
#include <string.h>

/**

    @brief      Unescape escape sequences
    @param      str The string to unescape (it cannot be `NULL`)
    @return     The new length of the string

    Supported escape sequences are "\a", "\b", "\f", "\n", "\r",
    "\t", "\v", "\e".

**/
size_t strunescape (char * const str) {

    register char chr, * src = str, * dst = str;

    is_unesc: switch (chr = *src++) {

        case '\\': goto is_esc;
        case '\0': *dst = '\0'; return dst - str;
        default: *dst++ = chr; goto is_unesc;

    }

    is_esc: switch (chr = *src++) {

        case '\\': *dst++ = '\\'; goto is_unesc;
        case 'a': *dst++ = '\a'; goto is_unesc;
        case 'b': *dst++ = '\b'; goto is_unesc;
        case 'f': *dst++ = '\f'; goto is_unesc;
        case 'n': *dst++ = '\n'; goto is_unesc;
        case 'r': *dst++ = '\r'; goto is_unesc;
        case 't': *dst++ = '\t'; goto is_unesc;
        case 'v': *dst++ = '\v'; goto is_unesc;
        case 'e': *dst++ = '\x1B'; goto is_unesc;

        case '\0':

            *dst++ = '\\';
            *dst = chr;
            return dst - str;

        default:

            *dst++ = '\\';
            *dst++ = chr;
            goto is_unesc;

    }

}

void print_with_format (
    const char * const fmtstr,
    const char * const section_path,
    const char * const key,
    const char * const value
) {
    const char * curr = fmtstr;
    for (const char * next = strchr(fmtstr, '%'); next; next = strchr(next, '%')) {
        if (next > curr) {
            printf("%.*s", (const int) (next - curr), curr);
        }
        switch (*++next) {
            case '%':
                putchar('%');
                break;
            case 'p':
                printf("%s", key);
                break;
            case 'P':
                printf("%s.%s", section_path, key);
                break;
            case 'v':
                printf("%s", value);
                break;

            /*  Add other cases here...  */
        }
        curr = ++next;
    }
    if (*curr) {
        printf("%s", curr);
    }
}

int main () {

    /*  Just any random key to format...  */
    const char * const section_path = "this.is.a.test";
    const char * const key = "foo";
    const char * const value = "bar";

    char my_format_string[] = "blablabla\\t %P --> %v\\n";
    strunescape(my_format_string);
    print_with_format(my_format_string, section_path, key, value);

    return 0;

}

This code does already pretty much most of the job for a --printf parameter.

Going back to your tree of linked lists… Everything seems in order for now (some details can be adjusted in the future). Only a few comments:

  1. The dictionary of “known formats” could be transferred into a configuration file – an INI file, of course :-) As it would remain immutable in memory maybe this time I would map it using a hash table instead of a linked list (see examples/miscellanea/glib_hash_table.c for an example that uses glib – but feel free to adapt it to any other hash library or choose a completely different approach). Or maybe a simple if / else chain will do the job faster than an hash table, since an IniFormat has no more than 14 fields, and this will have the advantage that the values will be stored already with their native type (i.e. much less memory consumed). This can be however a later addition.
  2. You write “(should never happen)” in case of INI_UNKNOWN nodes. But INI_UNKNOWN nodes can happen, in case a node makes no sense (for example "" = foo). You will have to decide what to do with them.
  3. You can remove completely case INI_VALUE: that for sure will never happen.

The dictionary of known formats could be an INI file looking like this:

[format.default]
delimiter_symbol = "="
case_sensitive = off
semicolon_marker = disabled or comment
hash_marker = disabled or comment
section_paths = absolute and relative
multiline_nodes = multiline everywhere
no_single_quotes = off
no_double_quotes = off
no_spaces_in_names = off
implicit_is_not_empty = off
do_not_collapse_values = off
preserve_empty_quotes = off
disabled_after_space = off
disabled_can_be_implicit = off

[format.unix]
delimiter_symbol =
case_sensitive = off
semicolon_marker = disabled or comment
hash_marker = disabled or comment
section_paths = absolute and relative
multiline_nodes = multiline everywhere
no_single_quotes = off
no_double_quotes = off
no_spaces_in_names = off
implicit_is_not_empty = off
do_not_collapse_values = off
preserve_empty_quotes = off
disabled_after_space = off
disabled_can_be_implicit = off

## etc. etc.
XO39 commented 8 months ago

@madmurphy Any news on this yet?