rsyslog / libfastjson

a fast json library for C
Other
52 stars 32 forks source link

Architecture for to-string conversion #119

Open EmielBruijntjes opened 7 years ago

EmielBruijntjes commented 7 years ago

Hi Rainer,

We have some issues with converting the json tree to a string from multiple threads at the same time. I think the problem is with the printbuf member that is being used by two threads. We will probably be able to work around this, but after looking at the code, I think it is a peculiar design that the output buffer is part of the json_object.

What would you think of adding the following functions to libfastjson:

/**
 * Signature of a custom print function that can be implemented by the user.
 * @param ptr user-supplied pointer
 * @param buffer buffer to be written
 * @param size size of the buffer
 * @return size_t number of bytes written
 */
typedef size_t (fjson_print_fn)(void *ptr, const char *buffer, size_t size);

/**
 * Write the json tree to a user-supplied function
 * @param jso the fjson_object instance
 * @param func the user supplied function to which all data will be passed
 * @param ptr user-supplied pointer that is passed as first param to the callback
 * @returns the number of bytes written
 */
extern size_t fjson_object_write(struct fjson_object *jso, fjson_print_fn *func, void *ptr);

/**
 * Extended write function that allows extra flags to be passed
 * @param jso the fjson_object instance
 * @param flags formatting options, see FJSON_TO_STRING_PRETTY and other constants
 * @param func the user supplied function to which all data will be passed
 * @param ptr user-supplied pointer that is passed as first param to the callback
 * @returns the number of bytes written
 */
extern size_t fjson_object_write_ex(struct fjson_object *jso, int flags, fjson_print_fn *func, void *ptr);

This would allow users to supply their own output function, for example to write the json to a FILE*, or to write it to a fixed size buffer on the stack or to a dynamically growing buffer just like the current "struct printbuf" does. In fact, the "printbuf" implementation could be implemented on top of these functions.

This will often also be a performance improvement because now we're always stuck with the dynamically growing printbuf -- even if we have a pretty good idea how big the output is going to be, or when we want to write to a file instead of to a dynamically allocated buffer.

Are you willing to accept a pull request that implements this?

davidelang commented 7 years ago

On Mon, 31 Oct 2016, Emiel Bruijntjes wrote:

We have some issues with converting the json tree to a string from multiple threads at the same time. I think the problem is with the printbuf member that is being used by two threads. We will probably be able to work around this, but after looking at the code, I think it is a peculiar design that the output buffer is part of the json_object.

double check if you are really having this problem with libfastjson. This is exactly what we were fighting with json-c that triggered the fork to create libfastjson. Your solution may be better than the fix that Rainer put in place, but you shouldn't be running into that problem with the current libfastjson

David Lang

What would you think of adding the following functions to libfastjson:

/**
* Signature of a custom print function that can be implemented by the user.
* @param ptr user-supplied pointer
* @param buffer buffer to be written
* @param size size of the buffer
* @return size_t number of bytes written
*/
typedef size_t (fjson_print_fn)(void *ptr, const char *buffer, size_t size);

/**
* Write the json tree to a user-supplied function
* @param jso the fjson_object instance
* @param func the user supplied function to which all data will be passed
* @param ptr user-supplied pointer that is passed as first param to the callback
* @returns the number of bytes written
*/
extern size_t fjson_object_write(struct fjson_object *jso, fjson_print_fn *func, void *ptr);

/**
* Extended write function that allows extra flags to be passed
* @param jso the fjson_object instance
* @param flags formatting options, see FJSON_TO_STRING_PRETTY and other constants
* @param func the user supplied function to which all data will be passed
* @param ptr user-supplied pointer that is passed as first param to the callback
* @returns the number of bytes written
*/
extern size_t fjson_object_write_ex(struct fjson_object *jso, int flags, fjson_print_fn *func, void *ptr);

This would allow users to supply their own output function, for example to write the json to a FILE*, or to write it to a fixed size buffer on the stack or to a dynamically growing buffer just like the current "struct printbuf" does. In fact, the "printbuf" implementation could be implemented on top of these functions.

This will often also be a performance improvement because now we're always stuck with the dynamically growing printbuf -- even if we have a pretty good idea how big the output is going to be, or when we want to write to a file instead of to a dynamically allocated buffer.

Are you willing to accept a pull request that implements this?

rgerhards commented 7 years ago

We inherited the printbuf thing from json-c.

@EmielBruijntjes you are right that this functionality is not thread safe. At least, it needs to be documented to not be so. The problem is that there are many aspects in the inherited code base that are not thread safe. A single json object should never be used on more than one thread.

@davidelang what we do in rsyslog is to duplicate the json object when running on different threads. json-c wasn't even able to support that correctly. Emiel knows very well, because he offered a patch to json-c which was not accepted ( https://github.com/json-c/json-c/issues/212 ). That was basically the last thing that made me fork json-c, where I applied Emiel's patch. [Just for reference and reminder, this was my motivation for the fork: http://blog.gerhards.net/2015/12/rsyslog-and-liblognorm-will-switch-to.html ]

I think Emiel's suggestion is the right path forward, and it will possibly solve the whole issue. This could even lead to bigger performance improvements in rsyslog as well. So I would definitely be interested in a patch and discussion around it. @EmielBruijntjes we just need to be sure we are in the same camp on our intentions, so that we both will be happy in the end. I suggest to go through the blog post above.

rgerhards commented 7 years ago

I should mention that I would also like to discuss about the custom print function. Right now, the data structure is very heavy on function pointers, and most of them are not really needed. To get to a faster implementation in the long term, I removed (a couple of month ago) the custom format option. I think this is the right thing to do. If @EmielBruijntjes really needs this, we need to think if we can somehow split the library in two. My ultimate goal is to have a very fast and small, but thus feature-limted, json lib. My main intent is not to just fix json-c and do the same under a better release schedule and with more community involvment (which is a noble way of doing things, but it is not what my main need is -- I really need speed).