michaelrsweet / mxml

Tiny XML library.
https://www.msweet.org/mxml
Apache License 2.0

New feature suggestions #316

Closed zlifes closed 1 month ago

zlifes commented 1 month ago

MXML is great!

About 10 years ago, I ported MXML v2.7 to VxWorks 5.5, where it was perfectly suited to handling XML files on embedded systems. One of the major porting changes was to the realloc call in mxml_set_attr. Because realloc grows the array by one _mxml_attr_t at a time, parsing stalls there when the XML file is large. After I changed it to grow by 4 or 8 _mxml_attr_t at a time, parsing works fine.

Recently I ported the latest MXML v4 release and I have a few requests:

  1. Make the realloc increments tunable, for example in mxml_set_attr and mxmlIndexNew. The parameters could be macro definitions or global variables.
  2. Add an mxmlNodeIsBlank function that determines whether a node is blank, so that the document can be cleaned up afterwards.
  3. Add formatting settings for saving a document. I suggest a Boolean switch parameter as well as an indentation parameter.
  4. Some functions allocate very large variables on the stack, such as 8192 or 16384 bytes. This places a heavy demand on the task stack of an embedded system. I suggest turning this into a tuning parameter or finding a better approach.

Thank you very much.

michaelrsweet commented 1 month ago

@zlifes So if I can summarize what you are asking for:

  1. In mxml_set_attr, expand the attributes array by more than 1 element at a time to make the reallocation more efficient.
  2. Update all functions with large local character array variables to allocate them instead.

I have questions about the rest:

  1. WRT mxmlIndexNew, the current code allocates node pointers 64 at a time for efficiency. Are you asking for a different multiple or a way to configure the allocation increment at compile or run-time?
  2. WRT mxmlNodeIsBlank, how do you envision this working? Currently the only way to have a blank MXML_TYPE_TEXT node is to have whitespace between elements, otherwise the node is not blank. And I'm not sure having a function just for this test is useful or necessary...
  3. WRT document save settings, what settings are you referring to? There are no "Boolean switch parameters", and indentation is a function of the whitespace callback.

zlifes commented 1 month ago

I apologise for the lack of clarity in the previous description.

  1. For mxml_set_attr and mxmlIndexNew, and any other place where realloc grows an array in fixed steps, provide macro definitions that set the number of elements added per reallocation at compile time. Of course, it would also be nice to have runtime configuration parameters.

  2. In MXML_TYPE_TEXT mode, the text is split into multiple nodes at whitespace characters. In MXML_TYPE_OPAQUE mode, the text content is a single node, which is easier to handle, but there are also many blank nodes. I actually want to remove these blank nodes, which was the original purpose of mxmlNodeIsBlank. After loading the XML file, I could call mxmlNodeIsBlank to identify and remove blank nodes as needed.

  3. When saving the XML file, I need to write a callback function for formatting. I would like a parameter that configures whether to format at all, plus another parameter that specifies the number of indentation characters, such as mxmlSaveFd(mxml_node_t node, mxml_options_t options, int fd, bool isfmt, int indent).

  4. In some embedded systems such as VxWorks, the task stack is generally small, for example 2 KB, so a stack array of 8192 or 16384 bytes will overflow the task stack. One option is to allocate the memory on the heap, for example with malloc.

michaelrsweet commented 1 month ago

I apologise for the lack of clarity in the previous description.

  1. For mxml_set_attr and mxmlIndexNew, and any other place where realloc grows an array in fixed steps, provide macro definitions that set the number of elements added per reallocation at compile time. Of course, it would also be nice to have runtime configuration parameters.

I can certainly add a #define in mxml-private.h so you can tweak things. A run-time option is, IMHO, overkill. Tracking with issue #318.
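
As a rough illustration of what a compile-time increment could look like, here is a minimal sketch. The macro name `ATTR_ALLOC_INCREMENT`, the `attr_t` and `attr_list_t` types, and `attr_list_add` are all hypothetical stand-ins chosen for this example; they are not the names used in mxml-private.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tuning macro: override with -DATTR_ALLOC_INCREMENT=n. */
#ifndef ATTR_ALLOC_INCREMENT
#  define ATTR_ALLOC_INCREMENT 8
#endif

/* Hypothetical stand-in for a single attribute. */
typedef struct
{
  char *name;
  char *value;
} attr_t;

/* Hypothetical stand-in for the attribute array of an element. */
typedef struct
{
  size_t num_attrs;    /* Attributes in use */
  size_t alloc_attrs;  /* Attributes allocated */
  attr_t *attrs;       /* Attribute array */
} attr_list_t;

/*
 * Reserve one more attribute slot, growing the array by
 * ATTR_ALLOC_INCREMENT elements only when it is full.
 * Returns the new slot, or NULL if realloc fails (the old array is kept).
 */
attr_t *attr_list_add(attr_list_t *list)
{
  attr_t *slot;

  assert(list != NULL);

  if (list->num_attrs >= list->alloc_attrs)
  {
    size_t new_alloc = list->alloc_attrs + ATTR_ALLOC_INCREMENT;
    attr_t *temp     = realloc(list->attrs, new_alloc * sizeof(attr_t));

    if (!temp)
      return NULL;

    list->attrs       = temp;
    list->alloc_attrs = new_alloc;
  }

  slot        = list->attrs + list->num_attrs++;
  slot->name  = NULL;
  slot->value = NULL;

  return slot;
}
```

With the default of 8, adding nine attributes calls realloc twice instead of nine times.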

  2. In MXML_TYPE_TEXT mode, the text is split into multiple nodes at whitespace characters. In MXML_TYPE_OPAQUE mode, the text content is a single node, which is easier to handle, but there are also many blank nodes. I actually want to remove these blank nodes, which was the original purpose of mxmlNodeIsBlank. After loading the XML file, I could call mxmlNodeIsBlank to identify and remove blank nodes as needed.

So the type callback can return MXML_TYPE_IGNORE to ignore the whitespace for specific child nodes, but I could look at adding a load option that eliminates blank values. Tracking with issue #317.
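
Until such a load option exists, the blank test itself is small enough to do in application code. The sketch below is a hypothetical helper named `is_blank_string`, not part of Mini-XML. It assumes the caller has already obtained the string value of an opaque node, for example with mxmlGetOpaque, and treats a NULL or all-whitespace value as blank.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: return true if the string is NULL, empty,
 * or consists only of whitespace characters.
 */
bool is_blank_string(const char *s)
{
  if (!s)
    return true;

  for (; *s; s ++)
  {
    /* Cast avoids undefined behavior for negative char values. */
    if (!isspace((unsigned char)*s))
      return false;
  }

  return true;
}
```

An application could walk the loaded tree and delete each opaque node whose value passes this test.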

  3. When saving the XML file, I need to write a callback function for formatting. I would like a parameter that configures whether to format at all, plus another parameter that specifies the number of indentation characters, such as mxmlSaveFd(mxml_node_t node, mxml_options_t options, int fd, bool isfmt, int indent).

The whitespace callback returns a string to control indentation and is what you want.
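
To make the "number of indentation characters" idea concrete, here is a minimal sketch of a helper that a whitespace callback could return its string from. The names `indent_string` and `INDENT_MAX` are hypothetical, and the sketch assumes the callback has already worked out the node depth, for instance by following mxmlGetParent up to the root.

```c
#include <stddef.h>
#include <string.h>

#define INDENT_MAX 64  /* Hypothetical cap on indentation characters */

/*
 * Hypothetical helper: build a newline followed by depth * width spaces.
 * The result lives in a static buffer, so it is overwritten by the next
 * call and is not safe to use from several tasks at once.
 */
const char *indent_string(int depth, int width)
{
  static char buffer[INDENT_MAX + 2];
  size_t      count;

  if (depth <= 0 || width <= 0)
    count = 0;
  else if (depth > INDENT_MAX / width)
    count = INDENT_MAX;               /* Clamp very deep nesting */
  else
    count = (size_t)(depth * width);

  buffer[0] = '\n';
  memset(buffer + 1, ' ', count);
  buffer[count + 1] = '\0';

  return buffer;
}
```

A callback that wants no formatting at all can simply return NULL, so the Boolean switch and the indent width both stay in application code.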

  4. In some embedded systems such as VxWorks, the task stack is generally small, for example 2 KB, so a stack array of 8192 or 16384 bytes will overflow the task stack. One option is to allocate the memory on the heap, for example with malloc.

Tracking with #319.
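
The general shape of that change, moving a large local array to the heap, can be sketched as follows. `format_alloc` and `FORMAT_BUFSIZE` are hypothetical names used only for illustration; this is not the code from Mini-XML.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define FORMAT_BUFSIZE 8192  /* Hypothetical size of the former stack array */

/*
 * Hypothetical helper: format a message into a heap buffer instead of a
 * large local array, so the calling task only needs a small stack.
 * Returns NULL on allocation failure; the caller frees the result.
 */
char *format_alloc(const char *format, ...)
{
  va_list ap;
  char    *buffer = malloc(FORMAT_BUFSIZE);

  if (!buffer)
    return NULL;

  va_start(ap, format);
  /* vsnprintf truncates and always terminates within the given size. */
  vsnprintf(buffer, FORMAT_BUFSIZE, format, ap);
  va_end(ap);

  return buffer;
}
```

The trade-off is an extra malloc/free pair per call and one more failure path to check, in exchange for a stack frame of a few bytes rather than several kilobytes.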

zlifes commented 1 month ago

Thanks again.

From: Michael R Sweet Date: 2024-03-30 02:11 To: michaelrsweet/mxml CC: zlifes; Mention Subject: Re: [michaelrsweet/mxml] New feature suggestions (Issue #316)