test-fullautomation / python-jsonpreprocessor

A preprocessor for json files
Apache License 2.0
2 stars 2 forks source link

Side effects of token string usage #289

Open HolQue opened 1 month ago

HolQue commented 1 month ago

Hi Son,

the following observations are not necessarily bugs that urgently need to be fixed. But there are side effects w.r.t. these token strings.

Please take this just for your information; maybe you can get some ideas out of it to improve your code.

The JsonPreprocessor defines some token strings that were added to the JSON content during computation:

COLONS           = "__handleColonsInLine__"
DUPLICATEDKEY_00 = "__handleDuplicatedKey__00"
DUPLICATEDKEY_01 = "__handleDuplicatedKey__"
STRINGCONVERT    = "__ConvertParameterToString__"
LISTINDEX        = "__IndexOfList__"
SLICEINDEX       = "__SlicingIndex__"
STRINGVALUE      = "__StringValueMake-up__"

It has already been observed that these token strings are not removed completely before the preprocessed JSON content is passed to the JSON interface. This leads into wrong results: https://github.com/test-fullautomation/python-jsonpreprocessor/issues/215

I asked myself what might happen in case of I use these token strings in JSON code immediately.

This is the result:

(1)

"testlist"         : ["A", "B", "C", "D"],
"__SlicingIndex__" : 1,
"param"            : ${testlist}[0:${__SlicingIndex__}]

Result:

Error: 'pop from empty list'!

(2)

"testlist"        : ["A", "B", "C", "D"],
"__IndexOfList__" : 1,
"param"           : ${testlist}[${__IndexOfList__}]

Result:

Error: 'local variable 'indexList' referenced before assignment'!

(3)

"__ConvertParameterToString__" : 1

Result:

{'': 1, '__ConvertParameterToString__': 1}

(4)

"__ConvertParameterToString__" : 1,
"param"                        : "${__ConvertParameterToString__}"

Result:

Error: ''NoneType' object is not subscriptable'!

(5)

"__handleColonsInLine__" : 1,
"param"                  : "${__handleColonsInLine__} : ${__handleColonsInLine__}"

Result:

Error: 'pop from empty list'!

(6)

"__handleDuplicatedKey__00" : 1

Result:

Error: ''''!

(7)

"__handleDuplicatedKey__00--A" : 1

Result:

Error: ''int' object has no attribute 'pop''!

(8)

"testdict" : {"__handleDuplicatedKey__00" : 1, "__handleDuplicatedKey__00" : 2}

Result:

Error: ''''!

(9)

"testdict" : {"__handleDuplicatedKey__00--A" : 1, "__handleDuplicatedKey__00--B" : 2}

Result:

Error: ''int' object has no attribute 'pop''!

It might be very improbable, that a user does this, but nevertheless I am a little bit worried about these observations.

In case of a user really uses exactly the same strings like the JsonPreprocessor internally, I would simply expect wrong results, but not such basic exceptions like 'int' object has no attribute 'pop'.

Any possibility to make the code more robust?

HolQue commented 1 month ago

Addendum:

Variables are not initialized properly.

indexList is under condition:

if re.search(indexPattern, line):
    indexList = re.findall(indexPattern, line)

Therefore you cannot be sure that this variable exist when you use it:

item = item.replace(CNameMangling.LISTINDEX.value, indexList.pop(0), 1)

HolQue commented 1 month ago

Hi Son,

another proposal: You should handle these token strings as reserved keywords. You implement a method to check if a string contains one of these keywords. You can use this method twice:

  1. At the beginning, to check the JSON content, to make sure that no user by mistake uses such a keyword inside his JSON files.
  2. At the end, after preprocessing, before the preprocessed JSON content is passed to the JSON interface. This is something like an internal check to make sure that all expressions are resolved properly and only clean content is passed to the JSON interface.