Open wader opened 1 year ago
@wader:
instances:
  a:
    value: '"hello".substring(4,1)'
This is pretty much undefined behavior right now. As you can see in https://doc.kaitai.io/user_guide.html#str-methods, the `str.substring()` method expects two arguments, `from` and `to`:

| Method name | Return type | Description |
| --- | --- | --- |
| `substring(from, to)` | String | Extracts a portion of a string between character at offset `from` and character at offset `to - 1` (including `from`, excluding `to`) |
And it's implicitly assumed that `from <= to` (`from == to` gives you an empty string `""`). The `from > to` case was unfortunately not thought of, so it's not very surprising to me that there are differences across target languages, because each language defines its own behavior in this case and KS hasn't made any attempt to standardize this so far.

But I agree with unifying this. The idea of KS is indeed that all parsers generated from a `.ksy` spec should behave the same in all cases, and to achieve that, it's sometimes necessary to overcome the differences between the languages, sometimes by providing a custom implementation of certain operations in the runtime library (actually, this is one of the main goals of the runtime library: to provide a standard API regardless of the language specifics).

For `substring(from, to)` in the case of `from > to`, I think it makes sense to return an empty string `""` (as in the `from == to` case).
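A minimal sketch of what such a unified helper could look like in a runtime library, written in Python purely for illustration. The name `ks_substring` is hypothetical (it is not an existing runtime API), and rejecting negative offsets is an assumption made here, not something KS specifies.

```python
def ks_substring(s: str, from_: int, to: int) -> str:
    """Return s[from_:to]; from_ >= to yields "" (hypothetical helper)."""
    # Assumption: negative offsets are rejected instead of getting
    # Python's count-from-the-end meaning.
    if from_ < 0 or to < 0:
        raise ValueError("offsets must be non-negative")
    if from_ >= to:
        # from_ == to is already documented as ""; from_ > to is treated the same.
        return ""
    return s[from_:to]


print(repr(ks_substring("hello", 1, 4)))  # 'ell'
print(repr(ks_substring("hello", 4, 4)))  # ''
print(repr(ks_substring("hello", 4, 1)))  # ''
```

The explicit `from_ >= to` check is the part that would need to be ported to each target, since that is where the native string APIs may disagree.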
This issue is quite similar in nature to https://github.com/kaitai-io/kaitai_struct/issues/746 - integer division also behaves differently across targets when the result is negative.
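To illustrate the kind of difference meant here, the two common rounding conventions for integer division can be shown side by side in Python. This is only a sketch of the general flooring vs. truncating distinction; the helper names are made up, and it says nothing about which KS target uses which convention.

```python
def div_floor(a: int, b: int) -> int:
    # Rounds toward negative infinity (this is Python's // operator).
    return a // b


def div_trunc(a: int, b: int) -> int:
    # Rounds toward zero, as integer division does in C99 or Java.
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


print(div_floor(-7, 2))  # -4
print(div_trunc(-7, 2))  # -3
print(div_floor(7, 2), div_trunc(7, 2))  # 3 3
```

The two only disagree when the exact quotient is negative and not a whole number.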
@wader Unrelated: GitHub has quite good syntax highlighting for code blocks, but you need to specify the language. For your comment here (https://github.com/kaitai-io/kaitai_struct/issues/1021#issue-1664302708), it would be ```ksy (it has an entry in github/linguist, so it's recognized by GitHub out of the box and the .ksy files on GitHub are also automatically highlighted as YAML thanks to that), ```console and ```json.
👍 yeah i think KS would benefit from having as few undefined behaviors as possible. I'm not sure how people usually use kaitai but maybe most generate to one language so don't notice differences much?
I also found a difference for `<string>.to_i` when there is trailing garbage. If i remember correctly js just ignores it, but go and maybe some others fail. Should I create a new issue for that?
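The two behaviors described above can be sketched in Python. `parse_strict` and `parse_lenient` are hypothetical names; the lenient variant only approximates the "ignore trailing garbage" behavior recalled for JavaScript and is not taken from any KS runtime.

```python
import re


def parse_strict(s: str) -> int:
    # Rejects trailing non-numeric characters: int("123abc") raises ValueError.
    return int(s)


def parse_lenient(s: str) -> int:
    # Parses the leading run of an optional sign plus ASCII digits
    # and ignores whatever follows.
    m = re.match(r"[+-]?[0-9]+", s)
    if m is None:
        raise ValueError(f"no leading integer in {s!r}")
    return int(m.group())


print(parse_lenient("123abc"))  # 123
try:
    parse_strict("123abc")
except ValueError as exc:
    print("strict parse failed:", exc)
```

Either choice could be standardized; the point is only that the same input succeeds under one convention and fails under the other.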
Aha didn't know there was ksy support, nice. Yeap i try to use highlighting but forget sometimes, i actually added jq support to github linguist some time ago :)
def woho: 1+2;
@wader:
I'm not sure how people usually use kaitai but maybe most generate to one language so don't notice differences much?
Yes, I think so. All targets find their users, but most are only focused on one language (or possibly GraphViz + one programming language), so however the KS-generated parser in that language behaves, they think "that's how Kaitai works" I guess.
Which harms the idea of .ksy specs being language-agnostic of course, because other users may encounter issues when trying to use a .ksy spec in another language.
@wader:
I also found a difference for `<string>.to_i` when there is trailing garbage. (...) Should I create a new issue for that?
Yes, please, much appreciated ❤️
Hi, i noticed this difference while testing things:
ksdump (ruby):
webide (javascript):
Haven't looked deeper but i guess the behavior comes from how the JavaScript string `substring` method works.

Is there a preferred kaitai behavior?
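For reference, JavaScript's `String.prototype.substring` clamps both indexes to the string length and swaps them when the start is greater than the end, which is one plausible source of the difference. A rough Python emulation of those rules follows; the name `js_substring` is made up for illustration, only integer arguments are handled, and JavaScript's UTF-16 indexing is ignored.

```python
def js_substring(s: str, start: int, end: int) -> str:
    # Clamp both indexes into [0, len(s)], as JavaScript does.
    start = max(0, min(start, len(s)))
    end = max(0, min(end, len(s)))
    # JavaScript swaps the two arguments when start > end.
    if start > end:
        start, end = end, start
    return s[start:end]


print(repr(js_substring("hello", 4, 1)))  # 'ell'
print(repr("hello"[4:1]))                 # '' (plain Python slicing, for contrast)
```

So `substring(4, 1)` effectively becomes `substring(1, 4)` under the JavaScript rules, while a plain slice with the same arguments is empty.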