leopard-js / sb-edit

Javascript library for manipulating Scratch project files
MIT License
52 stars 15 forks source link

toLeopard: Inconsistency when renaming variables with illegal characters #120

Open ZXMushroom63 opened 10 months ago

ZXMushroom63 commented 10 months ago

While trying to convert the attatched project, among other things, the _.density and -.density variables were renamed to, density and density2.

However, when I was looking through the code in Engine.js, the RtImpulseGetDataOfObjectWithId function used variables that ended with a '2' along with variables that did not end in a number. This is not really an issue, but it makes trying to use the output of Leopard difficult.

The project: distilled-impulse.sb3.zip

towerofnix commented 10 months ago

I think we're just stripping illegal characters and then tacking on a 2 for the then-same-name second variable. But yeah, it has a pretty harsh effect on variables which were only differentiated by special characters, which is generally what that name deduplication tends to affect AFAIK.

I wouldn't be opposed to ensuring the first variable of a duplicate name gets a 1 at the end, i.e. density1, density2. Would be analogous to costumes and sprite names in Scratch (costume1, costume2, Sprite1, Sprite2).

Just going to /cc @PullJosh since this is a minor change in project translation, do you agree/disagree or have any additional nuance to add?

PullJosh commented 10 months ago

I definitely like your suggestion of appending a 1 to the first of the would-be-duplicate identifiers. That seems like a strict improvement over what we have now.

Brainstorming other ideas: just recently I translated a project that had a variable called クラウドランキング. Upon conversion to Leopard, that variable was renamed to _, which seems like clearly a bad choice. Since all our Scratch variables end up being keys on the vars object, could we instead use the original name like this.vars["クラウドランキング"]. For the sake of learning JS, I like the dot notation better because it's more conventional for everyday uses.

So here's an idea: 1) Delete white space in variable names and use camel case where possible 2) Delete a few special characters that are common in English (like apostrophes) 3) Use the resulting variable name. If it's not a valid JS identifier, use the this.vars[] notation. 4) Deal with duplicate identifiers the way @towerofnix described, starting from 1.

Thoughts?

towerofnix commented 10 months ago

I like it! Also pinging @adroitwhiz, not sure you're around but would be nice to get your feedback. We can build on this more later if there are other ideas present, but the outline above feels pretty good.

That said, it's worth noting that クラウドランキング is a valid identifier - in general (which is a somewhat big "in general"), JavaScript has no problem with Unicode characters in identifiers. this.vars.クラウドランキング = 24 is 100% valid JS. See also the presumably more recent reference on MDN, which shares a demonstratory regex: /[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*/u. And the latest spec here.

I don't think there's a lot of advantage to using bracket notation when dot notation is (albeit surprisingly) functional.

I'd swap step 2 for a more specific "Strip characters that aren't valid in JavaScript identifiers", which would implicitly include apostrophes, exclamation marks, etc. There should be very few variable names that are entirely composed of invalid characters - stuff like -, --, ----- would get turned into this.vars._1, this.vars._2, this.vars._3, but that should be basically it.

PullJosh commented 10 months ago

Wow! I had no idea all of that was valid! Yeah, I love your proposal of stripping invalid characters but leaving everything else in place.

ZXMushroom63 commented 5 months ago

Another thing you could do is just alphabetically sort the variables before replacing invalid characters. That way, variable names are consistent.

towerofnix commented 5 months ago

D'you mean consistent between multiple times converting the same project? (Does the 1-2-3 variable suffix change order if you convert a project to Leopard, make some changes back in Scratch, and convert it to Leopard again?) Or something else?

edit: Or just to match the alphabetical order in Scratch? If so then yeah, that makes sense to me as something we could easily do, too (at least before addressing this more effectively!).

ZXMushroom63 commented 5 months ago

D'you mean consistent between multiple times converting the same project? (Does the 1-2-3 variable suffix change order if you convert a project to Leopard, make some changes back in Scratch, and convert it to Leopard again?) Or something else?

edit: Or just to match the alphabetical order in Scratch? If so then yeah, that makes sense to me as something we could easily do, too (at least before addressing this more effectively!).

I meant that the issue wouldn't be as bad if variables were sorted before replacing invalid characters and deduping. That way, if you had two variables differentiated by an invalid character, they will consistently have the same suffix.