microsoft / pxt-microbit

A Blocks / JavaScript code editor for the micro:bit built on Microsoft MakeCode
https://makecode.microbit.org

The text in some lessons is broken #5970

Status: Open. THEb0nny opened this issue 1 month ago.

THEb0nny commented 1 month ago

In some lessons, the text breaks at certain characters: MakeCode stops displaying the rest of the line after them.

[Screenshots]

THEb0nny commented 1 month ago

[Screenshot]

THEb0nny commented 1 month ago

It's worth checking out the other lessons too.

THEb0nny commented 1 month ago

[Screenshot]

THEb0nny commented 1 month ago

[Screenshots]

abchatra commented 1 month ago

@ganicke is this a documentation issue?

ganicke commented 1 month ago

@abchatra - in a way, yes. Some of those icon-type characters don't parse well when uploaded to Crowdin. Crowdin typically ends the sentence early when it encounters one.

ganicke commented 1 month ago

@abchatra - So, I verified that the source arrives to Crowdin intact.

[Screenshot]

The strings are truncated at certain special characters only when they are presented in the Crowdin editor. In some languages, the translators have worked around this by adding the icon characters back into their translations.

[Screenshot]

This seems to be a Crowdin issue. Should I send them a bug report?

ganicke commented 1 month ago

Support message for this sent to Crowdin 10/16. Awaiting a response...

ganicke commented 1 month ago

@abchatra - So, I received a good response from Crowdin Support mentioning the possible use of segmentation rules to avoid breaks on the emoji/icon characters:

Hello there, 

For markdown, you can use custom segmentation rules:
https://support.crowdin.com/custom-segmentation/

We have plenty of possible custom modules (https://store.crowdin.com/tags/file-processors),
but changing a segmentation should solve this without much development work. 

In case it wouldn't help, please share with source file sample as an attachment to an email,
a screenshot of how it looks in Crowdin editor, and the project ID (or URL)

Thanks in advance, 
--
Sincerely,
Dima Yashchyshyn
Customer Success Manager

However, this requires adding a segmentation (SRX) file for EACH source file that needs custom segmentation. The alternative is to disable segmentation on the source file, in which case no strings are split out and the whole file is left as one block of text to translate in full.

An SRX file for these characters would add a rule that does NOT break after them (for dice.md, for example):

<rule break="no">
        <beforebreak>[🎲⭐👋]</beforebreak>
        <afterbreak>\s</afterbreak>
</rule>

This doesn't seem like a practical solution at this point. I'm not sure whether the default SRX can be modified so that the whole range of these emoji characters is set not to break.
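To illustrate how a no-break rule like the one above interacts with a break rule, and how a Unicode property class could cover the whole emoji range instead of listing each icon, here is a minimal TypeScript sketch of SRX-style segmentation. It is not Crowdin's implementation. The `segment` function, the `SegRule` type, the sample sentence, and the second (break) rule are hypothetical; that rule only stands in for whatever default rule causes the early break. Widening `[🎲⭐👋]` to `\p{Extended_Pictographic}` is also an assumption, and whether Crowdin's custom segmentation accepts that property class would need to be checked.

```typescript
// Simplified SRX-style segmenter: an illustration of rule ordering only,
// not Crowdin's actual segmentation engine.

interface SegRule {
  breaks: boolean; // mirrors <rule break="yes|no">
  before: string;  // regex source for <beforebreak>
  after: string;   // regex source for <afterbreak>
}

// Hypothetical rule set.
// 1. The no-break rule from this thread, widened from [🎲⭐👋] to the
//    Unicode Extended_Pictographic property (an assumption).
// 2. A hypothetical stand-in for a default rule that breaks after
//    sentence punctuation or pictographs followed by whitespace.
const rules: SegRule[] = [
  { breaks: false, before: "\\p{Extended_Pictographic}", after: "\\s" },
  { breaks: true, before: "[.!?]|\\p{Extended_Pictographic}", after: "\\s" },
];

// "before" must match at the end of the text so far and "after" at the
// start of the remaining text. The "u" flag enables \p{...} classes.
const compiled = rules.map((r) => ({
  breaks: r.breaks,
  before: new RegExp("(?:" + r.before + ")$", "u"),
  after: new RegExp("^(?:" + r.after + ")", "u"),
}));

function segment(text: string, ruleSet = compiled): string[] {
  const segments: string[] = [];
  let start = 0;
  for (let i = 1; i < text.length; i++) {
    const head = text.slice(0, i);
    const tail = text.slice(i);
    // As in SRX, the first rule that matches at a position decides it.
    for (const rule of ruleSet) {
      if (rule.before.test(head) && rule.after.test(tail)) {
        if (rule.breaks) {
          segments.push(text.slice(start, i).trim());
          start = i;
        }
        break;
      }
    }
  }
  segments.push(text.slice(start).trim());
  return segments.filter((s) => s.length > 0);
}
```

Called as `segment("Roll the dice 🎲 and see what you get. Then shake again.")`, the no-break rule wins at the position after 🎲, so only the period splits the text. Passing `compiled.slice(1)` (the break rule alone) reproduces the kind of truncation reported here, with "Roll the dice 🎲" ending up as its own segment. Because the first matching rule wins, a no-break rule has to come before the break rule it overrides. `Extended_Pictographic` is used rather than `\p{Emoji}` because the latter also matches the ASCII digits, `#` and `*`.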

abchatra commented 4 weeks ago

Thanks @ganicke for investigating this. @jwunderl @thsparks FYI