twisted / towncrier

Manage the release notes for your project.
https://towncrier.readthedocs.io
MIT License

🐛 Bug - Missing underline character for emojis #626

Open jacobgulan opened 3 months ago

jacobgulan commented 3 months ago

Whenever I run the towncrier command to build my release notes, the generated changelog.rst file is one underline character short for every emoji used in a heading.

My pyproject.toml settings look like this:

[tool.towncrier]
directory = "CHANGES"
filename = "CHANGELOG.rst"
package = "mypackage"
title_format = "mypackage v{version} ({project_date})"
underlines = ["-", "~", "^"]

    [[tool.towncrier.type]]
    directory = "bugfix"
    name = "🐛 Bugfixes"
    showcontent = true

The generated changelog.rst looks like this:


🐛 Bugfixes
~~~~~~~~~~

adiroiban commented 3 months ago

Many thanks Jacob for the report.

If you have time, please consider creating a PR with a fix for this.

I am happy to review and merge the fix. Let me know if you need any help.


It looks like this is an issue with Jinja.

From what I can see, the code that generates the underlines is in the Jinja template:

https://github.com/twisted/towncrier/blob/a5a51b13d4c3ca60cb5d01ef56bd639071ff2f74/src/towncrier/templates/default.rst?plain=1#L19

adiroiban commented 3 months ago

Updated

I took a closer look.

The length of "🐛 Bugfixes" is 10, and there are 10 underline characters.

But I can see that the RST fails to compile:

https://rsted.info.ucl.ac.be/?theme=basic&n=946cb83fa878e973dd24ee1763b292e0

:(

jacobgulan commented 3 months ago

I think I might be seeing something now. When I run the Python interpreter in my console, I get the following output for a string containing an emoji:

$ python
Python 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> len("�� Bugfixes")
10
>>> len("~~~~~~~~~~")
10
>>> len(" Bugfixes")
9

It looks like the issue lies with encoding. The console displays 11 characters, but Python 3 counts only 10 in the string. Note that "��" = 🐛
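
A minimal sketch of where the two counts could come from, assuming the Windows console shows one placeholder per UTF-16 code unit (an assumption, not something verified in this thread). Python's len() counts code points, and 🐛 (U+1F41B) lies outside the Basic Multilingual Plane, so it takes two UTF-16 code units:

```python
title = "🐛 Bugfixes"

# len() on a str counts Unicode code points.
code_points = len(title)

# Each UTF-16 code unit is 2 bytes; the emoji needs a surrogate pair (2 units).
utf16_units = len(title.encode("utf-16-le")) // 2

# In UTF-8 the emoji takes 4 bytes and every ASCII character takes 1.
utf8_bytes = len(title.encode("utf-8"))

print(code_points, utf16_units, utf8_bytes)  # prints: 10 11 13
```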

adiroiban commented 3 months ago

I think that the issue is with docutils.

There is this upstream bug report: https://sourceforge.net/p/docutils/bugs/335/

which suggests that docutils doesn't consider this an issue :(

so it looks like we need to fix it in towncrier.

On my Python 3.12 I can see that the emoji is recognized as wide, so docutils requires an extra underline character:

>>> import unicodedata
>>> unicodedata.east_asian_width('🐛')
'W'
>>> # Other characters are narrow
>>> unicodedata.east_asian_width('a')
'Na'

I think that we might need a helper function or a Jinja filter to calculate the underline size.
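
A rough sketch of what such a helper could look like. The names display_width and underline_for are hypothetical and are not part of towncrier. The rule that wide ('W') and fullwidth ('F') characters take two columns and combining marks take none is an assumption about how docutils measures title width:

```python
import unicodedata


def display_width(text: str) -> int:
    """Return the number of columns `text` occupies (hypothetical helper)."""
    width = 0
    for char in text:
        if unicodedata.combining(char):
            # Combining marks attach to the previous character: no extra column.
            continue
        # Assumption: wide and fullwidth characters occupy two columns.
        width += 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
    return width


def underline_for(title: str, char: str = "~") -> str:
    """Build an underline that is as wide as the displayed title."""
    return char * display_width(title)


title = "🐛 Bugfixes"
print(title)
print(underline_for(title))  # 11 tildes instead of len(title) == 10
```

This sketch does not try to handle variation selectors or zero-width-joiner emoji sequences.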

adiroiban commented 3 months ago

One option is to add an extra underline_size member for each category, so that in the template we can have something like this:

{{ underline * definitions.category.underline_size }}

jacobgulan commented 3 months ago

I opened PR #645 to resolve this.