xlucn closed this issue 2 years ago.
Fixing this would be a lot of work, as widths are calculated all over the place, including in most individual formatters.
Urwid does provide a native function to determine the width of a character:
```python
from urwid.util import str_util

# Number of terminal columns taken by one character, looked up by code point.
str_util.get_width(ord('x'))
```
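Summing per-character widths gives a string's width. The sketch below does this with the standard library's unicodedata.east_asian_width instead of urwid, so it runs on its own. It is an approximation under stated assumptions, not urwid's algorithm: the helper names are hypothetical, ambiguous-width characters are assumed to take one column, and control characters are not handled.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate terminal columns taken by one character (stand-in for urwid's get_width)."""
    if unicodedata.combining(ch):
        # Combining marks (e.g. an accent) are drawn on top of the previous character.
        return 0
    # Wide ('W') and fullwidth ('F') characters, such as CJK ideographs, take two columns;
    # everything else, including ambiguous ('A') characters, is assumed to take one.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def text_width(text: str) -> int:
    """Display width of a string: the sum of its per-character widths."""
    return sum(char_width(ch) for ch in text)


print(len("中文"), text_width("中文"))  # 2 4
```

Inside vit, char_width(ch) would correspond to str_util.get_width(ord(ch)); urwid's own width table may treat some characters differently from this approximation.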
So I'm sure fixing this is possible, but without a PR from somebody else, or sponsorship, we probably won't see any progress.
Thanks for the reply and for pointing me in the right direction. I once opened an issue in tremc about a similar problem, but fixing it there was very complex work.
Maybe solving it for vit will be easier, if the only requirement is to correctly calculate the width of some text? What I mean is that the columns only display text (AFAIK), so the main work is to find every place where the width of a string is calculated and make it return the correct width.
I did some quick research and testing: urwid's calc_width function can already do that. So I tried some changes (see below) that involve only task descriptions, and it works so far.
I am not an expert in Python, though. Is this a good solution?
```diff
diff --git a/vit/formatter/description_count.py b/vit/formatter/description_count.py
index f03ef9b..19dc63c 100644
--- a/vit/formatter/description_count.py
+++ b/vit/formatter/description_count.py
@@ -1,10 +1,11 @@
 from vit.formatter.description import Description
+from vit.util import unicode_len

 class DescriptionCount(Description):
     def format(self, description, task):
         if not description:
             return self.empty()
-        width = len(description)
+        width = unicode_len(description)
         colorized_description = self.colorize_description(description)
         if not task['annotations']:
             return (width, colorized_description)
@@ -14,7 +15,7 @@ class DescriptionCount(Description):

     def format_count(self, colorized_description, task):
         count_string = self.format_annotation_count(task)
-        return len(count_string), colorized_description + [(None, count_string)]
+        return unicode_len(count_string), colorized_description + [(None, count_string)]

     def format_annotation_count(self, task):
         return " [%d]" % len(task['annotations'])
diff --git a/vit/formatter/description_desc.py b/vit/formatter/description_desc.py
index 130661c..b3fdec5 100644
--- a/vit/formatter/description_desc.py
+++ b/vit/formatter/description_desc.py
@@ -1,8 +1,9 @@
 from vit.formatter.description import Description
+from vit.util import unicode_len

 class DescriptionDesc(Description):
     def format(self, description, task):
         if not description:
             return self.empty()
         colorized_description = self.colorize_description(description)
-        return (len(description), colorized_description)
+        return (unicode_len(description), colorized_description)
diff --git a/vit/util.py b/vit/util.py
index 5fb0f51..acc2d61 100644
--- a/vit/util.py
+++ b/vit/util.py
@@ -4,6 +4,8 @@ import curses
 import shlex
 from functools import reduce

+from urwid.str_util import calc_width
+
 curses.setupterm()
 e3_seq = curses.tigetstr('E3') or b''
 clear_screen_seq = curses.tigetstr('clear') or b''
@@ -55,3 +57,6 @@ def file_to_class_name(file_name):

 def file_readable(filepath):
     return os.path.isfile(filepath) and os.access(filepath, os.R_OK)
+
+def unicode_len(string):
+    return calc_width(string, 0, len(string))
```
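To see why the len() calls matter, consider a column whose width is taken from its widest value. The sketch below (hypothetical helper names; the unicodedata-based width is a stand-in for urwid's calc_width) shows len() sizing a column of CJK text at half the columns it actually needs, which matches the symptom reported in this issue.

```python
import unicodedata
from typing import Callable, Iterable


def display_width(text: str) -> int:
    # Stand-in for urwid's calc_width: wide/fullwidth characters count as two columns.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


def column_width(values: Iterable[str], measure: Callable[[str], int]) -> int:
    """Columns reserved for a report column: the widest of its values under `measure`."""
    return max((measure(v) for v in values), default=0)


projects = ["一个项目名称", "home"]
print(column_width(projects, len))            # 6: too narrow
print(column_width(projects, display_width))  # 12: what the terminal needs
```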
As for the execution time, it's not that slow and shouldn't be noticeable for a screenful of text:
```
In [31]: s = "中文"*500  # this should fill a screen of 80x25

In [32]: %timeit len(s)  # this time doesn't increase with string length
40 ns ± 0.319 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [33]: %timeit str_util.calc_width(s, 0, len(s))  # but this increases linearly
8.12 µs ± 249 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```
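For anyone who wants to repeat this comparison outside IPython, the standard timeit module can do it. This is a minimal sketch: display_width is a hypothetical pure-Python stand-in, so its absolute timings will differ from urwid's calc_width and are not comparable to the figures above; pass a wrapper around calc_width to measure the real thing.

```python
import timeit
import unicodedata
from typing import Callable


def display_width(text: str) -> int:
    # Pure-Python stand-in; substitute a calc_width wrapper to time urwid itself.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


def per_call_seconds(func: Callable[[str], int], arg: str, number: int = 1000, repeat: int = 5) -> float:
    """Best-of-`repeat` average time of one func(arg) call, in seconds."""
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    # The minimum is the least noisy estimate; slower runs mostly reflect interference.
    return min(timings) / number


if __name__ == "__main__":
    s = "中文" * 500  # 2000 columns: roughly one 80x25 screen
    for name, func in (("len", len), ("display_width", display_width)):
        print(f"{name:>13}: {per_call_seconds(func, s) * 1e6:.3f} µs per call")
```

Taking the minimum rather than the mean follows the advice in the timeit documentation that the lowest value is the most useful lower bound.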
Seems to work on a quick test. I like the idea of the simple unicode_len() wrapper, as it will allow us to easily alter the underlying implementation should we need to, e.g. for performance enhancement.
I do have some concerns about performance, so I'm wondering if adding a config option to enable this type of calculation would be in order. Then the folks who need it can enable it with a trade-off in performance, and those who don't need it can leave it disabled. If we do decide this is a good approach, we'd need to implement the wrapper a bit differently than a simple util function, as it will need access to the config object to determine its behavior.
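One way to give the wrapper access to the config without touching every call site is to build it once at startup from the config value and hand the resulting callable to the formatters. The sketch below assumes a boolean option (the option name wide_characters and the class TextWidth are hypothetical, not part of vit) and uses a unicodedata-based width as a stand-in for urwid's calc_width.

```python
import unicodedata
from typing import Callable


def _wide_aware_width(text: str) -> int:
    # Stand-in for urwid's calc_width(text, 0, len(text)).
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


class TextWidth:
    """Callable that measures text, with the strategy chosen once from a config flag (hypothetical)."""

    def __init__(self, wide_chars: bool, wide_measure: Callable[[str], int] = _wide_aware_width):
        # Resolve the config once here, so each call pays no lookup cost
        # and users who leave the option off keep plain len() speed.
        self._measure: Callable[[str], int] = wide_measure if wide_chars else len

    def __call__(self, text: str) -> int:
        return self._measure(text)


# In vit the flag would come from the config object, e.g. a hypothetical
# "wide_characters" option; here it is passed directly.
unicode_len = TextWidth(wide_chars=True)
print(unicode_len("中文"), TextWidth(wide_chars=False)("中文"))  # 4 2
```

Because the choice is made in the constructor, the disabled path costs only one extra function call over len().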
As a quick reference, this is a list of files that use the len() function in some way. Quite a few of those uses are to calculate string lengths, but not all:
vit/application.py
vit/autocomplete.py
vit/base_list_box.py
vit/color.py
vit/command_bar.py
vit/config_parser.py
vit/formatter_base.py
vit/formatter/depends_count.py
vit/formatter/description_count.py
vit/formatter/description_desc.py
vit/formatter/description.py
vit/formatter/description_truncated_count.py
vit/formatter/description_truncated.py
vit/formatter/__init__.py
vit/formatter/markers.py
vit/formatter/project_parent.py
vit/formatter/project.py
vit/formatter/tags_count.py
vit/formatter/tags.py
vit/formatter/uda_date.py
vit/formatter/uda_duration.py
vit/formatter/uda_indicator.py
vit/formatter/uda_numeric.py
vit/formatter/uda_string.py
vit/help.py
vit/key_cache.py
vit/list_batcher.py
vit/multi_widget.py
vit/process.py
vit/readline.py
vit/task_list.py
vit/task.py
vit/util.py
> I do have some concerns about performance, so I'm wondering if adding a config option to enable this type of calculation would be in order. Then the folks who need it can enable it with a trade-off in performance, and those who don't need it can leave it disabled.
That makes sense if it hurts performance badly. Here are more profiles. Basically, calculating the width of one string of 20,000 characters, or of one thousand short strings, takes well under one millisecond (roughly 160 to 350 µs). These could give an estimate of the typical time cost for someone with 1000 tasks.
```
In []: %timeit len("test"*10)
32.4 ns ± 0.375 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In []: %timeit unicode_len("test"*10)
224 ns ± 4.14 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In []: %timeit unicode_len("中文"*10)
348 ns ± 2.84 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In []: %timeit unicode_len("中文"*10000)
162 µs ± 342 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```
> As a quick reference, this is a list of files that use the len() function in some way. Quite a few of those uses are to calculate string lengths, but not all:
Yes, those can be found with a simple grep command. I had already changed quite a few of them in my fork, but might not have covered them all.
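grep also matches len( inside comments and strings. A slightly more precise sketch walks each file's syntax tree with Python's ast module and reports only real calls; it still cannot tell whether a call measures displayed text, so each hit needs a manual look. The function name find_len_calls is illustrative.

```python
import ast
import sys
from pathlib import Path
from typing import Iterator, Tuple


def find_len_calls(root: str) -> Iterator[Tuple[str, int]]:
    """Yield (path, line) for each call to a function named `len` in .py files under root.

    Unlike grep, matches inside comments and string literals are ignored. Only the
    name is checked, so a local function that shadows `len` would be reported too.
    """
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        lines = sorted(
            node.lineno
            for node in ast.walk(tree)
            if isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "len"
        )
        for line in lines:
            yield str(path), line


if __name__ == "__main__":
    # Usage: python find_len_calls.py [source_dir]   (defaults to ./vit)
    for path, line in find_len_calls(sys.argv[1] if len(sys.argv) > 1 else "vit"):
        print(f"{path}:{line}")
```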
I'm not sure that count is accurate. My 'all' report shows 4000 tasks, and I'm guessing the average total length of all strings per task would be well above 10.
It would be helpful to know what columns will need this treatment, as clearly all will not.
The absolute values may differ from a real use case, but they're only an order-of-magnitude estimate. The relative time, though, can be estimated at about 10 times that of len.
More importantly, instead of looking at the absolute time, we should compare the time used by the unicode_len function with the total time to load and render a report.
I tested with 4000 tasks (using vit 2.2.0 and the taskwarrior-test.sh script); the "Exec. time" reported in the header is 872 milliseconds (my computer has an AMD 4800H CPU and a fast NVMe SSD).
I am not sure whether vit counts any hidden columns, but calling unicode_len on the displayed text takes only 3.35 milliseconds, roughly 14 times the 0.24 milliseconds taken by len (assuming each string is measured once):
```
In [13]: def test():
    ...:     for i in range(4000):
    ...:         unicode_len("一个项目名称")
    ...:         unicode_len("测试中文组成的任务描述文字测试中文组成的任务描述文字")
    ...:         unicode_len("5m")
    ...:

In [14]: %timeit test()
3.35 ms ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [15]: def test_len():
    ...:     for i in range(4000):
    ...:         len("一个项目名称")
    ...:         len("测试中文组成的任务描述文字测试中文组成的任务描述文字")
    ...:         len("5m")
    ...:

In [16]: %timeit test_len()
241 µs ± 924 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
So this function won't be affecting the total rendering time that much.
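Putting the measured numbers side by side (this only restates the two %timeit results and the 872 ms header time):

```latex
\frac{3.35\ \text{ms}}{0.241\ \text{ms}} \approx 13.9,
\qquad
\frac{3.35\ \text{ms}}{872\ \text{ms}} \approx 0.38\%
```

So even at roughly 14 times the cost of len, the width calculation accounts for under 0.4% of the measured execution time.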
> It would be helpful to know what columns will need this treatment, as clearly all will not.
Yes, that would help. Some numeric columns might not need this. However, if a column can be customized (I am not sure which ones can), its content could be arbitrary text as well.
I'm 99% confident that it only calculates len for fields displayed in the report.
Can you get back to me w/ a final list of the columns that will need this treatment?
I think the most obvious ones are description, project and tags, which are string-type columns accepting arbitrary user input.
I am not sure which other columns can be customized, like active.indicator. However, users can avoid wide characters when changing them.
Since vit does not currently support locales, other columns such as dates and predefined strings are, in my opinion, not affected. Numeric columns are also free from this problem.
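Once the list is settled, one way to keep the change contained is to pick the measuring function per column rather than per call. This is a sketch under assumptions: FREE_TEXT_COLUMNS and measure_for are hypothetical names, the set holds only the three columns named above (other free-text columns would be added to it), and the unicodedata-based width stands in for urwid's calc_width.

```python
import unicodedata
from typing import Callable

Measure = Callable[[str], int]


def display_width(text: str) -> int:
    # Stand-in for urwid's calc_width: wide/fullwidth characters count as two columns.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


# Columns holding free-form user text, per the discussion above. This mapping is
# a hypothetical structure; vit's formatters are not necessarily organized this way.
FREE_TEXT_COLUMNS = frozenset({"description", "project", "tags"})


def measure_for(column: str) -> Measure:
    """Width function for a report column: display width for free text, len otherwise."""
    base = column.split(".", 1)[0]  # e.g. "description.count" -> "description"
    return display_width if base in FREE_TEXT_COLUMNS else len
```

Keying on the part before the dot lets every format variant of a column (such as description.count or project.parent) share one decision.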
If somebody wants to work up a PR making the adjustments to those columns, I'll have a look at it. Just make sure to check all the formatters for each column type.
Describe the bug
When the project name or description contains wide characters, e.g. in CJK languages, the column width ends up about half of what the text actually needs, as if the text were treated as half-width characters.
I am not sure whether this is related to #142; if it is, then that issue was not completely solved.
To Reproduce
I think the test case below can reproduce it. Tested with https://github.com/vit-project/vit/commit/e37a75da8ff0856b9b7efd5c226741d2892ee175
Expected behavior
The columns reserve enough space for wide characters.
Test case
If reproducing your issue requires any TaskWarrior setup, please produce a test case script.