jarun / googler

:mag: Google from the terminal
GNU General Public License v3.0

exception TrackedTextwrap: the impossible happened at offset nn of text "" #429

Closed lawrenceang74 closed 2 years ago

lawrenceang74 commented 3 years ago
> $ sudo curl -o /usr/local/bin/googler https://raw.githubusercontent.com/jarun/googler/master/googler && sudo chmod +x /usr/local/bin/googler
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed     
> 100  133k  100  133k    0     0   332k      0 --:--:-- --:--:-- --:--:--  332k

> (base) me@box:~ [1121-0532] ✅ 
> $ alias googler='LC_ALL="en_HK.UTF-8" googler'

> (base) me@box:~ [1121-0532] ✅ 
> $ googler -d -l zh -C -n22 --np hello World | tee /tmp/out.txt
> [DEBUG] googler version 4.3.2
> [DEBUG] Python version 3.7.4
> [DEBUG] Platform: Linux-5.11.0-40-generic-x86_64-with-debian-bullseye-sid
> [DEBUG] Connecting to new host www.google.com
> [DEBUG] Opened socket to 172.217.24.68:443
> [DEBUG] new_connection completed in 0.053s
> [DEBUG] Fetching URL /search?hl=zh&ie=UTF-8&num=22&oe=UTF-8&q=hello+World&sei=NBvpmsPLR_+VYsRMb0a20A
> [DEBUG] Cookie: 1P_JAR=2021-11-20-21
> [DEBUG] fetch_page completed in 0.590s
> [DEBUG] Response body written to '/tmp/googler-response-j2b7_j1u.html'.
> [DEBUG] parse completed in 0.349s
> 
> Traceback (most recent call last):
>   File "/usr/bin/googler", line 210, in __init__
>  1.  Hello World - 维基百科,自由的百科全书
>      https://zh.wikipedia.org/zh/Hello_World
>      Hello, World是指在電腦螢幕顯示「Hello, World!」(你好,世界!)字串的電腦程式。相關的程式通常
>      都是每種電腦編程語言最基本、最簡單的程序,也會用作示範一個編程 ...
> 
>  2.  "Hello, World!" program - Wikipedia
>      https://en.wikipedia.org/wiki/%22Hello,_World!%22_program
>      A "Hello, World!" program generally is a computer program that outputs or displays the message "Hello, World!". Such a program is very simple in
>      most ...
> 
>  3.  hello world(程序代码)_百度百科
>      https://baike.baidu.com/item/hello%20world/85501
>      Hello World 中文意思是『你好,世界』。因为The C Programming Language
>      中使用它做为第一个演示程序,非常著名,所以后来的程序员在学习编程或进行设备调 试时延续了 ...
> 
>  4.  Hello World 聊天翻译软件- 首页
>      https://www.helloword.com.cn/
>      Hello World聊天翻译系统,专业聊天翻译技术,极速稳定收发,全球畅游,使用邮箱      
>      免费注册登录体验。专业翻译技术团队开发,超数百家企业信赖。
> 
>  5.  【Hello World.影評】虛擬世界喚醒平行時空夢中人 - 香港01
>      https://www.hk01.com/%E5%91%A8%E5%A0%B1/400434/hello-world-%E5%BD%B1%E8%A9%95-%E8%99%9B%E6%93%AC%E4%B8%96%E7%95%8C%E5%96%9A%E9%86%92%E5%B9%B3%E8%A1%8C%E6%99%82%E7%A9%BA%E5%A4%A2%E4%B8%AD%E4%BA%BA
>     assert text[offset : offset + len(line)] == line
> AssertionError
> 
> During handling of the above exception, another exception occurred:
> 
> Traceback (most recent call last):
>   File "/usr/bin/googler", line 3819, in <module>
>     main()
>   File "/usr/bin/googler", line 3804, in main
>     repl.display_results(json_output=opts.json)
>   File "/usr/bin/googler", line 2726, in enforced_method
>     method(self, *args, **kwargs)
>   File "/usr/bin/googler", line 2922, in display_results
>     r.print()
>   File "/usr/bin/googler", line 2669, in print
>     self._print_metadata_and_abstract(self.abstract, metadata=self.metadata, matches=self.matches)
>   File "/usr/bin/googler", line 2648, in _print_metadata_and_abstract
>     wrapped_abstract = TrackedTextwrap(abstract, fillwidth)
>   File "/usr/bin/googler", line 227, in __init__
>     offset, self._original
> RuntimeError: TrackedTextwrap: the impossible happened at offset 11 of text '2020 年2月17日 — '

https://gist.github.com/lawrenceang74/a45346c37184600bd0faef1cebb2a3e1

$ uname -a
Linux turing 5.11.0-40-generic #44-Ubuntu SMP Wed Oct 20 16:16:42 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
(base) me@box:~ [1121-0540] ✅
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 21.04
Release:        21.04
Codename:       hirsute

$ python --version
Python 3.7.4

C:\Users\x>hyper version
3.1.4

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
[response.txt](https://github.com/jarun/googler/files/7575407/response.txt)
jarun commented 2 years ago

This looks more like an issue in Python's textwrap module; at the very least, its maintainers should be able to point out the right approach here. Could you please check with the textwrap developers?
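
For context when raising this upstream: the assertion in the traceback, `text[offset : offset + len(line)] == line`, expects every wrapped line to appear verbatim in the original text. The sketch below is not googler's code and does not reproduce the failing input above; it only illustrates, using the standard `textwrap.wrap` with default options, how that kind of expectation can break. `locate_wrapped_lines` is a hypothetical helper introduced purely for illustration.

```python
import textwrap


def locate_wrapped_lines(text, width):
    """Hypothetical helper (not part of googler or textwrap).

    Returns the start offset of each wrapped line within the original
    text, or None if some wrapped line cannot be found verbatim.
    """
    offsets = []
    offset = 0
    for line in textwrap.wrap(text, width):
        # Search forward from the end of the previous line, so that
        # whitespace dropped at a line break is skipped over.
        start = text.find(line, offset)
        if start == -1:
            return None
        offsets.append(start)
        offset = start + len(line)
    return offsets


# Whitespace at the break is dropped, but both lines are still substrings.
print(locate_wrapped_lines("hello   world", 7))  # [0, 8]

# With default options the tab is expanded to spaces, so the wrapped line
# "a       b" no longer appears in the original "a\tb".
print(locate_wrapped_lines("a\tb", 20))  # None
```

The second call shows the general hazard: once `textwrap` rewrites characters (tab expansion or whitespace replacement), mapping wrapped lines back to offsets in the original string is no longer a plain substring lookup. Whether the same mechanism is behind the error reported here is an open question for the textwrap and googler maintainers.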

jarun commented 2 years ago

Adding @zmwangx