mozgwar closed this 10 months ago
I did some tests, and it looks like the problem is in the `_file_is_text` function.
@mozgwar Thanks for the information. All of those work for me on macOS. What is your OS? Could you share the output of the following command?
uname -a; file --version; bash --version
I'm on Linux, and my main shell is zsh.
Linux jessica 6.1.38-cachyos #1 SMP PREEMPT Fri Jul 7 15:28:36 EDT 2023 x86_64 Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz GenuineIntel GNU/Linux
file-5.45
magic file from /usr/share/misc/magic
seccomp support included
GNU bash, version 5.2.15(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ zsh --version
zsh 5.9 (x86_64-pc-linux-gnu)
Thanks. That looks normal so far. When you bookmark one of the URLs that doesn’t download, does `nb` display the message, “Unable to download page at [url]”?
No, it just adds the URL to the bookmark file and stops there.
@mozgwar Thanks. I've still been unable to reproduce or determine the cause of the issue. `nb` uses `curl` or `wget` to download the page to a temp file and then reads the content from there. It sounds like the temp file is being created, but might not be getting recognized as a text file. The `_file_is_text()` function is pretty simple and the only external program it uses is `file`, which it looks like you have a recent version of. I'll have to keep thinking about this.
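For readers following along, the flow described above can be sketched roughly as follows. This is a minimal illustration and not `nb`'s actual code: the function names `fetch_to_file` and `looks_like_text` are hypothetical, and treating any `text/*` MIME type as "text" is an assumption about how such a check might work.

```shell
# Hypothetical sketch; these helpers are NOT taken from nb's source.

# Download a URL to a destination path with curl, or with wget when
# curl is not installed.
fetch_to_file() {
  if command -v curl >/dev/null 2>&1; then
    curl -fsSL --output "$2" "$1"
  elif command -v wget >/dev/null 2>&1; then
    wget -q -O "$2" "$1"
  else
    echo "neither curl nor wget found" >&2
    return 1
  fi
}

# Succeed only when `file` reports a text/* MIME type for the path.
# (Assumption: a text check based on the MIME type alone.)
looks_like_text() {
  case "$(file -b --mime-type "$1")" in
    text/*) return 0 ;;
    *)      return 1 ;;
  esac
}
```

With helpers like these, a page that `file` labels as `application/javascript` would fail the text check even though the download itself succeeded, which would match the symptom of a bookmark being saved with only its URL.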
I just ran `wget https://b-ark.ca/2020/04/22/diy-kindle-news.html` and I get the following when I run `file` on it:
1) file diy-kindle-news.html
diy-kindle-news.html: HTML document, Unicode text, UTF-8 text, with very long lines (1447)
2) file --exclude=apptype \
  --exclude=encoding \
  --exclude=tokens \
  --exclude=cdf \
  --exclude=compress \
  --exclude=elf \
  --exclude=tar \
  -b --mime-type diy-kindle-news.html
application/javascript
@mozgwar Does it save the title (first line of the bookmark file) as `# Clang/Bootstrapping - Gentoo wiki (wiki.gentoo.org)` or `# (wiki.gentoo.org)`?
No title, just `# (wiki.gentoo.org)`.
It fails to render any Wikipedia page for me on Linux :(
Here is a video of the issue: wiki.debian is rendered, but no Wikipedia link is.
https://github.com/xwmx/nb/assets/46517170/62d63c1f-6345-4bf0-b672-646cb81eb478
I just tried two Wikipedia pages and got the following results:
1) file ARM_Cortex-A76
ARM_Cortex-A76: HTML document, Unicode text, UTF-8 text, with very long lines (5793)
file --exclude=apptype \
  --exclude=encoding \
  --exclude=tokens \
  --exclude=cdf \
  --exclude=compress \
  --exclude=elf \
  --exclude=tar ARM_Cortex-A76
ARM_Cortex-A76: JavaScript source, Unicode text, UTF-8 text, with very long lines (5793)
2) file Alaska_Day
Alaska_Day: HTML document, Unicode text, UTF-8 text, with very long lines (4067)
file --exclude=apptype \
  --exclude=encoding \
  --exclude=tokens \
  --exclude=cdf \
  --exclude=compress \
  --exclude=elf \
  --exclude=tar Alaska_Day
Alaska_Day: JavaScript source, Unicode text, UTF-8 text, with very long lines (4067)
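To repeat this comparison on other downloaded pages, something like the sketch below may help. It only wraps the two `file` invocations already shown in this thread; the function names `mime_default`, `mime_with_excludes`, and `compare_detection` are made up for illustration.

```shell
# Hypothetical helpers for comparing `file` results on one path.

# MIME type as reported by a plain `file` call.
mime_default() {
  file -b --mime-type "$1"
}

# MIME type with the same exclusion flags quoted in this thread.
mime_with_excludes() {
  file --exclude=apptype --exclude=encoding --exclude=tokens \
       --exclude=cdf --exclude=compress --exclude=elf --exclude=tar \
       -b --mime-type "$1"
}

# Print both results on separate lines so they are easy to compare.
compare_detection() {
  printf 'default:       %s\n' "$(mime_default "$1")"
  printf 'with excludes: %s\n' "$(mime_with_excludes "$1")"
}
```

Running `compare_detection ARM_Cortex-A76` on an affected system would be expected to show two different MIME types; on an unaffected system the two lines should agree.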
I'm hitting the same issue, so I went ahead and removed the `--exclude=encoding` flag and that fixed it for me.
Makes me think that this is maybe a bug with recent `file` versions or maybe one of the extra libmagic database entries that some programs add is causing conflicts?
Just for completeness' sake:
$ uname -a; file --version; bash --version
Linux remote-desktop-1 6.2.0-1016-gcp #18-Ubuntu SMP Fri Sep 22 16:23:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
file-5.44
magic file from /etc/magic:/usr/share/misc/magic
GNU bash, version 5.2.15(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
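Since removing `--exclude=encoding` alone changed the result here, one way to check which exclusion matters on a given system is to try each flag on its own. The following is a rough diagnostic sketch; `find_divergent_excludes` is a hypothetical name, and the list of test names is simply the set of flags quoted earlier in this thread.

```shell
# Hypothetical diagnostic: report every single --exclude flag that makes
# `file` return a different MIME type than its default run.
# The body runs in a subshell so its variables do not leak to the caller.
find_divergent_excludes() (
  target="$1"
  baseline="$(file -b --mime-type "$target")"
  for name in apptype encoding tokens cdf compress elf tar; do
    result="$(file --exclude="$name" -b --mime-type "$target")"
    if [ "$result" != "$baseline" ]; then
      printf '%s: %s (default: %s)\n' "$name" "$result" "$baseline"
    fi
  done
)
```

For example, `find_divergent_excludes Alaska_Day` prints nothing when no single exclusion changes the detected type, and prints one line per flag that does.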
@leamsi Thanks for the info! I confirmed that this is an issue in `file` that started at some point between version 5.41 and 5.44 and exists on both Arch and Ubuntu. These options are intended as a performance optimization. I don’t notice any difference in benchmarks at the moment, so I’ve removed it. This change is in the repo and will be in the next release version.
This should be fixed as of version 7.8.0. Let me know if you run into any more issues with it. Thanks!
Hi, I'm evaluating this nice software to see if it fits my needs, and so far so good. However, I noticed that for some sites only the URL gets bookmarked. The following are not working:
The following work as expected: