ToDo List - Githubissues

jarun commented 7 years ago

Continued from #135.

Notes

The list below is a growing one. While suggesting new features please consider contributing to Buku. The code is intentionally kept simple and easy to understand with comments. We'll be happy to assist any new contributor. We need your help!

Some of the fresh-baked features may not have been released yet. Grab the master branch for those.

Identified tasks

[ ] Android app (using the same database) (probably a distinct project)
[x] Text-mode user agent for Buku
[x] Read default Firefox profile name from profiles.ini (see #212, thanks @alex-bender)
[x] Support --format in search results (ref, thanks @mosegontar)
[x] API documentation (comments need to be in NumPy format) (thanks @mosegontar)
[x] Auto-import: optionally add parent folder name as tag, ask for unique tag [like importdb()]
[x] Support custom colours (thanks @shv-q3)
[x] Generate packages on Travis-CI using PackageCore (see #189) (thanks @shaggytwodope)
[x] Search multiple tags, exclusion in tag search (thanks @mosegontar)
[x] Auto-import Firefox and Google Chrome bookmarks (thanks @alex-bender)
[x] Suggest tags that go together
[x] Append/overwrite/remove tags from prompt
[x] Rest API for webapps (thanks @kishore-narendran)
[x] Add more tests (ongoing activity @rachmadaniHaryono)
[x] A browser plugin (thanks @SamHH for bukubrow)
[x] Text editor support (thanks @ZwodahS)
[x] Need a PyPI maintainer (thanks @shaggytwodope)
[x] Make refreshdb faster using threads (record updates should be synchronized)
[x] Show usage count in tag list
[x] Proxy support (thanks @denisfalqueto)
[x] Continuous search at prompt
[x] Add prompt help
[x] Specify custom DB file to class BukuDb (library usage, no exposed option)
[x] Move to urllib3
[x] Handle redirects using referrer masking. Example URL. Fixed with urllib3.
[x] Support URL shortening. This helps to share URLs. (see #92 for limitations)
[x] Make a bookmark title immutable via refreshdb()
[x] Markdown import/export
[x] Regex search
[x] Ubuntu PPA (thanks @shaggytwodope)
[x] Export specific tags to HTML
[x] Exact word match using REGEX. Make substring match optional.
[x] Delete all records based on a search result
[x] Delete multiple items, support combination of indices and ranges
[x] Append tags
[x] Travis CI integration
[x] Ubuntu deb package generation on new tag
[x] Merge bookmark database files (for users who work on multiple systems)
[x] Export bookmarks in FF or Chrome html format.
[x] Option to add folder names as tags while importing HTML (thanks @mohammadKhalifa)
[x] Check and show upstream version
[ ] Anything else which would add value (please discuss in this thread)

mosegontar commented 7 years ago

I saw there was some discussion last year of integrating with Pinboard. Is this still a desired feature? (I personally would like it). If so, I can take it. Pinboard has a nice API with an endpoint for fetching all bookmarks and recent bookmarks (up to 100), so an import to Buku feature at least seems doable.

jarun commented 7 years ago

I have to think about it. Whether we really want to integrate or want to go solo. ;)

mosegontar commented 7 years ago

Gotcha, sounds good!

jarun commented 7 years ago

I was going through the Pinboard docs. They do support export/import to/from HTML. I think we can refrain from adding service-specific code to authenticate to Pinboard, probably along with some auth token hanging around. Buku is a solution in itself, not an add-on to any other solution.

Now, coming to the integration part with popular web services, why don't we just document the fact that Buku supports bookmarks.html? That should be enough to have all your bookmarks in from anywhere (that too, incrementally).

Yes, it's a manual procedure. But it's safer/smarter than accessing any random web service's data directly.

mosegontar commented 7 years ago

Yeah that's reasonable, especially given that we can't integrate with every service. Sounds like import via bookmarks.html format should satisfy most services. Didn't know about the html format being the standard for bookmarks, interesting.

(I might write some separate small pinboard-to-buku script using https://user:password@api.pinboard.in/v1/posts/all?format=json endpoint to allow for a straight dump into buku)
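A rough sketch of what the conversion step of such a script might look like. It assumes the `posts/all` JSON response is a list of objects with `href`, `description`, `extended` and space-separated `tags` fields, and that Buku stores tags as a comma-delimited string with a leading and trailing comma; the function name `pinboard_to_records` is made up for illustration. Fetching the JSON and writing to the Buku database are deliberately left out.

```python
import json


def pinboard_to_records(posts):
    """Convert Pinboard post dicts into (url, title, tags, desc) tuples."""
    records = []
    for post in posts:
        url = post.get("href", "").strip()
        if not url:
            continue  # skip malformed entries that carry no URL
        # Pinboard separates tags with spaces. The comma-wrapped form below
        # is an assumption about how Buku stores tags.
        tag_list = post.get("tags", "").split()
        tags = "," + ",".join(tag_list) + "," if tag_list else ","
        records.append(
            (url, post.get("description", ""), tags, post.get("extended", ""))
        )
    return records


# Sample payload shaped like the assumed API response (not real data).
sample = json.loads(
    '[{"href": "https://example.com", "description": "Example",'
    ' "extended": "a note", "tags": "python cli"},'
    ' {"href": "", "description": "broken entry", "tags": ""}]'
)
records = pinboard_to_records(sample)
for record in records:
    print(record)
```

Each tuple could then be handed to whatever adds a record on the Buku side, for example the library's add-record call, whose exact signature should be checked against the installed version.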

Where are you thinking of documenting the support for bookmarks.html? I could add a line in the Import section of the Operational Notes wiki, just to be explicit about it. Should it go elsewhere too?

jarun commented 7 years ago

I might write some separate small pinboard-to-buku script

Of course!

I could add a line in the Import section of the Operational Notes wiki

You are in the right place!

jarun commented 7 years ago

Additionally, in the Introduction, please make the following modification:

"For GUI integration (or to sync bookmarks with your favourite bookmark management service), refer to the wiki page on System integration."

jarun commented 7 years ago

@alex-bender the colors support is already being worked on by @shv-q3 here. He hijacked this but he is almost done! ;) I changed the owner.

EDIT: Now I remember he attempted to do this once before with colord. The current approach has been our preferred approach.

jarun commented 7 years ago

@mosegontar I have added the note on web service integration.

mosegontar commented 7 years ago

Ah thanks; (funny timing, I was just about to do this). I've updated the man page (and I'll update the wiki), PR coming in a second.

jarun commented 7 years ago

Now I see what has happened. I had the README change locally, forgot to push it and went to sleep. :) Pushed it now.

jarun commented 7 years ago

@alex-bender @mosegontar how is it going? Planning to make a release early next month, probably in the first week. Can we expect to have the features assigned to you in?

mosegontar commented 7 years ago

@jarun sorry I've been a bit delayed. I'm moving this week and things have been hectic.

I've updated about half of buku.py docstrings into NumPy style. I just pushed the current work, https://github.com/mosegontar/Buku/commit/626e4d95a4f80a6b8425dea07bfe53dadb9ff550, so please take a look and let me know your thoughts. If things look okay, I can certainly finish the rest by early next month.
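For readers unfamiliar with the NumPy docstring layout mentioned here, a minimal illustration follows. The function `join_tags` is a hypothetical helper written for this example, not a function taken from buku.py; only the docstring section layout (Parameters, Returns, Examples) is the point.

```python
def join_tags(keywords=None):
    """Format keywords into a comma-delimited tag string.

    Parameters
    ----------
    keywords : list of str, optional
        Raw tag words. Surrounding whitespace is stripped, case is
        lowered and duplicates are dropped.

    Returns
    -------
    str
        Sorted tags joined by commas, with a leading and trailing comma.
        A single comma is returned when there are no tags.

    Examples
    --------
    >>> join_tags(["Python", " cli ", "python"])
    ',cli,python,'
    """
    if not keywords:
        return ","
    unique = sorted({word.strip().lower() for word in keywords if word.strip()})
    if not unique:
        return ","
    return "," + ",".join(unique) + ","


print(join_tags(["Python", " cli ", "python"]))
```

Sphinx can render this layout through its napoleon extension, which is one way the generated documentation discussed below could pick it up.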

With regard to generating the documentation with Sphinx, I experimented a bit with it last week. I should be able to do it, but I do expect there will be some finicky aspects. Also wrt autogenerating the docs, that's something I still need to look into. So this part might not be finished in time for the release.

alex-bender commented 7 years ago

Hi @jarun! I'm coming back from vacation right now, so I can't say for sure yet. I'll ping you in a few days.

jarun commented 7 years ago

Thanks guys!

jarun commented 7 years ago

@mosegontar I can see it's going very well! :+1:

mosegontar commented 7 years ago

Just an update, docstrings are updated up to the Editor Mode Functions, so about 80% complete. Hoping to be done by tomorrow or Tuesday!

jarun commented 7 years ago

Simply awesome!

lkarbownik commented 7 years ago

Hi @jarun

First, amazing work. I love using Buku. Have you considered adding support for -t --stag flags when using --print and -f<num>? Currently using -t/--stag with these flags overrides the behaviour and runs app in interactive mode. My goal is to combine buku with other tools in a way that would allow to pre-query bookmarks by providing a tag, i.e.: buku -p -f3 -t $sometag | fzf ...

I know there is a --np to disable interactive mode but it does not seem to support other output formats and parsing multiline output is not ideal.

jarun commented 7 years ago

-f is the print format specifier. I believe what you want is search options should honor -f. @mosegontar would you like to try it out?

jarun commented 7 years ago

The trick would be to call print_rec() if -f is used, otherwise call print_single_rec() (as it is now). I would definitely love to call print_single_rec() [when -f is not used] because the overhead is much less. print_rec() queries the db again.

Another way would be to have an alternative API to print_single_rec() which does the filter check. In any case I would love to keep print_single_rec() as it is - straight logic without any condition check overhead.

mosegontar commented 7 years ago

Hi! @jarun yes, I'd definitely like to work on this; it's actually addressing a use case I've been thinking about myself. I'll take a closer look and let you know what questions come up.

lkarbownik commented 7 years ago

Thank you. It's so great to have a simple question met with such enthusiasm. Let me know if there's anything I can do to help you with this one.

Also, not related but I think I might have just discovered a bug. For some reason when updating youtube bookmarks using -u all bookmark titles got replaced with a simple 'Youtube' title. When fetching the same page with curl or wget the title tag contains full title, but when requesting the page with Buku the tag contains only Youtube. Afaik, curl does not process any js, so I'm not sure yet what is causing this behaviour. I will continue investigating and I will try to find some more useful information.

Edit: It seems that the value of USER_AGENT is causing the problem. YouTube responds with "this video is not available" page instead of regular video page. Removing user-agent value from headers returns correct behaviour.
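One way to reproduce this kind of comparison is to request the same page with and without a custom User-Agent header and compare the extracted titles. The sketch below uses only the standard library; the helper names are made up for illustration, the regex-based title extraction is a simplification, and this is not how Buku itself fetches pages.

```python
import re
import urllib.request

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def build_request(url, user_agent=None):
    """Build a request, setting User-Agent only when one is given."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def extract_title(html):
    """Return the stripped contents of the first <title> tag, or None."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def fetch_title(url, user_agent=None, timeout=10):
    """Fetch a page and return its title (network call, not run here).

    Note: when no header is set, urllib still sends its own default
    User-Agent, so "without" really means "with the library default".
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_title(response.read().decode("utf-8", errors="replace"))


print(extract_title("<html><head><title> Some video </title></head></html>"))
```

Calling `fetch_title` twice on the same URL, once with the suspect agent string and once without, would show whether the server varies its response by User-Agent.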

jarun commented 7 years ago

@lukaszkarbownik please check #211.

mosegontar commented 7 years ago

I played around with this last night and was basically able to get this working (search options honor -f). There's a couple points I'd like to clarify and consider before going further:

1. Right now, --print can optionally take an argument of an integer or range of integers, indicating which records (by DB index) to print. If search options are included (e.g., -t or -s), it doesn't seem to make sense for the user to pass a DB index or range to --print, since we've already gathered search results based on, for example, some specific tag. Similarly, in the current behavior no argument to --print results in the output of every record with a DB index, which would also be undesirable.

   So the question is what should an argument to --print correspond to when used with -t? The argument in this case could be used to narrow search_results even further, but this seems like it would require knowing the results in advance. Perhaps it's best to ignore the argument in this case and document that.

2. When using -f the DB index is printed along with whichever fields were selected. For example, buku -p -f1 -t python will give you a result like
```
...
120   http://pypi.python.org
...
```
This makes sense and is useful when looking through the search results, but ultimately adds an additional field not selected by the user with -f1. If the goal is to pipe the output elsewhere, perhaps the inclusion of the DB index could be disabled with some flag? I'm primarily thinking of a situation in which the URL or title is passed to another program.

jarun commented 7 years ago

For the second question, you can add another filter which just shows the url, say -f0 (or some other number if 0 is default). Do NOT change the current filters because people use them in their scripts already.

jarun commented 7 years ago

For the first question, print_single_rec() takes a record/row. You should use that (call in loop for all search result records) or add a similar function that honours filtering. I would prefer not to change print_rec or print_single_rec. Let's have print_single_rec_with_filter.

Now, we should not be checking every time before printing a single record whether filter is enabled or not. Just check it once and use something like a function pointer in C to use the appropriate API if filtering is on.
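The "check once, then call through a function pointer" idea maps naturally onto Python, where functions are first-class values. The sketch below is only an illustration: the row layout `(index, url, title, tags)`, the formatter names, and the meaning of filter value 0 (URL only, as floated above) are assumptions, not Buku's actual code.

```python
def format_full(row):
    """Unfiltered path: straight formatting, no condition checks."""
    index, url, title, tags = row
    return "{}. {}\n   > {}\n   # {}".format(index, title, url, tags)


def format_index_and_url(row):
    """Filter 1 as shown in this thread: DB index followed by URL."""
    return "{}\t{}".format(row[0], row[1])


def format_url_only(row):
    """Hypothetical filter 0: URL without the DB index, handy for piping."""
    return row[1]


FILTERED_FORMATTERS = {0: format_url_only, 1: format_index_and_url}


def format_results(rows, field_filter=None):
    # Decide once, outside the loop, which formatter to use. This plays
    # the role of assigning a C function pointer.
    if field_filter is None:
        formatter = format_full
    else:
        formatter = FILTERED_FORMATTERS[field_filter]
    return [formatter(row) for row in rows]


rows = [(120, "http://pypi.python.org", "PyPI", ",python,")]
for line in format_results(rows, field_filter=1):
    print(line)
```

The per-record functions stay free of branching, so the unfiltered path keeps its low overhead while the filtered path gets its own formatter.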

This generic issue listing is not really for in-depth discussions on specific requests. Can you please open a defect or something where we can discuss these?

rachmadaniHaryono commented 7 years ago

just recently read this

Update user agent string in buku.py, if applicable

is it possible to use this instead https://github.com/hellysmile/fake-useragent?

jarun commented 7 years ago

Several points:

it's preferable to go with a static value than grabbing one from the web (per session at that)
we need one specifically for a text based browser
while the project by itself is interesting, it's risky to depend on non-standard libraries
personally I am very much resistant to adding new library deps ;)

jarun commented 7 years ago

@rachmadaniHaryono we have a user agent string of our own at https://github.com/jarun/Buku/commit/848d9d79431be6ec7a668995ed282514374b9cab!

rachmadaniHaryono commented 7 years ago

after reading the recent comment #219, is it possible and more advantageous to fetch the title asynchronously to speed up the process?

jarun commented 7 years ago

Please explain in more detail. I didn't quite understand. Currently it is threaded. Which approach are you proposing?

rachmadaniHaryono commented 7 years ago

I'm thinking something with aiohttp to fetch page.

Reference https://stackoverflow.com/questions/8546273/is-non-blocking-i-o-really-faster-than-multi-threaded-blocking-i-o-how

But looking at the code, not sure how feasible this is and if this is maintainable

It is also only used when importing a lot of bookmarks, which is not essential. So I'm still not sure about this

jarun commented 7 years ago

short answer: it's more about the overhead of having a thread per connection. non-blocking io lets one avoid having a thread per connection.

The thread is more of a propaganda than having any real meat. Here's my take:

We are in the age of i7s and SSDs. The overhead of threading for fetching webpages is insignificant.
We have minimal data processing overhead and NO graphical overhead. We get the data and we are done.
Even if we use asynchronous IO the program waits for the slowest thread (as defined by the timeout [currently 40 secs?]). We can't exit.
Full refresh is a once-in-a-while operation.
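For comparison with the async proposal, here is a minimal sketch of the threaded pattern being defended: a small pool of worker threads fetches titles, and a lock serializes updates to the shared result. This is not Buku's refresh code; `refresh_titles` and `fake_fetch` are made-up names, and the fetch function is a stand-in so the example runs without network access.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def refresh_titles(urls, fetch_title, max_workers=4):
    """Fetch titles concurrently and collect them under a lock."""
    titles = {}
    lock = threading.Lock()

    def worker(url):
        title = fetch_title(url)  # the slow, blocking part runs in parallel
        with lock:  # record updates are synchronized
            titles[url] = title

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, url) for url in urls]
        for future in as_completed(futures):
            future.result()  # re-raise any exception from a worker
    return titles


def fake_fetch(url):
    """Stand-in for a real HTTP fetch."""
    return "Title of " + url


result = refresh_titles(["https://a.example", "https://b.example"], fake_fetch)
print(sorted(result.items()))
```

Either way, the call returns only after the slowest fetch finishes or times out, which is the third point above.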

hubitor commented 7 years ago

I had several URLs in my notepad and I couldn't find a way to import them directly into buku. The add option accepts only one URL. So first I added them with an add-on in Firefox, then exported all the bookmarks and finally imported them into buku. The problem was that it took some time, because not only the new bookmarks were imported but also several thousand bookmarks that were already in Firefox.

So, what about importing from a text file which contains URLs separated by new lines or spaces or semicolons? Or even batch import from the console?

jarun commented 7 years ago

Or even batch import from the console?

It would be a very small script in ANY scripting language (shell/python/perl...). Please write it yourself.
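As a hedged illustration of how small such a script can be, the sketch below splits text on newlines, spaces or semicolons and builds one `buku --add URL` command per unique URL. The helper names are invented for this example, and the commands are only constructed here, not executed.

```python
import re
import subprocess


def split_urls(text):
    """Split on whitespace or semicolons; drop blanks and duplicates."""
    seen = set()
    urls = []
    for token in re.split(r"[\s;]+", text):
        if token and token not in seen:
            seen.add(token)
            urls.append(token)  # keep first-seen order
    return urls


def build_commands(urls):
    """One argument list per URL, suitable for subprocess.run."""
    return [["buku", "--add", url] for url in urls]


def run_all(commands):
    """Execute the commands (requires buku on PATH; not called here)."""
    for command in commands:
        subprocess.run(command, check=True)


text = "https://a.example\nhttps://b.example;https://c.example https://a.example"
commands = build_commands(split_urls(text))
for command in commands:
    print(" ".join(command))
```

Reading the text from a file and passing the result to `run_all` would complete the batch import.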

jarun commented 7 years ago

@mosegontar @rachmadaniHaryono can you guys audit the add bookmark path and see if there's any way to optimize the performance anywhere? It can be reduction of condition checks, loops, less variables, anything... It's not that we are slow, but it's always great to be audited (other than IRS).

mosegontar commented 7 years ago

Sure 👍 i'll take a look

questor commented 7 years ago

two suggestions:

html-export with template (already done, I can make a pull request if you want; helpful to create a static website with filtering possibilities in JS)
it would be cool to have some sort of plugins to update the database, for example if you have a github-repo bookmarked to automatically create tags based on the license and language. something to make it easier to grab more information to a url based on hints on the webpage, but with a plugin concept to be able to use different "services". (I can help with coding for this, but I dunno how to do a plugin-concept in python. any comments?

[edit] found python plugin concept, will make an implementation as proposal

jarun commented 7 years ago

html-export with template (already done, I can make a pull request if you want; helpful to create a static website with filtering possibilities in JS)

This is definitely a new plugin/project. Please create one and I'll add a mention to it.

jarun commented 7 years ago

it would be cool to have some sort of plugins to update the database, for example if you have a github-repo bookmarked to automatically create tags based on the license and language. something to make it easier to grab more information to a url based on hints on the webpage, but with a plugin concept to be able to use different "services". (I can help with coding for this, but I dunno how to do a plugin-concept in python. any comments?

Are you creating a plugin framework or a plugin in this case? The framework would be a part of Buku, the plugin will be a separate project.
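To make the framework/plugin distinction concrete, here is one possible shape, sketched under loud assumptions: the registry and `suggest_tags` stand in for the framework side, while `github_tags` stands in for a separately shipped plugin. All names are hypothetical, and the plugin only inspects the URL; looking up license or language would need a network call that is left out.

```python
from urllib.parse import urlparse

# Framework side: a registry of callables that map a URL to a set of tags.
PLUGINS = []


def register(func):
    """Decorator that adds a tag-suggesting callable to the registry."""
    PLUGINS.append(func)
    return func


def suggest_tags(url):
    """Ask every registered plugin and merge the suggested tags."""
    tags = set()
    for plugin in PLUGINS:
        tags |= plugin(url)
    return sorted(tags)


# Plugin side: could live in a separate project and register itself.
@register
def github_tags(url):
    """Suggest 'github' plus the repository owner for GitHub URLs."""
    parts = urlparse(url)
    if parts.netloc.lower() != "github.com":
        return set()
    segments = [segment for segment in parts.path.split("/") if segment]
    tags = {"github"}
    if len(segments) >= 2:
        tags.add(segments[0].lower())
    return tags


print(suggest_tags("https://github.com/jarun/Buku"))
```

A real framework would also need a discovery mechanism (for example, entry points or a plugin directory), which this sketch does not attempt.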

rachmadaniHaryono commented 7 years ago

jarun / buku

ToDo List #174

Notes

Identified tasks