jarun / buku

:bookmark: Personal mini-web in text
GNU General Public License v3.0

ToDo List #174

Closed (jarun closed 6 years ago)

jarun commented 7 years ago

Continued from #135.

Notes

The list below is a growing one. While suggesting new features please consider contributing to Buku. The code is intentionally kept simple and easy to understand with comments. We'll be happy to assist any new contributor. We need your help!

Some of the fresh-baked features may not have been released yet. Grab the master branch for those.

Identified tasks

mosegontar commented 7 years ago

I saw there was some discussion last year of integrating with Pinboard. Is this still a desired feature? (I personally would like it). If so, I can take it. Pinboard has a nice API with an endpoint for fetching all bookmarks and recent bookmarks (up to 100), so an import to Buku feature at least seems doable.

jarun commented 7 years ago

I have to think about it. Whether we really want to integrate or want to go solo. ;)

mosegontar commented 7 years ago

Gotcha, sounds good!

jarun commented 7 years ago

I was going through the Pinboard docs. They do support export/import to/from HTML. I think we can refrain from adding service-specific code to authenticate to Pinboard, probably with some auth token hanging around. Buku is a solution in itself, not an add-on to any other solution.

Now, coming to the integration part with popular web services, why don't we just document the fact that Buku supports bookmarks.html? That should be enough to have all your bookmarks in from anywhere (that too, incrementally).

Yes, it's a manual procedure. But it's safer/smarter than accessing any random web service's data directly.

mosegontar commented 7 years ago

Yeah that's reasonable, especially given that we can't integrate with every service. Sounds like import via bookmarks.html format should satisfy most services. Didn't know about the html format being the standard for bookmarks, interesting.

(I might write some separate small pinboard-to-buku script using https://user:password@api.pinboard.in/v1/posts/all?format=json endpoint to allow for a straight dump into buku)

Where are you thinking of documenting the support for bookmarks.html? I could add a line in the Import section of the Operational Notes wiki, just to be explicit about it. Should it go elsewhere too?
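A minimal sketch of the pinboard-to-buku idea mentioned above, going through bookmarks.html rather than writing to the database directly. It assumes the JSON from the posts/all endpoint has already been downloaded and parsed, and that each post carries the `href`, `description`, `extended` and space-separated `tags` fields; those field names are an assumption here and should be checked against the Pinboard API docs. The function name `pinboard_to_netscape` is hypothetical.

```python
import html
import json


def pinboard_to_netscape(posts):
    """Render Pinboard-style post dicts as a Netscape bookmarks.html string."""
    lines = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
    ]
    for post in posts:
        # Field names below are assumed from the Pinboard JSON format.
        url = html.escape(post["href"], quote=True)
        title = html.escape(post.get("description", ""))
        # Pinboard separates tags with spaces; bookmarks.html uses commas.
        tags = html.escape(",".join(post.get("tags", "").split()), quote=True)
        lines.append(f'    <DT><A HREF="{url}" TAGS="{tags}">{title}</A>')
        note = post.get("extended", "")
        if note:
            lines.append(f"    <DD>{html.escape(note)}")
    lines.append("</DL><p>")
    return "\n".join(lines) + "\n"


# Hand-written sample standing in for a downloaded posts/all response.
sample = json.loads(
    '[{"href": "https://example.com/?a=1&b=2", "description": "Example",'
    ' "extended": "A note", "tags": "demo test"}]'
)
bookmarks_html = pinboard_to_netscape(sample)
```

Writing `bookmarks_html` to a file should give something Buku's existing bookmarks.html import can read, which keeps credentials and service-specific code out of Buku itself.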

jarun commented 7 years ago

I might write some separate small pinboard-to-buku script

Of course!

I could add a line in the Import section of the Operational Notes wiki

You are in the right place!

jarun commented 7 years ago

Additionally, in the Introduction, please make the following modification:

"For GUI integration (or to sync bookmarks with your favourite bookmark management service), refer to the wiki page on System integration."

jarun commented 7 years ago

@alex-bender the colors support is already being worked on by @shv-q3 here. He hijacked this but he is almost done! ;) I changed the owner.

EDIT: Now I remember he attempted to do this once before with colord. The current approach has been our preferred approach.

jarun commented 7 years ago

@mosegontar I have added the note on web service integration.

mosegontar commented 7 years ago

Ah thanks; (funny timing, I was just about to do this). I've updated the man page (and I'll update the wiki), PR coming in a second.

jarun commented 7 years ago

Now I see what has happened. I had the README change locally, forgot to push it and went to sleep. :) Pushed it now.

jarun commented 7 years ago

@alex-bender @mosegontar how is it going? Planning to make a release early next month, probably in the first week. Can we expect to have the features assigned to you in?

mosegontar commented 7 years ago

@jarun sorry I've been a bit delayed. I'm moving this week and things have been hectic.

I've updated about half of buku.py docstrings into NumPy style. I just pushed the current work, https://github.com/mosegontar/Buku/commit/626e4d95a4f80a6b8425dea07bfe53dadb9ff550, so please take a look and let me know your thoughts. If things look okay, I can certainly finish the rest by early next month.

With regard to generating the documentation with Sphinx, I experimented a bit with it last week. I should be able to do it, but I do expect there will be some finicky aspects. Also wrt autogenerating the docs, that's something I still need to look into. So this part might not be finished in time for the release.

alex-bender commented 7 years ago

Hi @jarun! I'm going back from vacation right now so I can't say for sure for now. I'll ping you in a few days.

jarun commented 7 years ago

Thanks guys!

jarun commented 7 years ago

@mosegontar I can see it's going very well! :+1:

mosegontar commented 7 years ago

Just an update, docstrings are updated up to the Editor Mode Functions, so about 80% complete. Hoping to be done by tomorrow or Tuesday!

jarun commented 7 years ago

Simply awesome!

lkarbownik commented 7 years ago

Hi @jarun

First, amazing work. I love using Buku. Have you considered adding support for the -t/--stag flags when using --print and -f<num>? Currently, using -t/--stag with these flags overrides the behaviour and runs the app in interactive mode. My goal is to combine buku with other tools in a way that would allow pre-querying bookmarks by providing a tag, i.e.: buku -p -f3 -t $sometag | fzf ...

I know there is a --np to disable interactive mode but it does not seem to support other output formats and parsing multiline output is not ideal.

jarun commented 7 years ago

-f is the print format specifier. I believe what you want is for search options to honor -f. @mosegontar would you like to try it out?

jarun commented 7 years ago

The trick would be to call print_rec() if -f is used, otherwise call print_single_rec() (as it is now). I would definitely love to call print_single_rec() [when -f is not used] because the overhead is much less. print_rec() queries the db again.

Another way would be to have an alternative API to print_single_rec() which does the filter check. In any case I would love to keep print_single_rec() as it is - straight logic without any condition check overhead.

mosegontar commented 7 years ago

Hi! @jarun yes, I'd definitely like to work on this; it's actually addressing a use case I've been thinking about myself. I'll take a closer look and let you know what questions come up.

lkarbownik commented 7 years ago

Thank you. It's so great to have a simple question met with such enthusiasm. Let me know if there's anything I can do to help you with this one.

Also, not related but I think I might have just discovered a bug. For some reason when updating youtube bookmarks using -u all bookmark titles got replaced with a simple 'Youtube' title. When fetching the same page with curl or wget the title tag contains full title, but when requesting the page with Buku the tag contains only Youtube. Afaik, curl does not process any js, so I'm not sure yet what is causing this behaviour. I will continue investigating and I will try to find some more useful information.

Edit: It seems that the value of USER_AGENT is causing the problem. YouTube responds with a "this video is not available" page instead of the regular video page. Removing the user-agent value from the headers restores the correct behaviour.

jarun commented 7 years ago

@lukaszkarbownik please check #211.

mosegontar commented 7 years ago

I played around with this last night and was basically able to get this working (search options honor -f). There are a couple of points I'd like to clarify and consider before going further:

jarun commented 7 years ago

For the second question, you can add another filter which just shows the url, say -f0 (or some other number if 0 is default). Do NOT change the current filters because people use them in their scripts already.

jarun commented 7 years ago

For the first question, print_single_rec() takes a record/row. You should use that (call in loop for all search result records) or add a similar function that honours filtering. I would prefer not to change print_rec or print_single_rec. Let's have print_single_rec_with_filter.

Now, we should not be checking every time before printing a single record whether filter is enabled or not. Just check it once and use something like a function pointer in C to use the appropriate API if filtering is on.
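A rough Python sketch of the "check once, then call through a function pointer" idea. The function bodies here are stand-ins and not Buku's actual code; the row layout `(id, url, title, tags)`, the helper `make_filtered_printer` and the format codes 1 and 2 are all hypothetical and chosen only for illustration.

```python
def print_single_rec(row):
    """Stand-in for the unfiltered printer: format one full record."""
    return f"{row[0]}. {row[2]}\n   > {row[1]}\n   # {row[3]}"


def make_filtered_printer(fmt):
    """Return a printer for one (hypothetical) format code, decided up front."""
    if fmt == 1:
        return lambda row: f"{row[0]}\t{row[1]}"
    if fmt == 2:
        return lambda row: f"{row[0]}\t{row[1]}\t{row[3]}"
    raise ValueError(f"unsupported format: {fmt}")


def render_results(rows, fmt=None):
    """Format all search results using a printer chosen once."""
    # The filter check happens here, outside the loop; inside the loop
    # there is only a plain call through the selected function.
    printer = print_single_rec if fmt is None else make_filtered_printer(fmt)
    return [printer(row) for row in rows]


rows = [(1, "https://example.com", "Example", "demo")]
plain = render_results(rows)
filtered = render_results(rows, fmt=1)
```

With this shape the unfiltered path stays free of per-record condition checks, and the filtered variant lives in its own function.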

This generic issue listing is not really for in-depth discussions on specific requests. Can you please open a defect or something where we can discuss these?

rachmadaniHaryono commented 7 years ago

just recently read this

Update user agent string in buku.py, if applicable

is it possible to use this instead https://github.com/hellysmile/fake-useragent?

jarun commented 7 years ago

Several points:

jarun commented 7 years ago

@rachmadaniHaryono we have a user agent string of our own at https://github.com/jarun/Buku/commit/848d9d79431be6ec7a668995ed282514374b9cab!

rachmadaniHaryono commented 7 years ago

after reading the recent comment #219, is it possible and more advantageous to fetch the title asynchronously to speed up the process?

jarun commented 7 years ago

Please explain in more detail. I didn't quite understand. Currently it is threaded. Which approach are you proposing?

rachmadaniHaryono commented 7 years ago

I'm thinking something with aiohttp to fetch page.

Reference https://stackoverflow.com/questions/8546273/is-non-blocking-i-o-really-faster-than-multi-threaded-blocking-i-o-how

But looking at the code, not sure how feasible this is and if this is maintainable

It is also only used when importing a lot of bookmarks, which is not essential. So I'm still not sure about this.
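To make the two approaches concrete, here is a small sketch that contrasts a thread pool with a single-threaded asyncio event loop. It is not Buku's code: the fetch functions are stubs that only sleep, so no network is involved, and all function names are hypothetical. A real asynchronous version would replace the stub coroutine with an HTTP client such as aiohttp.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_title_blocking(url):
    """Stub for a blocking fetch; the sleep stands in for network latency."""
    time.sleep(0.01)
    return f"Title of {url}"


async def fetch_title_async(url):
    """Stub for a non-blocking fetch; awaiting yields control to other tasks."""
    await asyncio.sleep(0.01)
    return f"Title of {url}"


def fetch_all_threaded(urls, workers=4):
    """Threaded approach: blocking calls spread over a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_title_blocking, urls))


async def _gather_titles(urls):
    # All coroutines run concurrently on one thread.
    return await asyncio.gather(*(fetch_title_async(u) for u in urls))


def fetch_all_async(urls):
    """Asynchronous approach: one thread, many in-flight requests."""
    return asyncio.run(_gather_titles(urls))


urls = [f"https://example.com/{i}" for i in range(8)]
threaded_titles = fetch_all_threaded(urls)
async_titles = fetch_all_async(urls)
```

Both functions return the titles in input order, so the difference is only in how waiting is scheduled: a thread per in-flight request versus a single event loop.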

jarun commented 7 years ago

short answer: it's more about the overhead of having a thread per connection. non-blocking io lets one avoid having a thread per connection.

The thread is more of a propaganda than having any real meat. Here's my take:

hubitor commented 7 years ago

I had several URLs in my notepad and I couldn't find a way to import them directly into buku. The add option accepts only one URL. So first I added them with an add-on in Firefox, then exported all the bookmarks and finally imported them into buku. The problem was that it took some time, because not only the new bookmarks were imported but also the several thousand bookmarks that were already in Firefox.

So, what about importing from a text file which contains URLs separated by new lines or spaces or semicolons? Or even batch import from the console?

jarun commented 7 years ago

Or even batch import from the console?

It would be a very small script in ANY scripting language (shell/python/perl...). Please write it yourself.
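For illustration, such a script might look roughly like this in Python. It splits the text on whitespace or semicolons, as suggested in the request, and builds one `buku --add` command per URL. The helper names are hypothetical, and splitting on semicolons is a simplification, since a semicolon can legitimately appear inside a URL.

```python
import re


def split_urls(text):
    """Split text on whitespace or semicolons, dropping blanks and duplicates."""
    seen = set()
    urls = []
    for piece in re.split(r"[\s;]+", text):
        if piece and piece not in seen:
            seen.add(piece)
            urls.append(piece)
    return urls


def buku_add_commands(urls):
    """Build one argument list per URL, ready to pass to subprocess.run()."""
    return [["buku", "--add", url] for url in urls]


notepad = "https://a.example; https://b.example\nhttps://a.example"
commands = buku_add_commands(split_urls(notepad))
```

Each entry in `commands` can then be run with `subprocess.run(cmd, check=True)`, so only the new URLs are added and nothing else is re-imported.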

jarun commented 7 years ago

@mosegontar @rachmadaniHaryono can you guys audit the add bookmark path and see if there's any way to optimize the performance anywhere? It can be reduction of condition checks, loops, fewer variables, anything... It's not that we are slow, but it's always great to be audited (other than by the IRS).

mosegontar commented 7 years ago

Sure 👍 i'll take a look

questor commented 7 years ago

two suggestions:

[edit] found python plugin concept, will make an implementation as proposal

jarun commented 7 years ago

html-export with template (already done, I can make a pull request if you want; helpful to create a static website with filtering possibilities in JS)

This is definitely a new plugin/project. Please create one and I'll add a mention to it.

jarun commented 7 years ago

it would be cool to have some sort of plugins to update the database, for example if you have a github-repo bookmarked to automatically create tags based on the license and language. something to make it easier to grab more information to a url based on hints on the webpage, but with a plugin concept to be able to use different "services". (I can help with coding for this, but I dunno how to do a plugin-concept in python. any comments?)

Are you creating a plugin framework or a plugin in this case? The framework would be a part of Buku, the plugin will be a separate project.

rachmadaniHaryono commented 7 years ago

See also http://yapsy.sourceforge.net

I'm thinking the plugin workflow should be like this

Buku just has to provide a way to enable/disable a plugin.

See the beets project and its plugin system

jarun commented 7 years ago

I think it would be greatly useful. Please collaborate and go ahead. Please add an issue and a branch (plugin-fw) to work on this.

@questor I am adding you as a collab. Please accept and work on the branch for plugin framework.

jarun commented 7 years ago

I have created the branch plugin-fw.

Guys,

I'll be away for a few days till next Sun and won't be active other than responding to mails (family time ;)). For the plugin fw branch please do peer code review, merge and work together.

questor commented 7 years ago

okay, here are some explanations to give you more insight into what I want to achieve and what workflow I have in mind:

I have already started with a very simple plugin system, but I need some help with the decisions where to store additional information from or in the plugin (external table or in the current table). But I agree, the plugin-framework has to be part of buku, but the plugins are separate projects.

questor commented 7 years ago

btw: the template changes are really simple and do not interfere with the other options. the changes can be seen here: https://github.com/questor/Buku/commit/eb4b8c745f793eb631842e290a650d3a8259e6ff

jarun commented 7 years ago

I will need some time to think about it. At the same time, the plugins should be as detached as possible.

jarun commented 7 years ago

I think a plugin should have its own database for its own data. The foreign key should be the URL which is unique in Buku DB.

Also, for any changes to Buku database the plugin should use Buku APIs or request.

jarun commented 7 years ago

Let's say the user updates the description through a plugin. It should go in the desc field of Buku. For fields which are available in both, Buku and plugins work on the same field. That is, no data duplication.
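A minimal sketch of what a plugin-side database keyed by URL could look like, using the GitHub license/language example from earlier in the thread. The table name `repo_info`, its columns and the helper functions are hypothetical. Because the plugin data lives in its own database file, SQLite would not enforce the link to Buku's records, so the URL acts as a logical key only.

```python
import sqlite3


def open_plugin_db(path=":memory:"):
    """Open the plugin's own database, separate from Buku's."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repo_info ("
        "url TEXT PRIMARY KEY, license TEXT, language TEXT)"
    )
    return conn


def save_repo_info(conn, url, license_name, language):
    """Store plugin-only fields; fields Buku already has are not copied here."""
    conn.execute(
        "INSERT OR REPLACE INTO repo_info (url, license, language)"
        " VALUES (?, ?, ?)",
        (url, license_name, language),
    )
    conn.commit()


def load_repo_info(conn, url):
    """Look up plugin data by the same URL that is unique in Buku's DB."""
    return conn.execute(
        "SELECT license, language FROM repo_info WHERE url = ?", (url,)
    ).fetchone()


conn = open_plugin_db()
save_repo_info(conn, "https://github.com/jarun/Buku", "GPL-3.0", "Python")
info = load_repo_info(conn, "https://github.com/jarun/Buku")
```

Anything that belongs to an existing Buku field, such as the description or tags, would still go through Buku's own APIs rather than into this table.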

jarun commented 7 years ago

@rachmadaniHaryono are there any test cases for shorten and expand? I believe for well-known services like google the shortened url is always the same. Can you please add the test case if not in place already? This will ensure that we know whether the tny.im service is active.