Closed jarun closed 6 years ago
I saw there was some discussion last year of integrating with Pinboard. Is this still a desired feature? (I personally would like it). If so, I can take it. Pinboard has a nice API with an endpoint for fetching all bookmarks and recent bookmarks (up to 100), so an import to Buku feature at least seems doable.
I have to think about it. Whether we really want to integrate or want to go solo. ;)
Gotcha, sounds good!
I was going through the Pinboard docs. They do support export/import to/from HTML. I think we can refrain from adding service-specific code to authenticate to Pinboard, probably along with some auth token hanging around. Buku is a solution in itself, not an add-on to any other solution.
Now, coming to the integration part with popular web services, why don't we just document the fact that Buku supports bookmarks.html? That should be enough to have all your bookmarks in from anywhere (that too, incrementally).
Yes, it's a manual procedure. But it's safer/smarter than accessing any random web service's data directly.
Yeah that's reasonable, especially given that we can't integrate with every service. Sounds like import via bookmarks.html format should satisfy most services. Didn't know about the html format being the standard for bookmarks, interesting.
(I might write some separate small pinboard-to-buku script using https://user:password@api.pinboard.in/v1/posts/all?format=json endpoint to allow for a straight dump into buku)
Where are you thinking of documenting the support for bookmarks.html? I could add a line in the Import section of the Operational Notes wiki, just to be explicit about it. Should it go elsewhere too?
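For the separate pinboard-to-buku script mentioned above, the conversion step could look roughly like this. It is only a sketch: it assumes the JSON dump is a list of posts with `href`, `description`, `extended` and space-separated `tags` keys, and that comma-separated tags are wanted on the Buku side. The name `pinboard_to_records` is hypothetical, and fetching the dump and writing into Buku are left out.

```python
import json


def pinboard_to_records(raw_json):
    """Convert a Pinboard JSON dump into (url, title, tags, desc) tuples.

    Assumption: each post is a dict with 'href', 'description' (the title),
    'extended' (the note) and 'tags' (space-separated) keys.
    """
    records = []
    for post in json.loads(raw_json):
        url = post.get('href', '').strip()
        if not url:
            continue  # skip malformed entries that have no URL
        # Deduplicate the space-separated tags and join them with commas
        tags = ','.join(sorted(set(post.get('tags', '').split())))
        records.append((url, post.get('description', ''), tags,
                        post.get('extended', '')))
    return records


if __name__ == '__main__':
    sample = ('[{"href": "https://example.com", "description": "Example", '
              '"extended": "", "tags": "web demo"}]')
    for record in pinboard_to_records(sample):
        print(record)
```

Keeping the conversion separate from the download means the same function works on a saved dump file as well.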
"I might write some separate small pinboard-to-buku script"
Of course!
"I could add a line in the Import section of the Operational Notes wiki"
You are in the right place!
Additionally, in the Introduction, please make the following modification:
"For GUI integration (or to sync bookmarks with your favourite bookmark management service), refer to the wiki page on System integration."
@alex-bender the colors support is already being worked on by @shv-q3 here. He hijacked this but he is almost done! ;) I changed the owner.
EDIT: Now I remember he attempted to do this once before with colord. The current approach has been our preferred approach.
@mosegontar I have added the note on web service integration.
Ah thanks; (funny timing, I was just about to do this). I've updated the man page (and I'll update the wiki), PR coming in a second.
Now I see what has happened. I had the README change locally, forgot to push it and went to sleep. :) Pushed it now.
@alex-bender @mosegontar how is it going? Planning to make a release early next month, probably in the first week. Can we expect to have the features assigned to you in?
@jarun sorry I've been a bit delayed. I'm moving this week and things have been hectic.
I've updated about half of the buku.py docstrings into NumPy style. I just pushed the current work, https://github.com/mosegontar/Buku/commit/626e4d95a4f80a6b8425dea07bfe53dadb9ff550, so please take a look and let me know your thoughts. If things look okay, I can certainly finish the rest by early next month.
With regard to generating the documentation with Sphinx, I experimented a bit with it last week. I should be able to do it, but I do expect there will be some finicky aspects. Also wrt autogenerating the docs, that's something I still need to look into. So this part might not be finished in time for the release.
Hi @jarun! I'm coming back from vacation right now, so I can't say for sure yet. I'll ping you in a few days.
Thanks guys!
@mosegontar I can see it's going very well! :+1:
Just an update, docstrings are updated up to the Editor Mode Functions, so about 80% complete. Hoping to be done by tomorrow or Tuesday!
Simply awesome!
Hi @jarun
First, amazing work. I love using Buku.
Have you considered adding support for the -t/--stag flags when using --print and -f<num>?
Currently, using -t/--stag with these flags overrides the behaviour and runs the app in interactive mode.
My goal is to combine buku with other tools in a way that would allow to pre-query bookmarks by providing a tag, i.e.:
buku -p -f3 -t $sometag | fzf ...
I know there is a --np to disable interactive mode, but it does not seem to support other output formats, and parsing multiline output is not ideal.
-f is the print format specifier. I believe what you want is for search options to honor -f. @mosegontar would you like to try it out?
The trick would be to call print_rec() if -f is used, otherwise call print_single_rec() (as it is now). I would definitely love to call print_single_rec() [when -f is not used] because the overhead is much less. print_rec() queries the db again.
Another way would be to have an alternative API to print_single_rec() which does the filter check. In any case I would love to keep print_single_rec() as it is - straight logic without any condition check overhead.
Hi! @jarun yes, I'd definitely like to work on this; it's actually addressing a use case I've been thinking about myself. I'll take a closer look and let you know what questions come up.
Thank you. It's so great to have a simple question met with such enthusiasm. Let me know if there's anything I can do to help you with this one.
Also, not related but I think I might have just discovered a bug.
For some reason, when updating YouTube bookmarks using -u, all bookmark titles got replaced with a simple 'Youtube' title.
When fetching the same page with curl or wget the title tag contains the full title, but when requesting the page with Buku the tag contains only Youtube.
Afaik, curl does not process any js, so I'm not sure yet what is causing this behaviour.
I will continue investigating and I will try to find some more useful information.
Edit: It seems that the value of USER_AGENT is causing the problem. YouTube responds with "this video is not available" page instead of regular video page. Removing user-agent value from headers returns correct behaviour.
@lukaszkarbownik please check #211.
I played around with this last night and was basically able to get this working (search options honor -f). There are a couple of points I'd like to clarify and consider before going further:
Right now, --print can optionally take an argument of an integer or range of integers, indicating which records (by DB index) to print. If search options are included (e.g., -t or -s), it doesn't seem to make sense for the user to pass a DB index or range to --print, since we've already gathered search results based on, for example, some specific tag. Similarly, in the current behavior no argument to --print results in the output of every record with a DB index, which would also be undesirable.
So the question is: what should an argument to --print correspond to when used with -t? The argument in this case could be used to narrow search_results even further, but this seems like it would require knowing the results in advance. Perhaps it's best to ignore the argument in this case and document that.
When using -f, the DB index is printed along with whichever fields were selected. For example, buku -p -f1 -t python will give you a result like
...
120 http://pypi.python.org
...
This makes sense and is useful when looking through the search results, but ultimately adds an additional field not selected by the user with -f1. If the goal is to pipe the output elsewhere, perhaps the inclusion of the DB index could be disabled with some flag? I'm primarily thinking of a situation in which the URL or title is passed to another program.
For the second question, you can add another filter which just shows the url, say -f0 (or some other number if 0 is default). Do NOT change the current filters because people use them in their scripts already.
For the first question, print_single_rec() takes a record/row. You should use that (call in loop for all search result records) or add a similar function that honours filtering. I would prefer not to change print_rec or print_single_rec. Let's have print_single_rec_with_filter.
Now, we should not be checking every time before printing a single record whether the filter is enabled or not. Just check it once and use something like a function pointer in C to pick the appropriate API if filtering is on.
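A rough illustration of the "check once, then call through a pointer" idea in Python. These are stand-ins, not Buku's actual functions: the row layout (index, URL, title), the filter values, and the fact that the helpers return strings instead of printing are all assumptions made for the sketch.

```python
from functools import partial


def print_single_rec(row):
    """Stand-in for the unfiltered printer: straight logic, no checks."""
    return '%d. %s\n   %s' % (row[0], row[2], row[1])


def print_single_rec_with_filter(row, field_filter):
    """Hypothetical filtered printer; the filter values are illustrative."""
    if field_filter == 1:
        return '%d\t%s' % (row[0], row[1])  # index and URL only
    return '%d\t%s\t%s' % (row[0], row[1], row[2])  # index, URL, title


def format_results(rows, field_filter=None):
    # Decide once, outside the loop, which printer to use
    if field_filter is None:
        printer = print_single_rec
    else:
        printer = partial(print_single_rec_with_filter,
                          field_filter=field_filter)
    # The loop itself carries no per-record filter check
    return [printer(row) for row in rows]


if __name__ == '__main__':
    rows = [(120, 'http://pypi.python.org', 'PyPI')]
    print('\n'.join(format_results(rows, field_filter=1)))
```

The unfiltered path stays exactly as cheap as before, since the only extra work is a single comparison before the loop starts.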
This generic issue listing is not really for in-depth discussions on specific requests. Can you please open a defect or something where we can discuss these?
just recently read this
Update user agent string in buku.py, if applicable
is it possible to use this instead https://github.com/hellysmile/fake-useragent?
Several points:
@rachmadaniHaryono we have a user agent string of our own at https://github.com/jarun/Buku/commit/848d9d79431be6ec7a668995ed282514374b9cab!
after reading the recent comment #219, is it possible and more advantageous to fetch the title asynchronously to speed up the process?
Please explain in more detail. I didn't quite understand. Currently it is threaded. Which approach are you proposing?
I'm thinking something with aiohttp to fetch page.
But looking at the code, not sure how feasible this is and if this is maintainable
It is also only used when importing a lot of bookmarks, which is not essential. So I'm still not sure about this.
short answer: it's more about the overhead of having a thread per connection. non-blocking io lets one avoid having a thread per connection.
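To make the difference concrete, here is a small sketch of the non-blocking approach using only the standard library's asyncio. The network call is simulated with a sleep, so this is not a working fetcher; a real version might use aiohttp as suggested above. The names `fetch_title` and `fetch_all` are made up for the example.

```python
import asyncio


async def fetch_title(url, limit):
    """Stand-in for a network fetch; no real request is made here."""
    async with limit:
        await asyncio.sleep(0.01)  # simulates waiting on the socket
        return url, 'Title of ' + url


async def fetch_all(urls, max_concurrent=10):
    # A single event-loop thread serves every connection; the semaphore
    # caps how many fetches are in flight at the same time.
    limit = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(fetch_title(u, limit) for u in urls))


if __name__ == '__main__':
    print(asyncio.run(fetch_all(['http://a.example', 'http://b.example'])))
```

Results come back in the same order as the input URLs, which keeps it easy to match titles to bookmarks during an import.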
The thread is more of a propaganda than having any real meat. Here's my take:
I had several URLs in my notepad and I couldn't find a way to import them directly into buku. The add option accepts only one URL. So first I added them with an add-on in Firefox, then exported all the bookmarks and finally imported them into buku. The problem was that it took some time, because not only the new bookmarks were imported but also several thousand bookmarks which were already in Firefox.
So, what about importing from a text file which contains URLs separated by new lines or spaces or semicolons? Or even batch import from the console?
"Or even batch import from the console?"
It would be a very small script in ANY scripting language (shell/python/perl...). Please write it yourself.
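For reference, such a script could be as small as the sketch below. It assumes buku is on the PATH and is called once per URL with --add, since the add option accepts only one URL. The helper names are made up, and `dry_run` defaults to True so nothing is executed until you switch it off.

```python
import re
import subprocess


def split_urls(text):
    """Split text on whitespace or semicolons and drop empty pieces."""
    return [piece for piece in re.split(r'[\s;]+', text) if piece]


def add_all(urls, dry_run=True):
    """Build one 'buku --add URL' command per URL and optionally run it."""
    commands = [['buku', '--add', url] for url in urls]
    for cmd in commands:
        if dry_run:
            print(' '.join(cmd))  # only show what would be executed
        else:
            subprocess.run(cmd, check=False)
    return commands
```

Something like `add_all(split_urls(open('urls.txt').read()), dry_run=False)` would then import a whole file in one go.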
@mosegontar @rachmadaniHaryono can you guys audit the add bookmark path and see if there's any way to optimize the performance anywhere? It can be reduction of condition checks, loops, less variables, anything... It's not that we are slow, but it's always great to be audited (other than IRS).
Sure 👍 i'll take a look
two suggestions:
[edit] found python plugin concept, will make an implementation as proposal
html-export with template (already done, I can make a pull request if you want; helpful to create a static website with filtering possibilities in JS)
This is definitely a new plugin/project. Please create one and I'll add a mention to it.
it would be cool to have some sort of plugins to update the database, for example if you have a github-repo bookmarked to automatically create tags based on the license and language. something to make it easier to grab more information for a url based on hints on the webpage, but with a plugin concept to be able to use different "services". (I can help with coding for this, but I dunno how to do a plugin-concept in python. any comments?)
Are you creating a plugin framework or a plugin in this case? The framework would be a part of Buku, the plugin will be a separate project.
See also http://yapsy.sourceforge.net
I'm thinking the plugin workflow should be like this
Buku just has to provide a way to enable/disable plugins.
See the beets project and its plugin system.
I think it would be greatly useful. Please collaborate and go ahead. Please add an issue and a branch (plugin-fw) to work on this.
@questor I am adding you as a collab. Please accept and work on the branch for plugin framework.
I have created the branch plugin-fw.
Guys,
I'll be away for a few days till next Sun and won't be active other than responding to mails (family time ;)). For the plugin fw branch please do peer code review, merge and work together.
okay, to give you some more insight into what I want to achieve and the workflow I have in mind, here are some explanations:
I have already started with a very simple plugin system, but I need some help with the decisions where to store additional information from or in the plugin (external table or in the current table). But I agree, the plugin-framework has to be part of buku, but the plugins are separate projects.
btw: the template changes are really simple and do not interfere with the other options. the changes can be seen here: https://github.com/questor/Buku/commit/eb4b8c745f793eb631842e290a650d3a8259e6ff
I will need some time to think about it. At the same time, the plugins should be as detached as possible.
I think a plugin should have its own database for its own data. The foreign key should be the URL which is unique in Buku DB.
Also, for any changes to Buku database the plugin should use Buku APIs or request.
Let's say the user updates description through plugin. It should be in the desc field of Buku. For fields which are available both Buku and plugins work on the same field. That is, no data duplication.
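One way to picture the "own database, URL as key" layout is sketched below. It is purely illustrative: the `repo_info` table, its columns and the helper names are made up, and the plugin store is a separate SQLite database that never touches Buku's own tables.

```python
import sqlite3


def open_plugin_db(path=':memory:'):
    """Create the plugin's own store, separate from the Buku database."""
    conn = sqlite3.connect(path)
    conn.execute('CREATE TABLE IF NOT EXISTS repo_info ('
                 'url TEXT PRIMARY KEY, license TEXT, language TEXT)')
    return conn


def save_info(conn, url, license_name, language):
    # The URL is the only link back to the Buku record; saving again
    # for the same URL replaces the old row instead of duplicating it.
    conn.execute('INSERT OR REPLACE INTO repo_info VALUES (?, ?, ?)',
                 (url, license_name, language))
    conn.commit()


def load_info(conn, url):
    """Return (license, language) for a URL, or None if nothing is stored."""
    query = 'SELECT license, language FROM repo_info WHERE url = ?'
    return conn.execute(query, (url,)).fetchone()
```

Only plugin-specific fields live here; anything Buku already has a field for, such as the description, would go through the Buku APIs so the data is not duplicated.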
@rachmadaniHaryono are there any test cases for shorten and expand? I believe for well-known services like Google the shortened url is always the same. Can you please add the test case if not in place already? This will ensure we know whether the tny.im service is active.
Continued from #135.
Notes
The list below is a growing one. While suggesting new features please consider contributing to Buku. The code is intentionally kept simple and easy to understand with comments. We'll be happy to assist any new contributor. We need your help!
Some of the fresh-baked features may not have been released yet. Grab the master branch for those.
Identified tasks
profiles.ini (see #212, thanks @alex-bender)
--format in search results (ref, thanks @mosegontar)
importdb()]