shaarli / python-shaarli-client

Python3 CLI to interact with a Shaarli instance
https://python-shaarli-client.readthedocs.io/
MIT License

Request for comments: add media/page archiving capabilities to the Python Shaarli client #22

Closed nodiscc closed 6 years ago

nodiscc commented 7 years ago

Hi, this is not intended to be merged.

I attached my current quick & dirty script to archive music from an export of my Shaarli instance. It's just a bash script, as I needed it quick. Currently it downloads music, which is what I needed. I'd like to rewrite it in Python, with well thought-out integration with the official client. Consider this as a proof of concept for a rewrite of https://github.com/nodiscc/shaarchiver

I'd like some input on how this would be best achieved:

Some notes:

To get a clearer picture, I added a list of current shaarchiver features, as well as features that might reasonably be requested, to the script header. Have a look.

With that in mind, what is the best way to start implementing an archiving tool around the API? (@virtualtam this is for you :) I'd rather not add bloat to the shiny new API client - I think it should stay a clean, reference client. On the other hand, well-integrated actions/modules would be interesting.)

Once I have a clearer picture I will start working on a basic implementation, and might as well ping people who were interested in a Shaarli archiving tool.

Again there is no rush :) ETA year 2018. I'd like to work on polishing the API client first, add some tests, etc.

Edits:

virtualtam commented 7 years ago

Hi!

Here are some first thoughts :)

How much code separation from the main client? How to properly implement it?

Let's start simple:

IMO these operations should be performed separately:

In the long run, we'll see whether more granularity is needed to keep sources and CLI usage consistent.

Add extractor configuration there [in a config file]?

Archival preferences could be specified in a config file:

There will inevitably be some feature creep, as there are many use cases for web scraping and web content download in general

As for the current REST client, 3rd-party integrations should be implemented in a library form, with a console entrypoint that may serve as a Minimal Working Example in case someone wants to customize data retrieval and/or processing.

multimedia/page content archiving/mirroring could be added directly as a Shaarli plugin [...] I don't want my webserver/PHP stack to exec() call youtube-dl, I have a shared host without youtube-dl/wget/... support...)

The archival tool could be wrapped in a web (micro)service providing a REST API, that would be called by the corresponding Shaarli plugin.
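A minimal sketch of such a microservice, using only the Python standard library. The `/archive` route, the JSON body with a `url` field, the 202 response and the `archive()` placeholder are all assumptions made for illustration; nothing in this thread specifies them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def archive(url):
    """Hypothetical placeholder for the real archival routine."""
    return {"url": url, "status": "queued"}


class ArchiveHandler(BaseHTTPRequestHandler):
    """Accept POST /archive with a JSON body such as {"url": "..."}."""

    def do_POST(self):
        if self.path != "/archive":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            url = json.loads(self.rfile.read(length))["url"]
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a 'url' field")
            return
        body = json.dumps(archive(url)).encode("utf-8")
        self.send_response(202)  # accepted: archiving may take a while
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(host="127.0.0.1", port=0):
    """Create the server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), ArchiveHandler)
```

A Shaarli plugin would then only need to send an HTTP request to this service, so the PHP stack never has to exec() an external downloader itself.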

nodiscc commented 6 years ago

I've been thinking about this lately. I can't figure out how to add a subcommand parser that would run a function that does the following: 1. call get-links with the specified parameters, 2. write the output to a (JSON) file, 3. parse the file and run archival methods on the link list. The command line would be something like

shaarli archive-links --limit=200 --tags=something --outdir=archive/.

I can't simply add archive-links to endpoints since those specifically correspond to Shaarli API endpoints

All in all I'm thinking about starting a separate project that would depend on python-shaarli-client, but maybe you could point me to the right way of adding that subcommand parser?
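The three steps above could be sketched roughly as follows. This is not the client's code: `fetch_links` is a hypothetical callable standing in for a get-links request, the archivers are hypothetical callables, and the `links.json` file name is an assumption.

```python
import json
from pathlib import Path


def archive_links(fetch_links, archivers, outdir, **params):
    """Fetch links, dump them to JSON, then run every archiver on every link.

    fetch_links: hypothetical callable returning a list of link dicts.
    archivers: hypothetical callables taking (link_dict, outdir_path).
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)

    # Steps 1 and 2: retrieve the links and write them to a JSON file.
    dump = outdir / "links.json"
    dump.write_text(json.dumps(fetch_links(**params), indent=2), encoding="utf-8")

    # Step 3: parse the file and run the archival methods on the link list.
    links = json.loads(dump.read_text(encoding="utf-8"))
    for link in links:
        for archiver in archivers:
            archiver(link, outdir)
    return len(links)
```

Passing the fetcher and the archivers in as arguments keeps this function independent of the API client, which would also suit a separate project that merely depends on python-shaarli-client.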

virtualtam commented 6 years ago

Suggestions:

  1. rename the current script to shaarli-api and add new scripts, e.g. shaarli-archive
  2. move API commands to an api subparser, and declare other subparsers for specific actions:
    • $ shaarli api <params>
    • $ shaarli archive <params>
    • $ shaarli <action> <params>

Option 2 seems more consistent: it provides a single entrypoint with action-specific subparsers, while keeping a single project/package to gather Shaarli archival tools.
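For reference, option 2 could look roughly like this with argparse subparsers. It is only a sketch: the `api` and `archive` action names come from the list above and the archive options from the command line proposed earlier in the thread, while the defaults and the `endpoint` argument are assumptions.

```python
import argparse


def build_parser():
    """Single `shaarli` entrypoint with one subparser per action."""
    parser = argparse.ArgumentParser(prog="shaarli")
    subparsers = parser.add_subparsers(dest="action", required=True)

    # `shaarli api <endpoint>`: would wrap the existing REST API commands.
    api = subparsers.add_parser("api", help="call a Shaarli REST API endpoint")
    api.add_argument("endpoint", help="endpoint name, e.g. get-links")

    # `shaarli archive ...`: hypothetical archival action.
    archive = subparsers.add_parser("archive", help="archive linked content")
    archive.add_argument("--limit", type=int, default=20)
    archive.add_argument("--tags", default="")
    archive.add_argument("--outdir", default="archive/")
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(
    ["archive", "--limit=200", "--tags=something", "--outdir=archive/"]
)
print(args.action, args.limit, args.tags, args.outdir)
```

New actions then only need another `add_parser()` call, without touching the API endpoint definitions.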

virtualtam commented 6 years ago

@nodiscc there's also the possibility of providing an interactive CLI entrypoint using the click library (possibly overkill but potentially quite fun to write :) )

nodiscc commented 6 years ago

Hi, I wrote a small patch that implements an --outfile command-line parameter. It got me up to speed, and I now have a clearer picture of how to implement the basic shaarli api/shaarli archive... command-line logic (thanks for your comment, which put me on the right track).

I'll run the final tests (Python SSL warnings also led me to finally ditch my server's self-signed certs and set up Let's Encrypt) and send a PR soon. It took me a while to pass the CI tests :)
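An --outfile parameter of this kind might be sketched as below. This is not the actual patch: only the option name comes from this comment, and the `write_output` helper and its fall-back to standard output are assumptions.

```python
import argparse
import json
import sys


def build_parser():
    """Parser with an optional --outfile argument."""
    parser = argparse.ArgumentParser(prog="shaarli")
    parser.add_argument(
        "--outfile",
        default=None,
        help="write the JSON response to this file instead of stdout",
    )
    return parser


def write_output(data, outfile=None):
    """Serialize data as JSON to outfile, or to stdout when outfile is None."""
    text = json.dumps(data, indent=2) + "\n"
    if outfile is None:
        sys.stdout.write(text)
    else:
        with open(outfile, "w", encoding="utf-8") as handle:
            handle.write(text)
```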

Edit: re interactive interface: I'm more interested in the scripted/automated aspect of this tool right now, but I always wanted to look into python-click. Maybe someday :)

nodiscc commented 6 years ago

Moved to https://github.com/shaarli/python-shaarli-client/issues/24