Closed nodiscc closed 6 years ago
Hi!
Here are some first thoughts :)
How much code separation from the main client? How to properly implement it?
Let's start simple:
IMO these operations should be performed separately:
In the long run, we'll see whether more granularity is needed to keep sources and CLI usage consistent.
Add extractor configuration there [in a config file]?
Archival preferences could be specified in a config file:
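As a rough illustration only: a minimal sketch of how such preferences might be read with Python's standard configparser. The [archive] section and the outdir, actions and limit keys are hypothetical names chosen for this example, not an existing python-shaarli-client format.

```python
import configparser

# Hypothetical config content; section and key names are assumptions.
SAMPLE = """
[archive]
outdir = archive/
actions = page, audio
limit = 200
"""


def load_archive_settings(text):
    """Parse archival preferences from INI-formatted text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["archive"]
    raw_actions = section.get("actions", fallback="")
    return {
        "outdir": section.get("outdir", fallback="archive/"),
        # Comma-separated list of archival actions to run
        "actions": [a.strip() for a in raw_actions.split(",") if a.strip()],
        "limit": section.getint("limit", fallback=20),
    }


settings = load_archive_settings(SAMPLE)
print(settings)
```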
There will inevitably be some feature creep, as there are many use cases for web scraping and web content download in general
As for the current REST client, 3rd-party integrations should be implemented in a library form, with a console entrypoint that may serve as a Minimal Working Example in case someone wants to customize data retrieval and/or processing.
multimedia/page content archiving/mirroring could be added directly as a Shaarli plugin [...] I don't want my webserver/PHP stack to exec() call youtube-dl, I have a shared host without youtube-dl/wget/... support...)
The archival tool could be wrapped in a web (micro)service providing a REST API, that would be called by the corresponding Shaarli plugin.
I've been thinking about this lately. Can't figure out how to add a subcommand parser that would run a function that does 1. get-link with the specified parameters 2. write the output to a file (JSON) 3. parse the file and run archival methods on the link list. The command line would be something like shaarli archive-links --limit=200 --tags=something --outdir=archive/. I can't simply add archive-links to endpoints since those specifically correspond to Shaarli API endpoints.
All in all I'm thinking about starting a separate project that would depend on python-shaarli-client, but maybe you could point me to the right way of adding that subcommand parser?
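The three steps described above could be sketched roughly as follows. This is only an illustration: fetching the links (step 1) is left to the API client and represented by a plain list of dicts, the id and url keys are assumed to be present in each link, and write_url_file is a hypothetical placeholder for a real archival method such as a page or media download.

```python
import json
from pathlib import Path


def save_links(links, path):
    """Step 2: write the link list to a JSON file."""
    Path(path).write_text(json.dumps(links, indent=2), encoding="utf-8")


def load_links(path):
    """Step 3a: read the link list back from the JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def archive_links(links, outdir, archivers):
    """Step 3b: run every archival callable on every link."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    results = []
    for link in links:
        for archiver in archivers:
            results.append(archiver(link, outdir))
    return results


def write_url_file(link, outdir):
    """Placeholder archiver: store the URL in a file named after the link id."""
    target = outdir / f"{link['id']}.url"
    target.write_text(link["url"] + "\n", encoding="utf-8")
    return target
```

Keeping each archival method as a plain callable taking (link, outdir) would let new methods be added without touching the loop.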
Suggestions:
1. [...] shaarli-api and add new scripts, e.g. shaarli-archive
2. [...] api subparser, and declare other subparsers for specific actions:
$ shaarli api <params>
$ shaarli archive <params>
$ shaarli <action> <params>
Option 2. seems more consistent, by providing a single entrypoint and action-specific subparsers, while keeping a single project/package to gather Shaarli archival tools.
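For reference, a minimal sketch of what option 2 might look like with argparse subparsers. The argument names mirror the example command line from earlier in the thread; the defaults and help texts are assumptions, and the actual endpoint handling of the existing client is not reproduced here.

```python
import argparse


def build_parser():
    """Build a single 'shaarli' entrypoint with one subparser per action."""
    parser = argparse.ArgumentParser(prog="shaarli")
    subparsers = parser.add_subparsers(dest="action", required=True)

    # 'api' would wrap the existing REST endpoint handling (omitted here).
    api = subparsers.add_parser("api", help="call a Shaarli REST API endpoint")
    api.add_argument("endpoint", help="endpoint name")

    # 'archive' is the hypothetical archival action discussed in this thread.
    archive = subparsers.add_parser("archive", help="archive linked content")
    archive.add_argument("--limit", type=int, default=20)
    archive.add_argument("--tags", default="")
    archive.add_argument("--outdir", default="archive/")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["archive", "--limit=200", "--tags=something", "--outdir=archive/"]
)
print(args.action, args.limit, args.tags, args.outdir)
```

Each action could then be dispatched on args.action, which keeps API calls and archival code in separate functions behind one command.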
@nodiscc there's also the possibility of providing an interactive CLI entrypoint using the click library (possibly overkill but potentially quite fun to write :) )
Hi, I wrote a small patch to implement an --outfile command line parameter, it got me up to speed and I have a clearer picture of how to implement basic shaarli api/shaarli archive... command line logic now (and thanks for your comment, that put me on the right track).
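A minimal sketch of how an --outfile option of this kind might behave, assuming the response is JSON-serializable and that output falls back to stdout when no file is given. This is an illustration, not the patch itself; the write_output helper is a hypothetical name.

```python
import argparse
import json
import sys


def write_output(data, outfile=None):
    """Serialize data as JSON to outfile, or to stdout when none is given."""
    text = json.dumps(data, indent=2, sort_keys=True)
    if outfile is None:
        sys.stdout.write(text + "\n")
    else:
        with open(outfile, "w", encoding="utf-8") as handle:
            handle.write(text + "\n")


def build_parser():
    """Parser exposing only the --outfile option for this example."""
    parser = argparse.ArgumentParser(prog="shaarli")
    parser.add_argument("--outfile", default=None,
                        help="write the response to this file")
    return parser
```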
I'll run the final tests (Python SSL warnings also led me to finally ditch my server's self-signed certs and set up Let's Encrypt) and send a PR soon. It took me a while to pass the CI tests :)
Edit: re interactive interface: I'm more interested in the scripted/automated aspect of this tool right now, but I always wanted to look into python-click. Maybe someday :)
Hi, this is not intended to be merged.
I attached my current quick & dirty script to archive music from an export of my Shaarli instance. It's just a bash script, as I needed it quick. Currently it downloads music, which is what I needed. I'd like to rewrite it in Python, with well thought-out integration with the official client. Consider this as a proof of concept for a rewrite of https://github.com/nodiscc/shaarchiver
I'd like some input on how this would be best achieved:
- [...] entry_point to setuptools?
- [...] shaarli?
- [...] actions = option in config file? Add extractor configuration there?

Some notes:

- [...] I don't want my webserver/PHP stack to exec() call youtube-dl, I have a shared host without youtube-dl/wget/... support...)
- --format text is broken for me (invalid option --format). I'll investigate that.

To get a clearer picture, I added a list of current shaarchiver features, as well as features that might reasonably be requested, to the script header. Have a look.
With that in mind, what is the best way to start implementing an archiving tool around the API? (@virtualtam this is for you :) I'd rather not add bloat to the shiny new API client - I think it should stay a clean, reference client. On the other hand, well-integrated actions/modules would be interesting)
Once I have a clearer picture I will start working on a basic implementation, and might as well ping people who were interested in a Shaarli archiving tool.
Again there is no rush :) ETA year 2018. I'd like to work on polishing the API client first, add some tests, etc.
Edits: