dstask-import: import GitHub issues (and in the future more) to dstask

Dieterbe commented 3 years ago

Hello, this is a prototype tool for GitHub->dstask synchronization (and, in the future, potentially other sources). See #77 for some background.

It works like this: in the file $HOME/.tasksync.toml, you create one or more github sections, like this:

[[github]]
token = "xxxxxx"
user = "grafana"
repo = "carbon-relay-ng"   
get_closed = true             
assignee = "Dieterbe"       

[[github]]
token = "xxx"
user = "grafana"
repo = "metrictank"
get_closed = false
assignee = ""

Each section represents a user/repo that will be queried for issues, filtered by the assignee and get_closed conditions. You need to get a token from https://github.com/settings/tokens. The issues are saved as tasks: issues closed on GitHub get status resolved, and open issues get status pending in dstask. (If the task is open/not resolved, we should probably just honor the pre-existing status in dstask instead, as it would be more accurate than just "pending".)

UUIDs are not randomly generated, but derived deterministically from the GH user, repo and issue number. The fetched tasks are then merged with any pre-existing tasks, should they exist: we respect the local notes, but take all other info from GH. (This will certainly be revised. I can see how one would probably want local concepts of projects, dependencies, etc.)
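A minimal Go sketch of the two mechanics described above, under stated assumptions. The thread only says UUIDs are derived deterministically from user, repo and issue number, so the RFC 4122 version-5 (SHA-1, name-based) scheme, the URL namespace and the issue-URL name used here are illustrative choices, not necessarily what the tool does. `taskStatus` implements the suggested refinement of keeping a pre-existing local status for open issues.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// namespaceURL is the RFC 4122 URL namespace (6ba7b811-9dad-11d1-80b4-00c04fd430c8).
// Choosing it here is an assumption for illustration.
var namespaceURL = [16]byte{
	0x6b, 0xa7, 0xb8, 0x11, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// issueUUID returns a name-based (version 5) UUID for a GitHub issue.
// The same user/repo/number always yields the same UUID, so importing on
// two machines produces identical task IDs.
func issueUUID(user, repo string, number int) string {
	name := fmt.Sprintf("https://github.com/%s/%s/issues/%d", user, repo, number)

	h := sha1.New()
	h.Write(namespaceURL[:])
	h.Write([]byte(name))
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // set version 5
	u[8] = (u[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

// taskStatus maps a GitHub issue state ("open"/"closed") to a dstask status.
// Closed issues become "resolved". For open issues, a pre-existing local
// status such as "active" or "paused" is kept; a missing or "resolved" local
// status (e.g. the issue was reopened) becomes "pending".
func taskStatus(ghState, localStatus string) string {
	if ghState == "closed" {
		return "resolved"
	}
	if localStatus != "" && localStatus != "resolved" {
		return localStatus
	}
	return "pending"
}

func main() {
	fmt.Println(issueUUID("grafana", "metrictank", 42))
	fmt.Println(taskStatus("open", "active"))
	fmt.Println(taskStatus("closed", "active"))
}
```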

Shortcomings:

only works for issues, not PRs
does not automatically trigger any git operations; it leaves the local git repo in a "dirty" state for now.

Curious for any feedback! In the future I hope to expand this to sync with other systems (e.g. todoist, which would be bidirectional, and potentially email).

Also, I'm not sure whether this should go into this repository; I can also create a new repository on my account for this tool. However, I do have the impression that as I develop this, I will simultaneously want to improve some of the existing code, and that would certainly be easier if this tool lived in the same repository (it could be marked as "experimental" or similar).

Here's what it looks like after I synced 2 GitHub repositories: [screenshot: tasksync_000]

Dieterbe commented 3 years ago

PS: an alternative approach would be to use https://github.com/ralphbean/bugwarrior and then import the taskwarrior data into dstask. But getting it up and running looked non-trivial, and I wasn't confident that it would be a clean integration.

naggie commented 3 years ago

I'll review more in depth when I find time; this deserves to be looked at in more detail. Thanks for the effort, I've been wanting to do something like this.

So this seems unidirectional -- I think that's good, it makes the code simpler. Might we therefore consider this the same as importing issues from a different system, a bit like what we can currently do with a taskwarrior database?

Regarding deterministic UUIDs: they seem necessary from a sync perspective, in case a user imports from GitHub on 2 different machines at once. I had thought about adding a third-party ID section, but that would not be robust.

Dieterbe commented 3 years ago

Might we therefore consider this the same as importing issues from a different system, a bit like what we can currently do with a taskwarrior database?

Not sure, I'm not very familiar with TW, nor with the details of dstask's import of it. One clear difference, though, is that my tool generates the yaml files (and in the future may do the necessary git manipulations as well) rather than feeding data into dstask itself via stdin, although this seems like an implementation detail. I think we mainly need to figure out how the import should work conceptually[*] and how close to dstask itself this tool should live (in the same binary? the same repo but a different binary? a different repo but under a unified "Dstask" GitHub org? or move it to my personal GitHub org?). Then this implementation detail will become clearer.

[*] e.g. how to unify properties coming from GH that may have been locally edited. Do we somehow import GH dependencies (GH issues referring to one another), or should this merely be a local task concept? How about projects? Local ones? Lots of open questions here; I also have a few ideas, like expressing mappings to local projects in the config. It is quite possible that bidirectional sync would make sense (e.g. unpausing or finishing a local task would update the GH issue); I just don't have a strong need for this, so I left it out for now.

naggie commented 3 years ago

is that my tool generates the yaml files

I see, does it use the dstask validators and structs at least?

I think we mainly need to figure out how the import should work conceptually[*] and how close to dstask itself this tool should live

Agreed. Currently, import-tw takes a taskwarrior JSON export on stdin and does a lossy conversion. We could do something similar with the GitHub tasks, though part of me thinks this should be separate from, or beyond the scope of, dstask. If we decide it is beyond the scope, perhaps it should be a separate project that uses dstask as a library to manipulate the yaml files.

how to unify properties coming from GH that may have been locally edited.

This is the main problem, and the reason I state that dstask is not a collaboration tool: we'd have similar problems with conflict resolution and sync. That's why I might want to treat it as a one-time import (or export). A compromise may be to allow multiple imports of the same GitHub issues, idempotently but with precedence to GitHub, so that local tasks are either replaced with the GitHub state or perhaps not updated if they already exist. This would fit a workflow I use at work where I (currently manually) link Jira issues.

It is quite possible that bidirectional sync would make sense (e.g. unpausing or finishing a local task would update the GH issue),

Plus, there are issues with syncing the comments: dstask has a single markdown notes page per task, not a list of comments.

I just don't have a strong need for this, so I left it out for now.

I think that's a good reason not to implement sync. Every feature should have a real use case.

Dieterbe commented 3 years ago

Let's discuss further in our call (#90).

Dieterbe commented 3 years ago

@naggie @dontlaugh I think this is in pretty good shape, at least for an initial version. For reviewing, I suggest you start with the documentation page, which explains how everything works: https://github.com/naggie/dstask/blob/6b6abf46592bbb14d17e9c8ea8b9c918a8cbea7a/doc/dstask-sync.md

dontlaugh commented 3 years ago

I would advocate a general naming of plugin binaries that adopts the following convention:

dstask-${plugin-name}

The expectation would be that the binary is simply somewhere on your PATH. This is the convention for terraform, and I like it a lot.

This lends itself to a pluggable subcommand approach.
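To make the convention concrete, here is a minimal Go sketch of a stripped-down `dstask` front-end that dispatches a subcommand `foo` to a `dstask-foo` binary found on PATH. It uses only `os/exec`; `findPlugin` is a hypothetical helper, and nothing here reflects how dstask currently parses its commands.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// findPlugin resolves a subcommand name to an external "dstask-<name>"
// binary on $PATH, following the naming convention proposed above.
func findPlugin(name string) (string, error) {
	return exec.LookPath("dstask-" + name)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: dstask <command> [args...]")
		os.Exit(2)
	}
	path, err := findPlugin(os.Args[1])
	if err != nil {
		fmt.Fprintf(os.Stderr, "unknown command %q: %v\n", os.Args[1], err)
		os.Exit(1)
	}
	// Hand the remaining arguments and the terminal over to the plugin.
	cmd := exec.Command(path, os.Args[2:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		// Propagate the plugin's own exit code when it has one.
		if exitErr, ok := err.(*exec.ExitError); ok {
			os.Exit(exitErr.ExitCode())
		}
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

With this scheme, installing a new subcommand is just a matter of putting a `dstask-<name>` binary on PATH; the front-end needs no registry of plugins.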

There have been requests to keep config out of HOME, and I agree. See https://github.com/naggie/dstask/issues/49

We might consider a convention like

$HOME/.config/dstask
    dstask.conf     # main config doesn't exist...yet
    /plugins
        /github-sync
            config.toml
        /something-else
            config.json

I think the same "/plugins/" namespacing would be good for the task database itself, e.g.

# the dstask- prefix is omitted?
~/.dstask/plugins/github-sync/default.yml

This way we can adopt a general approach in dstask core: the "plugins" subdirectory will be checked into git, but reading and writing this subdirectory is not the responsibility of core.

The previous comments would lead me to ask your opinion on the following restructure.

plugins/
  dstask-github-sync/
    main.go

or

plugins/
  dstask-github-sync/
    pkg.go
    cmd/dstask-github-sync/
      main.go

Or similar.

dontlaugh commented 3 years ago

You will need a token for your github account. Go here: https://github.com/settings/tokens

To test:

git remote add dieter git@github.com:Dieterbe/dstask.git
git fetch dieter
git checkout tasksync
go build -o dstask-sync cmd/dstask-sync/main.go

Configure the plugin:

cat << EOF > $HOME/.dstask-sync.toml
[[github]]
token = "<Github API token>"
user = "<Github org/user>"
repo = "<Github repository>"
get_closed = true             # get closed tickets in addition to open ones?
assignee = ""                 # if set, only import tickets that have this assignee
milestone = ""                # if set, select only tickets that have this milestone
labels = ""                   # if set, only select tickets that have these labels
template = "default"          # must be set to a valid task file in ~/.dstask/templates-github/<filename>
EOF

Add a template to the dstask directory:

mkdir $HOME/.dstask/templates-github

cat << EOF > $HOME/.dstask/templates-github/default.yml
summary: "GH/{{.User}}/{{.Repo}}/{{.Number}}: {{.Title}}"
tags: ["{{.Milestone}}", "a-tag"]
project: "some-project"
priority: P2
notes: "url: {{.Url}}"
EOF

Add your GitHub token to the plugin config:

vim $HOME/.dstask-sync.toml

naggie commented 3 years ago

Reviewing the comments:

I agree we should respect XDG config dirs. Not sure we need hierarchy beyond a single level, though: maybe ~/.config/dstask/sync.yaml alongside the core config? (Should the core config be called dstask/core/main?) Also, maybe yaml is a more natural choice, considering it's already used for the database. Not too bothered if you feel strongly, though. It's more important that we pick one and stick with it for all config files.
As for the plugin hierarchy, I'll defer to you @Dieterbe -- I'm happy so long as it's kept separate from the rest of the code (core) and isn't too complex.
Do we want to call them plugins? They're quite separable. Maybe they're better known as extensions? That's consistent with taskwarrior/timewarrior.

Some thoughts:

Do we want to implicitly map a tag or project?
Now reading the extensive README. Wow, nice documentation. Thanks for going to the effort there, I think documentation makes or breaks projects like this. I see there's a template that includes the URL by default. That's good, so it will work with the open command out of the box.
The template -- I suggest it's part of the config file (though with toml that's possibly messy, which might be a reason to use yaml as a block). By default it could be a constant in the code, as usual. I am very inclined to have it non-configurable, though, with sensible defaults, only making it configurable if there's a need.

dontlaugh commented 3 years ago

We had a chat about this in Slack regarding the possibility of embedding the template directly in the config. This could work:

template_str = """
summary: "GH/{{.User}}/{{.Repo}}/{{.Number}}: {{.Title}}"
tags: ["{{.Milestone}}", "a-tag"]
project: "some-project"
priority: P2
notes: "url: {{.Url}}"
"""

Dieterbe commented 3 years ago

1) I'm okay with adopting XDG, but it entails more than "just use ~/.config and ~/.local/share": those are merely the default values, and we should actually check the XDG environment variables. There's probably a small lib out there to do that. Then there's the question of what to do on Macs. Also, we should make this change not just for dstask-sync but for dstask itself. I suggest addressing this in a different PR (either before or after merging this one).

2) Re plugins and config: I really think it's too early to pin down some of these decisions. We don't know yet what kinds of hooks, plugins, and alternative commands we want to build, and they all have different needs. Let's just go with something and revise later. XDG_CONFIG_HOME/dstask/main.ext and XDG_CONFIG_HOME/dstask/sync.ext sound good to me (for now). Regarding the extension, I think toml is a bit better suited for config than yaml, because we don't really need hierarchy in the config (at this point); the most advanced thing we need is to add "multiple things of the same kind" (here, github blocks), which toml handles well enough. I don't think it's particularly important for the markup of the task files to match that of the config, though I see some benefit in users "only needing to know 1 language instead of 2". In reality, though, our users are well served knowing both anyway if they want to be successful with computers. But I don't feel strongly about this either, and we can revise once we build more tools and plugins and discover what our actual needs are.

3) Code hierarchy: I'm pretty happy with how I put things in this PR: cmd/<name>/main.go for binaries and packages in pkg. I imagine this might be contentious, but I don't think it's such a bad thing to mix libraries for dstask-sync with those of dstask itself. Well, "mix" is relative, as they would still be separate packages in their own directories, just intermixed with packages for dstask (I would like to see those go under pkg/ as well).

4) Re plugins vs extensions terminology: for now I just think of it as a separate program. Plugins/extensions IMHO are more like hooks: things that plug into the main dstask program and execute logic, e.g. before/after updating a task. So let's keep those names available for when we start doing things like that. Potentially the sync functionality could become more like a plugin (e.g. called automatically every time dstask is run), but let's see how that evolves...

I think documentation makes or breaks projects like this.

Yep, this is the lowest-hanging fruit for this project IMHO. We also need better getting-started docs, etc.

Do we want to implicitly map a tag or project?

Not sure what this means, but I imagine some people may want tags/projects in dstask that are literal copies of the tags/projects set on the respective GH issues. This is currently not supported because I didn't need it: I use different terminology in my dstask than what we use at work to categorize issues.

The template -- I suggest it's part of the config file (though with toml that's possibly messy, which might be a reason to use yaml as a block). By default it could be a constant in the code, as usual. I am very inclined to have it non-configurable, though, with sensible defaults, only making it configurable if there's a need.

This is where the rubber meets the road when it comes to "do we want to be an opinionated tool or not". In fact, my first version didn't have template support: it generated the fields in a hardcoded way (similar to the current default template). But I figured people would probably want to customize this process, or at least set particular tags and projects.

Using the URL as the note seems reasonable, as does the summary string GH/user/repo/number: <title>; these maybe don't need to be templates as I have them now (or at least not user-configurable templates). Typically, being opinionated means enforcing a certain schema or standard, but I don't see much value in us enforcing what task summaries, notes, etc. look like, unless at some point we want to parse them back to generate reports or something. Please let me know if you want me to make this less configurable.

Dieterbe commented 3 years ago

Other todos I want to do: 1) mention getting a token from https://github.com/settings/tokens; 2) simplify org/user + repo, and just use "user/repo" strings; 3) allow specifying multiple "user/repo" pairs in each section (for when you want to query many repos, potentially from different users/orgs, but apply the same settings to them).
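Todos 2) and 3) could look roughly like this in Go: each section carries a list of "user/repo" strings that are split and validated up front, before any API calls. `Repo` and `parseRepos` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Repo is one GitHub repository, identified by owner (user or org) and name.
type Repo struct {
	Owner, Name string
}

// parseRepos turns entries such as "grafana/metrictank" into Repo values and
// rejects anything that is not exactly two non-empty, slash-separated parts.
func parseRepos(specs []string) ([]Repo, error) {
	repos := make([]Repo, 0, len(specs))
	for _, s := range specs {
		owner, name, ok := strings.Cut(s, "/")
		if !ok || owner == "" || name == "" || strings.Contains(name, "/") {
			return nil, fmt.Errorf("invalid repo %q: expected \"user/repo\"", s)
		}
		repos = append(repos, Repo{Owner: owner, Name: name})
	}
	return repos, nil
}

func main() {
	repos, err := parseRepos([]string{"grafana/metrictank", "grafana/carbon-relay-ng"})
	fmt.Println(repos, err)
}
```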

naggie commented 3 years ago

1. I'm okay with adopting XDG, but it entails more than "just use ~/.config and ~/.local/share": those are merely the default values, and we should actually check the XDG environment variables. There's probably a small lib out there to do that. Then there's the question of what to do on Macs. Also, we should make this change not just for dstask-sync but for dstask itself. I suggest addressing this in a different PR (either before or after merging this one).

Yes, though dstask has no config currently. Unless you mean the state cache? In that case, yes, a separate PR. Otherwise, for this PR we can do the dstask-importer config only; let's stick to something flat and change it if necessary. I suggest:

~/.config/dstask/dstask-importer.toml
~/.config/dstask/dstask.toml (for a later PR when we need it)

The convention being a direct map to binary names. I think that's the most obvious.

2. Re plugins and config: I really think it's too early to pin down some of these decisions. We don't know yet what kinds of hooks, plugins, and alternative commands we want to build, and they all have different needs. Let's just go with something and revise later. XDG_CONFIG_HOME/dstask/main.ext and XDG_CONFIG_HOME/dstask/sync.ext sound good to me (for now). Regarding the extension, I think toml is a bit better suited for config than yaml, because we don't really need hierarchy in the config (at this point); the most advanced thing we need is to add "multiple things of the same kind" (here, github blocks), which toml handles well enough. I don't think it's particularly important for the markup of the task files to match that of the config, though I see some benefit in users "only needing to know 1 language instead of 2". In reality, though, our users are well served knowing both anyway if they want to be successful with computers. But I don't feel strongly about this either, and we can revise once we build more tools and plugins and discover what our actual needs are.

I marginally prefer yaml but don't care enough to change things. Toml is fine. Are these .ext files binaries? Can't we just use separate commands?

3. Code hierarchy: I'm pretty happy with how I put things in this PR: cmd/<name>/main.go for binaries and packages in pkg. I imagine this might be contentious, but I don't think it's such a bad thing to mix libraries for dstask-sync with those of dstask itself. Well, "mix" is relative, as they would still be separate packages in their own directories, just intermixed with packages for dstask (I would like to see those go under pkg/ as well).

Sure, let's see how that goes.

4. Re plugins vs extensions terminology: for now I just think of it as a separate program. Plugins/extensions IMHO are more like hooks: things that plug into the main dstask program and execute logic, e.g. before/after updating a task. So let's keep those names available for when we start doing things like that. Potentially the sync functionality could become more like a plugin (e.g. called automatically every time dstask is run), but let's see how that evolves...

Yes, separate binaries seem fine for now. I don't think hooking in a sync is necessarily a good idea; I'd rather importing be an explicit decision by the user.

I think documentation makes or breaks projects like this.

Yep, this is the lowest-hanging fruit for this project IMHO. We also need better getting-started docs, etc.

Do we want to implicitly map a tag or project?

Not sure what this means, but I imagine some people may want tags/projects in dstask that are literal copies of the tags/projects set on the respective GH issues. This is currently not supported because I didn't need it: I use different terminology in my dstask than what we use at work to categorize issues.

Yes, that's what I meant. Manually configuring a context per repo is also fine, and possibly necessary -- now I see you've done that in the template. For some reason I had assumed that was a note template; my fault for skim-reading.

This is where the rubber meets the road when it comes to "do we want to be an opinionated tool or not". In fact, my first version didn't have template support: it generated the fields in a hardcoded way (similar to the current default template). But I figured people would probably want to customize this process, or at least set particular tags and projects.

I now agree that we need the template, as the template's job is to define a default tag/project/priority.

Using the URL as the note seems reasonable, as does the summary string GH/user/repo/number: <title>; these maybe don't need to be templates as I have them now (or at least not user-configurable templates). Typically, being opinionated means enforcing a certain schema or standard, but I don't see much value in us enforcing what task summaries, notes, etc. look like, unless at some point we want to parse them back to generate reports or something. Please let me know if you want me to make this less configurable.

I don't think we'll want to parse back -- if we did it would be to sync bidirectionally, and that's asking for trouble.

I had assumed we'd map the title 1:1 and rely on project/tags for other context. Adding the GH/user/repo/number information to the title would add to the noise, in my opinion. If your workflow requires this, then I think we've just demonstrated that this part needs to be configurable.

Now giving it a go!

A few other thoughts:

One potential issue is confusion between dstask-sync and dstask sync. I think it needs another name: dstask-importer, as it's importing issues, albeit in an idempotent way that supports updates. That raises the question: does import-tw belong in this binary?
logrus: nice library, but is it necessary?

naggie commented 3 years ago

OK, I got it working after removing the invalid line labels = "" from the example config -- the parser was expecting a list of strings instead of an empty string.

It's really cool! Nice work.

Dieterbe commented 3 years ago

Todo for this PR

[x] rename to dstask-import, rename config accordingly (leave xdg for future PR)
[x] simplify config to "user/repo" strings and allow setting multiple per section
[x] move template definitions into github config sections (separate files and n:m mapping seems overkill at this point)
[x] document needed github tokens from https://github.com/settings/tokens
[x] fix invalid labels line in sample template.

Postponed for future work

figure out what log library we want to go forward with
adopt xdg
move tw-import functionality into dstask-import
support for importing PRs
more integrations

Dieterbe commented 3 years ago

I had assumed we'd map the title 1:1 and rely on project/tags for other context. Adding the GH/user/repo/number information to the title would add to the noise, in my opinion. If your workflow requires this, then I think we've just demonstrated that this part needs to be configurable.

So, at work I'm really involved in only 1 or 2 real projects. However, each of these involves a plethora of repositories (across different organisations, no less). Titles alone would be too ambiguous, as it would be unclear which repository they refer to, so I need to know the org and repo as well. Creating a new project for each individual repo would be much too fine-grained for me, because any single "project" for me is a mixture of tasks across different repos, and that's how I want to see it.

Perhaps my use case is the less common one. We could simplify the default template (while still accommodating my needs, because I can set my personal config however I like), but then again, the default template is also a useful demonstration of what is possible.

naggie commented 3 years ago

I think that opinion of mine is outdated; I'm now convinced by the template mechanism. Is a template available by default, or is it currently required in the config?

I'll review and merge tonight with any luck

Dieterbe commented 3 years ago

Currently we require the user to provide a template; it should be easy to copy-paste from the docs (current version). If no template is defined, we generate essentially empty tasks without titles, tags, notes, etc. (but with correctly generated UUIDs).

naggie commented 3 years ago

Currently we require the user to provide a template; it should be easy to copy-paste from the docs (current version). If no template is defined, we generate essentially empty tasks without titles, tags, notes, etc. (but with correctly generated UUIDs).

Hm, I think that should change. I think template_str should be optional, defaulting to a sane constant. Perhaps dstask-importer could also offer to generate an example config on first run. I think reducing the steps to get running is important so new users go from zero to working ASAP. Not necessarily part of this PR, though.

Dieterbe commented 3 years ago

Hm, I think that should change. I think template_str should be optional, defaulting to a sane constant. Perhaps dstask-importer could also offer to generate an example config on first run. I think reducing the steps to get running is important so new users go from zero to working ASAP. Not necessarily part of this PR, though.

I agree that reducing steps to go from zero to working is a good goal, but I'm not convinced making template_str optional is the way (or even "a way") to do it:

1) Whether we instruct the user to copy-paste a config block into a config file or do it automatically for them, omitting a template and relying on a built-in default does not affect this step or the number of steps.

2) Having a template defined explicitly, ready for editing or copy-pasting, makes things simpler than not having the section and having to find it elsewhere when you want to override the default.

3) I don't think we can find a default that makes sense for most people. Users should choose project names, tags and priorities that make sense for them, and we should encourage them to do so by putting the template front and center.

(fun fact: I actually implemented a default, but didn't like it, so omitted it from the PR)

Edit: thinking about it more, it seems we should just error out when template_str is not set or empty. Same for token and repos, actually.

naggie commented 3 years ago

All good, thanks @Dieterbe and also @dontlaugh.

Somewhat reluctant to use logrus, but only marginally. I'll keep an open mind -- it seems more appropriate for the importer than for the main core binary, which is where it is, so that's fine.

dstask-import: import GitHub issues (and in the future more) to dstask #81
