Open kikass13 opened 4 years ago
Here's my first example of file specific info gathering. I am using git blame to extract all the (hopefully) useful information of local files and their "touches" (aka who changed how many lines at which point in time). My script + git blame outputs the following:
Here's my example:
git ls-files | while read -r f; do printf '\n%s\n' "$f"; git blame -CCC --line-porcelain "$f" | tests/pythonGitHelper.py; done
And here's the output:
```
.git-blame-ignore-revs
Arne Döring [2]
-- 2020-08-14/16:37:25 [2]
.github/FUNDING.yml
Tobias Augspurger [3]
-- 2020-08-20/15:19:05 [1]
-- 2020-08-20/14:56:17 [2]
.github/workflows/black.yml
Arne Döring [11]
-- 2020-08-14/16:08:49 [1]
-- 2020-08-14/14:34:01 [10]
.github/workflows/seleryaction.yml
Tobias Augspurger [44]
-- 2020-02-08/11:23:48 [7]
-- 2020-08-22/18:35:19 [1]
-- 2020-07-16/16:10:23 [4]
-- 2020-08-22/18:40:54 [2]
-- 2020-02-08/11:20:39 [6]
-- 2020-08-20/12:01:43 [2]
-- 2020-08-11/22:29:39 [7]
-- 2020-08-13/18:20:38 [8]
-- 2020-02-08/11:20:04 [3]
-- 2020-08-19/09:26:19 [3]
-- 2020-08-22/18:48:52 [1]
T0b14s Augspurger [71]
-- 2020-07-23/15:59:06 [2]
-- 2020-07-22/14:16:37 [1]
-- 2020-02-08/11:23:48 [2]
-- 2020-07-17/17:48:09 [1]
-- 2020-02-08/11:20:39 [3]
-- 2020-03-28/08:33:31 [9]
-- 2020-07-23/14:14:00 [9]
-- 2020-07-23/16:12:10 [3]
-- 2020-07-18/09:12:25 [29]
-- 2020-07-23/17:30:09 [1]
-- 2020-03-08/18:35:34 [10]
-- 2020-07-23/16:19:38 [1]
johannes karoff [1]
-- 2020-07-31/16:55:59 [1]
.gitignore
Arne Döring [3]
-- 2020-08-17/15:52:19 [3]
Nick Fiege [3]
-- 2020-02-08/11:23:48 [3]
Tobias Augspurger [120]
-- 2020-03-15/09:39:56 [3]
-- 2020-02-08/11:20:39 [3]
-- 2020-02-08/11:23:48 [1]
-- 2020-02-29/09:45:42 [1]
-- 2020-08-19/18:01:33 [3]
-- 2020-02-10/22:39:29 [1]
-- 2020-02-08/11:20:04 [108]
johannes karoff [1]
-- 2020-07-22/16:24:14 [1]
Dockerfile
T0b14s Augspurger [3]
-- 2020-02-28/22:24:07 [1]
-- 2020-02-28/21:43:44 [2]
Tobias Augspurger [25]
-- 2020-02-24/22:10:17 [4]
-- 2020-02-28/23:47:20 [2]
-- 2020-02-08/11:20:04 [14]
-- 2020-02-08/11:20:39 [5]
kikass13 [20]
-- 2020-08-06/22:38:34 [20]
Gemfile
Hendrik Radke [1]
-- 2020-03-21/15:52:01 [1]
Tobias Augspurger [2]
-- 2020-02-08/11:20:04 [1]
-- 2020-02-24/22:10:17 [1]
LICENSE
Tobias Augspurger [661]
-- 2020-02-08/11:20:04 [661]
README.md
Arne Döring [6]
-- 2020-08-16/19:19:06 [6]
T0b14s Augspurger [12]
-- 2020-03-22/08:40:18 [2]
-- 2020-02-16/14:02:36 [1]
-- 2020-07-03/00:02:46 [1]
-- 2020-03-21/09:04:05 [1]
-- 2020-02-08/11:20:39 [3]
-- 2020-07-26/18:57:52 [2]
-- 2020-02-24/20:33:17 [2]
Hendrik Radke [1]
-- 2020-03-21/15:52:01 [1]
Felix Dietze [29]
-- 2020-08-21/19:51:51 [29]
Tobias Augspurger [122]
-- 2020-08-14/17:05:22 [2]
-- 2020-08-17/08:38:30 [1]
-- 2020-08-14/20:27:52 [3]
-- 2020-08-22/08:50:51 [11]
-- 2020-02-08/11:20:04 [3]
-- 2020-08-14/15:01:23 [3]
-- 2020-08-06/10:57:33 [1]
-- 2020-08-22/18:25:36 [1]
-- 2020-08-15/10:33:53 [15]
-- 2020-08-16/12:32:28 [7]
-- 2020-02-24/22:29:28 [1]
-- 2020-02-08/11:23:48 [2]
-- 2020-08-14/23:22:56 [1]
-- 2020-08-16/11:23:35 [9]
-- 2020-08-16/11:00:25 [9]
-- 2020-08-13/14:59:30 [1]
-- 2020-08-22/18:10:53 [3]
-- 2020-08-19/17:46:07 [22]
-- 2020-08-15/12:25:25 [4]
-- 2020-08-21/12:31:53 [8]
-- 2020-08-22/18:29:35 [1]
-- 2020-03-14/13:28:49 [14]
build.sh
kikass13 [1]
-- 2020-02-08/11:23:48 [1]
johannes karoff [1]
-- 2020-07-31/16:54:50 [1]
docs/OpenSelery-04.png
Traceback (most recent call last):
File "tests/pythonGitHelper.py", line 15, in
```
For now, git blame fu**s up when dealing with binary files (which is not unheard of). And apparently I had uncommitted files in my directory while testing this, meh :)
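For readers who want to reproduce this kind of per-file summary, here is a minimal sketch of what a helper fed with `git blame --line-porcelain` output could look like. It is not the actual `tests/pythonGitHelper.py`; the function names `count_blame_lines` and `format_report` are made up for illustration, and timestamps are rendered in UTC as an assumption. Decoding with `errors="replace"` is one possible way to avoid crashing on binary file content.

```python
from collections import defaultdict
from datetime import datetime, timezone


def count_blame_lines(raw: bytes) -> dict:
    """Count blamed lines per author and per author timestamp.

    `raw` is the byte output of `git blame --line-porcelain <file>`.
    """
    counts = defaultdict(lambda: defaultdict(int))
    author, stamp = None, None
    # Decode leniently so undecodable bytes (binary files) cannot raise.
    for line in raw.decode("utf-8", errors="replace").split("\n"):
        if line.startswith("author "):
            author = line[len("author "):]
        elif line.startswith("author-time "):
            seconds = int(line[len("author-time "):])
            # Assumption: report in UTC rather than the author's timezone.
            stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
        elif line.startswith("\t") and author is not None and stamp is not None:
            # A tab-prefixed line is the blamed content line; it closes one entry.
            counts[author][stamp.strftime("%Y-%m-%d/%H:%M:%S")] += 1
    return counts


def format_report(counts: dict) -> str:
    """Render 'Author [total]' followed by one '-- timestamp [n]' row per commit time."""
    rows = []
    for author, per_time in counts.items():
        rows.append(f"{author} [{sum(per_time.values())}]")
        for stamp, n in per_time.items():
            rows.append(f"-- {stamp} [{n}]")
    return "\n".join(rows)
```

To use it in a pipe like the one above, a script could read the blame output from `sys.stdin.buffer.read()`, pass it to `count_blame_lines`, and print the result of `format_report`.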
Nice @kikass13 I think it is a really good approach. I went through the git blame and it is indeed a good indicator.
> will replace/take over the current gather/weight/split functionalities based on a dynamic / freely configurable framework
When you replace / take over the existing architecture, try to keep the existing functionality or even enhance it. The uniform weights and activity weights are quite important even if they are not that complex. Today I will start building a demo script to get into the coordination weights. I think the names "file weights" and "coordination weights" are quite good. @krux02 @cornerman @fdietze What is your opinion?
@Ly0n well, these are not "weights" per se. They are just classifiers that someone needs in order to configure what they want to express ...
to make my thought process clear:
As you can see, I am a little confused about how metrics play their role here. I don't really know (right now I don't even have a slight clue) how we will configure, declare & apply metrics to a contribution domain. If someone has an idea, please give me some insight.
To document what I did the last two days, here's a diagram depicting the data flow: I will describe the image below ... just for curious people, flow starts at the top left side ;)
- `LibreSeleryConfig` class
- `LibreSelery` class which will initialize itself properly, check things and prepare output dirs and online connection (from various sources)
- `CDE` (`ContributionDistributionEngine`)
- `ContributionDomain`: `weight`, which identifies how important that specific domain is in relation to other domains
- `ContributionAction`: `type` attribute, which identifies which plugin (`ContributionActionPlugin`) will be executed to do all the busy work
- `LibreSelery` has to initialize the CDE
- `ContributionDomain`
- `ContributionAction` objects for each domain
- `ContributionAction` will also initialize and load its configured plugin
- `ContributionActionPlugin` objects are essentially external Python code files. `CDE` can work with them properly, but their user code can be altered to fulfill all kinds of tasks.
- `CDE` is ready and can finally start working? ...
- `cde.gather()`
  - `domain.gather()`
    - `action.gather()`
- `cde.weight()`
  - `domain.weight()`
  - `domain.weight()` will also call `domain.mangle()`, which in turn will add up the contributor scores of all actions
  - `_domain.weight()` will just normalize the scores of all contributors from [0 ... to 1]; these are not considered scores anymore, but are considered weights instead
- `cde.merge()`
  - the `weight` parameter will also be applied, which in turn will reduce the impact of specific domains in relation to others

Any questions? No? I'm going to bed now!
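The flow described above (gather, weight, merge) can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the `cd_engine` branch: plugins are modelled as plain callables, normalization divides by the sum so that the weights add up to 1, and the domain's `weight` attribute is renamed to `domain_weight` here because Python cannot have an attribute and a method with the same name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Scores = Dict[str, float]  # contributor name -> score or weight


@dataclass
class ContributionAction:
    # Stand-in for a loaded ContributionActionPlugin: any callable returning raw scores.
    plugin: Callable[[], Scores]
    scores: Scores = field(default_factory=dict)

    def gather(self) -> None:
        self.scores = self.plugin()


@dataclass
class ContributionDomain:
    domain_weight: float  # importance of this domain relative to other domains
    actions: List[ContributionAction]
    weights: Scores = field(default_factory=dict)

    def gather(self) -> None:
        for action in self.actions:
            action.gather()

    def mangle(self) -> Scores:
        # Add up the contributor scores of all actions in this domain.
        totals: Scores = {}
        for action in self.actions:
            for contributor, score in action.scores.items():
                totals[contributor] = totals.get(contributor, 0.0) + score
        return totals

    def weight(self) -> None:
        # Normalize summed scores into weights in [0, 1] (assumption: divide by the sum).
        totals = self.mangle()
        grand_total = sum(totals.values())
        self.weights = (
            {c: s / grand_total for c, s in totals.items()} if grand_total else {}
        )


@dataclass
class ContributionDistributionEngine:
    domains: List[ContributionDomain]

    def gather(self) -> None:
        for domain in self.domains:
            domain.gather()

    def weight(self) -> None:
        for domain in self.domains:
            domain.weight()

    def merge(self) -> Scores:
        # Scale each domain's contributor weights by the domain weight, then renormalize.
        merged: Scores = {}
        for domain in self.domains:
            for contributor, w in domain.weights.items():
                merged[contributor] = merged.get(contributor, 0.0) + w * domain.domain_weight
        total = sum(merged.values())
        return {c: w / total for c, w in merged.items()} if total else {}
```

With two domains, one weighted 3.0 and one weighted 1.0, calling `gather()`, `weight()` and `merge()` in that order yields a single split in which the first domain's contributors dominate.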
I am wondering about one additional aspect: historical perspective. The initial figure shows use of `git log`, but I have not spotted it in further discussions. I think it may be useful to add a "historical decay rule" (also configurable: a faster decay coefficient would put the accent on the most recent states/contributions).
Then that "combined" split is what would be used to decide on how/whom to split current funds allotment.
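One possible reading of such a decay rule, sketched under assumptions: a split is computed for several historical states of the repository, and the splits are combined with exponentially decaying factors. The function name `combine_splits`, the exponential form, and measuring age in days are illustrative choices, not something the thread has settled on.

```python
import math
from typing import Dict, List, Tuple


def combine_splits(
    snapshots: List[Tuple[float, Dict[str, float]]], decay: float
) -> Dict[str, float]:
    """Combine per-snapshot splits into one split.

    snapshots: (age_in_days, split) pairs, where each split maps contributor -> share.
    decay: decay coefficient per day; 0 means all snapshots count equally,
           larger values emphasize the most recent snapshots.
    """
    combined: Dict[str, float] = {}
    for age, split in snapshots:
        factor = math.exp(-decay * age)  # older snapshots get a smaller factor
        for contributor, share in split.items():
            combined[contributor] = combined.get(contributor, 0.0) + factor * share
    total = sum(combined.values())
    return {c: v / total for c, v in combined.items()} if total else {}
```

With `decay = math.log(2) / 30`, a snapshot that is 30 days old counts half as much as the current one, so the coefficient can be thought of in terms of a half-life.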
@yarikoptic We talked about a concept (which I did also mention in my examples somewhere) called "time degradation". I guess it's the same thing you mean. I like the idea of including time (absolute and differential) as a means to empower "fresher" contributions.
Me and other folks talked a little bit about it in here:
The example I coded (for the CDE), which is currently open for review and further improvements (see my fork here: https://github.com/kikass13/libreselery/tree/cd_engine), includes a plugin-based scoring system (small example) built on git blame.
It gathers
So with that plugin, it is technically possible to score newer contributions better than older ones. That's just an example though as the concept of "time" is a difficult one to configure properly.
In case you have any suggestions or want to help me put a little example of what you said into code, I would be happy to get some help <3
Meeting Note: We should rename the "Actions" in the image to "Activities" because "Actions" is already used by GitHub Actions.
Update from commit 31601a4:
I changed some of the internal stuff, but the most important thing is that there is a plugin which does the same as the previous `gather` and `weight()` functions. It is not identical and a lot of stuff is missing. But the flow works well now and can be altered to fit whatever was there before.
cde now does all the heavy lifting, while plugins do all the crazy stuff (user-defined, arbitrary code)
in case you want to look into it (@fdietze @cornerman) (my fork is here: https://github.com/kikass13/libreselery/tree/cd_engine)
After the successful little meeting with @cornerman and @fdietze I changed some of the internal behavior and cleaned up the code. The main talking points were:
It was decided that all bold formatted points are relevant prior to a first PR. The last commits should address all of these "bold" points :)
after our latest meeting, I was playing around with the future weighting stuff and some random git helper scripts regarding git blame.
I will continue the stuff I have visualized in my newest fancy draft - it's not really representing anything but it should show my intentions and definitions while moving forward.
SORRY FOR SPELLING MISTAKES IN THIS TEXT, IT's 3 am DAMMIT! :D
following definitions were used:
Contribution Type (probably 1:1 with metrics, but I'm not sure)
- source of information regarding contributions done
- metrics will have to classify as one of these types; simple metrics fall into only one of them, while complex metrics will need their own type with respect to the information they need

Contribution Domain
- These are just labels for whatever the user wants to be represented by metrics.
- So domains can be seen as a combination of metrics for a specific purpose (distribution of money to a specific kind of developer type)
- highly configurable and arbitrary ... the repository owner can pretty much define whatever they want (weights, metrics, special flags)
- Because specific metrics have to be "assigned" to a Contribution Domain to work, these domains have to reference the contribution type above (where does the metric get its information from)
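To make the relation between these definitions more concrete, here is a small sketch of how a domain configuration could be checked. Everything in it is hypothetical: the class names `Metric` and `DomainConfig`, the function `check_domain`, and the assumption that a domain references exactly one contribution type.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metric:
    name: str
    contribution_type: str  # where this metric gets its information from


@dataclass
class DomainConfig:
    name: str                # arbitrary label chosen by the repository owner
    contribution_type: str   # assumption: one contribution type per domain
    weight: float = 1.0
    metrics: List[Metric] = field(default_factory=list)


def check_domain(domain: DomainConfig) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config is consistent."""
    problems = []
    if domain.weight < 0:
        problems.append(f"domain {domain.name!r} has a negative weight")
    for metric in domain.metrics:
        # A metric can only work if the domain provides the kind of data it needs.
        if metric.contribution_type != domain.contribution_type:
            problems.append(
                f"metric {metric.name!r} needs {metric.contribution_type!r} data, "
                f"but domain {domain.name!r} references {domain.contribution_type!r}"
            )
    return problems
```

A domain labelled, say, "code" that references a "git" contribution type would accept a blame-based metric but flag a metric that needs issue tracker data.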
how does it work: