se-sic / coronet

coronet – the R library for configurable and reproducible construction of developer networks
GNU General Public License v2.0

Create edges for `artifact relation` `issue` #239

bockthom closed 7 months ago

bockthom commented 1 year ago

While running checks during our work on #238, we noticed that artifact networks with artifact.relation issue have no edges, even though we have the data to create them. Hence, creating edges among issues that reference each other (as described in our README.md, see Section "Relation") would be a useful enhancement.

maxloeffler commented 1 year ago

Hey, as far as I can see, the only information we have about issues is: data.vertices, issue.id, issue.title, issue.type, issue.state, issue.resolution, creation.date, closing.date, issue.components, event.name, author.name, author.email, date, event.info.1, event.info.2, event.id, issue.source and artifact.type.

What does it mean for issues to "reference each other" as stated in README.md?

bockthom commented 1 year ago

There are multiple issue events (for all the possible issue events, see the attached file issue_data_processing.pdf). For example, there are events that indicate that one issue was referenced by another issue (and vice versa). The event.type (or event.name, not sure what we call it in coronet) as well as event.info.1 and event.info.2 contain the necessary information.

For example, when somebody references another issue in an issue, we can see this in an add_link event where event.info.2 is "issue". The issue number of the referenced issue is given in event.info.1 in this case (we might need to check whether the referenced issue is in the same repo, but for now, let's just assume that this is the case).
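To make the filter concrete, here is a minimal sketch in Python, purely for illustration (coronet itself is R, and this is not coronet code). The rows are made-up sample data and the function name is hypothetical; only the column names event.name, event.info.1, event.info.2 and issue.id come from our issue data. It also assumes that issue.id and event.info.1 use the same identifier format.

```python
# Made-up sample rows; only the column names come from the issue data.
events = [
    {"issue.id": "1", "event.name": "add_link",
     "event.info.1": "2", "event.info.2": "issue"},
    {"issue.id": "1", "event.name": "commented",
     "event.info.1": "open", "event.info.2": ""},
    {"issue.id": "3", "event.name": "add_link",
     "event.info.1": "abc123", "event.info.2": "not-an-issue"},
]


def issue_reference_edges(rows):
    """Return (referencing issue, referenced issue) pairs.

    Hypothetical helper: keeps only add_link events whose
    event.info.2 is "issue" and reads the target from event.info.1.
    """
    return [
        (row["issue.id"], row["event.info.1"])
        for row in rows
        if row["event.name"] == "add_link" and row["event.info.2"] == "issue"
    ]


edges = issue_reference_edges(events)
print(edges)
```

Only the first sample row passes both conditions, so it is the only one that yields an edge.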

We also have the opposite direction: There is a referenced_by event in the other issue. In theory, this relationship should be reciprocal (but we have never checked that). So, we should only generate one edge for add_link and referenced_by, not two, as both events describe the same activity.
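One possible way to end up with a single edge per reference is to normalize both event types to the same (source, target) pair and collect the pairs in a set. The following Python sketch is only an illustration (coronet is R; the rows and the function name are hypothetical). It assumes that, for a referenced_by event, event.info.1 holds the id of the referencing issue and event.info.2 is "issue" as well; neither assumption has been verified against our data.

```python
# Made-up sample rows; only the column names come from the issue data.
events = [
    # Issue 1 references issue 2 ...
    {"issue.id": "1", "event.name": "add_link",
     "event.info.1": "2", "event.info.2": "issue"},
    # ... and issue 2 records the reciprocal event.
    {"issue.id": "2", "event.name": "referenced_by",
     "event.info.1": "1", "event.info.2": "issue"},
    # A referenced_by event whose add_link counterpart is missing.
    {"issue.id": "3", "event.name": "referenced_by",
     "event.info.1": "4", "event.info.2": "issue"},
]


def unique_reference_edges(rows):
    """Return sorted, de-duplicated (referencing, referenced) pairs."""
    edges = set()
    for row in rows:
        if row["event.info.2"] != "issue":
            continue
        here, other = row["issue.id"], row["event.info.1"]
        if row["event.name"] == "add_link":
            edges.add((here, other))   # this issue points to the other one
        elif row["event.name"] == "referenced_by":
            edges.add((other, here))   # assumed: the other issue points here
    return sorted(edges)


unique_edges = unique_reference_edges(events)
print(unique_edges)
```

Because both events of a reciprocal pair map to the same tuple, the set keeps one edge for them, and a reference that shows up in only one of the two events is still retained.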

For the implementation, you can do it in a similar way as for the already existing networks, that is, call group.artifacts.by.data.column with appropriate parameters. The difficulty here is to filter for the needed event.name and for event.info.2 == "issue".

I hope this description is helpful. If not, don't hesitate to ask additional questions :wink: