thoth-station / mi

an experiment on Source Operation Metrics
GNU General Public License v3.0

Timestamps not accurately captured for issues/prs #555

Closed hemajv closed 2 years ago

hemajv commented 2 years ago

Bug description

The timestamps for created_at and closed_at fields for Issues/PRs data collected using the srcopsmetrics module are not accurate.

Steps to Reproduce

Steps to reproduce the behavior:

  1. Run !python -m srcopsmetrics.cli -clr open-services-group/byon -e Issue,PullRequest
  2. Store as a df:
    from srcopsmetrics.entities.issue import Issue
    issue = Issue("open-services-group/byon")
    issue_df = issue.load_previous_knowledge(is_local=True)
    issue_df = issue_df.reset_index()
    issue_df.head()
  3. View the created_at and closed_at columns and see that they do not match the values shown in the GitHub web UI
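
The manual comparison in step 3 could be partly automated. Below is a minimal sketch, assuming the rows are available as plain dicts with `id`, `created_at` and `closed_at` keys (a hypothetical layout, not the srcopsmetrics API). The helper name and the 2008 cutoff are illustrative choices: GitHub launched in 2008, so an earlier creation date cannot be real.

```python
from datetime import datetime

# GitHub launched in 2008, so anything earlier cannot be a real creation date.
EARLIEST_PLAUSIBLE = datetime(2008, 1, 1)


def find_suspicious_rows(rows):
    """Return (id, reason) pairs for rows whose timestamps look wrong."""
    problems = []
    for row in rows:
        created, closed = row["created_at"], row["closed_at"]
        if created is None or created < EARLIEST_PLAUSIBLE:
            problems.append((row["id"], "implausible created_at"))
        elif closed is not None and closed < created:
            problems.append((row["id"], "closed_at precedes created_at"))
    return problems


# Hypothetical sample rows: one with the bad date from this report, one valid.
sample = [
    {"id": 34, "created_at": datetime(1970, 1, 20), "closed_at": None},
    {
        "id": 31,
        "created_at": datetime(2022, 3, 23, 11, 58, 34),
        "closed_at": datetime(2022, 3, 23, 13, 16, 45),
    },
]
print(find_suspicious_rows(sample))
```

A check like this only flags impossible values. A closed issue with a null closed_at still has to be compared against the GitHub web UI.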

Actual behavior

The created_at field is wrong for some of the issues/PRs, and the closed_at field is null even though the issue/PR is closed.

Expected behavior

Accurate timestamps are captured

Additional context

Screenshot from 2022-03-24 12-02-26

The above screenshot was captured for the https://github.com/open-services-group/byon/issues?q=is%3Aissue+is%3Aclosed repo. For example, the created_at field has a value of 1970-01-20 for issue no. 34, which is wrong. Also, issue no. 26 is closed in the repo, but its closed_at field is null.
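
A date in January 1970 often means an epoch value was read in the wrong unit. The sketch below illustrates that general effect only; the thread does not confirm it as the cause here. It assumes the created_at reported later in this thread for issue no. 34 (2022-03-23 15:13:15) is in UTC, and shows that reading the same number as milliseconds instead of seconds gives 1970-01-20.

```python
from datetime import datetime, timezone

# Epoch seconds for 2022-03-23 15:13:15, assuming the value is in UTC.
epoch_seconds = 1648048395

# Correct reading: the number counts seconds since 1970-01-01.
correct = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Misreading: the same number treated as milliseconds.
misread = datetime.fromtimestamp(epoch_seconds / 1000, tz=timezone.utc)

print(correct.isoformat())         # 2022-03-23T15:13:15+00:00
print(misread.date().isoformat())  # 1970-01-20
```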

hemajv commented 2 years ago

cc @xtuchyna

xtuchyna commented 2 years ago

/assign @xtuchyna

xtuchyna commented 2 years ago

Hey @hemajv, I cannot reproduce the issue; the data on my side seems valid:

>>> data[["created_at", "closed_at"]]

            created_at           closed_at
id                                        
34 2022-03-23 15:13:15                 NaT
32 2022-03-23 12:01:03                 NaT
31 2022-03-23 11:58:34 2022-03-23 13:16:45
30 2022-03-23 10:57:20 2022-03-23 12:48:23
28 2022-03-08 21:06:00                 NaT
27 2022-03-01 12:49:51                 NaT
26 2022-03-01 12:38:34 2022-03-22 17:32:11
25 2022-02-28 14:48:45 2022-03-08 14:07:16
23 2022-02-16 10:44:47 2022-03-09 12:55:04
20 2022-02-08 10:33:26                 NaT
19 2022-02-07 08:06:14                 NaT
18 2022-02-02 12:33:15 2022-03-23 12:48:23
17 2022-01-31 17:17:53                 NaT
16 2022-01-19 20:47:01                 NaT
15 2022-01-19 13:40:42                 NaT
14 2022-01-17 06:32:03                 NaT
13 2022-01-17 06:25:53 2022-03-01 12:53:57
12 2022-01-17 06:23:22                 NaT
11 2022-01-17 06:19:40 2022-03-02 14:14:46
10 2022-01-14 10:13:07                 NaT
7  2022-01-13 13:37:18                 NaT
6  2022-01-12 07:52:29                 NaT
5  2022-01-12 07:50:03                 NaT
4  2022-01-12 07:43:55                 NaT
3  2022-01-12 07:37:04                 NaT
2  2022-01-12 07:22:00                 NaT
1  2022-01-12 07:11:10                 NaT

What specific Issue did you import? Was it

from srcopsmetrics.entities.issue import Issue

or

from srcopsmetrics.entities.raw_issue import RawIssue

?

hemajv commented 2 years ago

@xtuchyna I used from srcopsmetrics.entities.issue import Issue, and I still see invalid dates in my df :/ Is there something else I'm missing? I am running this on the smaug jupyterhub instance using the standard data science image.

hemajv commented 2 years ago

@xtuchyna okay, so I deleted the srcopsmetrics folder that gets created and re-ran it. The dates are correct now :tada: Perhaps the srcopsmetrics folder wasn't getting updated with the new data, and hence I was seeing the old values.

goern commented 2 years ago

what is the status of this?

hemajv commented 2 years ago

> what is the status of this?

We can close this out! It's now resolved :+1: