jgehrcke / github-repo-stats

GitHub Action for advanced repository traffic analysis and reporting
Apache License 2.0

Action fails when too many jobs trying to track different repos in the same data repo #11

Closed ChameleonTartu closed 3 years ago

ChameleonTartu commented 3 years ago

This project looks amazing!

My idea was to track all public repos and analyze them once in a while. It looks like the action fails when too many jobs run at the same time, for instance when one job pushes before another one does. My GitHub repo.

Also, there is another issue with amazon-mws-subscriptions-maven:

210411-19:09:08.177 INFO:MainThread: union-merge views and clones
Traceback (most recent call last):
  File "/fetch.py", line 314, in <module>
    main()
  File "/fetch.py", line 73, in main
    ) = fetch_all_traffic_api_endpoints(repo)
  File "/fetch.py", line 122, in fetch_all_traffic_api_endpoints
    df_views_clones = pd.concat([df_clones, df_views], axis=1, join="outer")
  File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 285, in concat
    op = _Concatenator(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 467, in __init__
    self.new_axes = self._get_new_axes()
  File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 537, in _get_new_axes
    return [
  File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 538, in <listcomp>
    self._get_concat_axis() if i == self.bm_axis else self._get_comb_axis(i)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 544, in _get_comb_axis
    return get_objs_combined_axis(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 92, in get_objs_combined_axis
    return _get_combined_index(obs_idxes, intersect=intersect, sort=sort, copy=copy)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 145, in _get_combined_index
    index = union_indexes(indexes, sort=sort)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 214, in union_indexes
    return result.union_many(indexes[1:])
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py", line 395, in union_many
    this, other = this._maybe_utc_convert(other)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py", line 413, in _maybe_utc_convert
    raise TypeError("Cannot join tz-naive with tz-aware DatetimeIndex")
TypeError: Cannot join tz-naive with tz-aware DatetimeIndex

Another data frame issue:

210411-19:09:18.943 INFO: parsed timestamp from path: 2021-04-11 19:09:15+00:00
Traceback (most recent call last):
  File "/analyze.py", line 1398, in <module>
    main()
  File "/analyze.py", line 82, in main
    analyse_view_clones_ts_fragments()
  File "/analyze.py", line 691, in analyse_view_clones_ts_fragments
    if df.index.max() > snapshot_time:
TypeError: '>' not supported between instances of 'float' and 'datetime.datetime'
+ ANALYZE_ECODE=1
error: analyze.py returned with code 1 -- exit.

Git clone issue:

GHRS entrypoint.sh: pwd: /github/workspace
+ git clone 'https://ghactions:${' secrets.ACCESS_GITHUB_API_TOKEN '}@github.com/ChameleonTartu/buymeacoffee-repo-stats.git' .
length of API TOKEN: 36
fatal: Too many arguments.

All other issues are the same as those mentioned.

ChameleonTartu commented 3 years ago

@jgehrcke Let me know if I can help more than just reporting this. It would be great to fix all of this so that I can use this tool more extensively, as I am planning to grow the number of repos beyond the current 34 over time. It is the most valuable tool I could find for tracking repo development over time. Thank you again!

jgehrcke commented 3 years ago

Traceback (most recent call last):
  File "/fetch.py", line 314, in <module>
    main()
  File "/fetch.py", line 73, in main
    ) = fetch_all_traffic_api_endpoints(repo)
  File "/fetch.py", line 122, in fetch_all_traffic_api_endpoints
    df_views_clones = pd.concat([df_clones, df_views], axis=1, join="outer")
[...]
TypeError: Cannot join tz-naive with tz-aware DatetimeIndex

I could not quite make sense of this one. Both df_clones and df_views are created by the same code path. I thought that an empty data frame on one side might produce this as a misleading error, but no:

± python
Python 3.8.6 (default, Nov 22 2020, 17:14:35)
[GCC 10.2.1 20201016 (Red Hat 10.2.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> tz_naive = pd.date_range('2018-03-01 09:00', periods=3)
>>> tz_aware = tz_naive.tz_localize(tz='US/Eastern')
>>> df_aware = pd.DataFrame(data={'lol': [1, 2, 3]}, index=tz_aware)
>>> df_aware
                           lol
2018-03-01 09:00:00-05:00    1
2018-03-02 09:00:00-05:00    2
2018-03-03 09:00:00-05:00    3
>>> df_empty = pd.DataFrame(data={}, index=[])
>>> pd.concat([df_aware, df_empty], axis=1, join="outer")
                           lol
2018-03-01 09:00:00-05:00    1
2018-03-02 09:00:00-05:00    2
2018-03-03 09:00:00-05:00    3

I am adding a patch that changes the way the DatetimeIndex is translated to a tz-aware object, which hopefully addresses this problem. It's a little disappointing to not understand it precisely.
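
For illustration only, and not the actual patch: a minimal sketch of one way to make sure both data frames carry a tz-aware (UTC) DatetimeIndex before they are merged. The helper name to_utc_index and the column names are hypothetical; only pd.concat with axis=1 and join="outer" is taken from the traceback above.

```python
import pandas as pd


def to_utc_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index is a tz-aware (UTC) DatetimeIndex."""
    out = df.copy()
    idx = pd.DatetimeIndex(out.index)
    if idx.tz is None:
        # tz-naive: declare the timestamps to be UTC.
        idx = idx.tz_localize("UTC")
    else:
        # tz-aware: convert to UTC so both sides agree.
        idx = idx.tz_convert("UTC")
    out.index = idx
    return out


# Hypothetical sample data: one tz-naive frame, one tz-aware frame.
df_clones = pd.DataFrame(
    {"clones_total": [3, 5]},
    index=pd.to_datetime(["2021-04-10", "2021-04-11"]),
)
df_views = pd.DataFrame(
    {"views_total": [7, 9]},
    index=pd.to_datetime(["2021-04-10", "2021-04-11"], utc=True),
)

# Normalize both sides first, then merge as in fetch.py.
df_views_clones = pd.concat(
    [to_utc_index(df_clones), to_utc_index(df_views)], axis=1, join="outer"
)
print(df_views_clones)
```

Normalizing both inputs right before the merge keeps the concat call independent of how each frame was built upstream.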

jgehrcke commented 3 years ago

TypeError: '>' not supported between instances of 'float' and 'datetime.datetime'

That somewhat suggests that df_clones and df_views were structured rather differently from what is expected.

Update: empty index explains that error msg:

>>> df_empty.index.max() > datetime(year=2012, month=3, day=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'float' and 'datetime.datetime'
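
A minimal sketch of a guard against that case, assuming the comparison in analyze.py only makes sense when the data frame has at least one row. The function name newest_sample_is_after is hypothetical and not part of the project.

```python
from datetime import datetime, timezone

import pandas as pd


def newest_sample_is_after(df: pd.DataFrame, snapshot_time: datetime) -> bool:
    """Compare the newest index entry with snapshot_time, tolerating empty frames."""
    if df.empty:
        # An empty index has no meaningful maximum, so skip the comparison.
        return False
    return bool(df.index.max() > snapshot_time)


snapshot_time = datetime(2021, 4, 11, 19, 9, 15, tzinfo=timezone.utc)

df_empty = pd.DataFrame(data={}, index=[])
df_filled = pd.DataFrame(
    {"views_total": [7]},
    index=pd.to_datetime(["2021-04-12"], utc=True),
)

print(newest_sample_is_after(df_empty, snapshot_time))
print(newest_sample_is_after(df_filled, snapshot_time))
```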

jgehrcke commented 3 years ago

GHRS entrypoint.sh: pwd: /github/workspace
+ git clone 'https://ghactions:${' secrets.ACCESS_GITHUB_API_TOKEN '}@github.com/ChameleonTartu/buymeacoffee-repo-stats.git' .
length of API TOKEN: 36
fatal: Too many arguments.

Could it be that this token was actually truncated and/or maybe this is related to one of your code changes?

I notice secrets.ACCESS_GITHUB_API_TOKEN in your log, but with the current code the command should look very different:

git clone https://ghactions:${GHRS_GITHUB_API_TOKEN}@github.com/${DATA_REPOSPEC}.git 

When things work as expected, that should be the log pattern:

GHRS entrypoint.sh: pwd: /github/workspace
+ git clone ***github.com/jgehrcke/ghrs-test.git .
length of API TOKEN: 40
Cloning into '.'...

It's likely that the error message fatal: Too many arguments. was caused by the malformed git clone ... command.
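
A small sketch of how the failing log lines fit together, under the assumption that the shell expands the token variable without quotes, so that a value containing spaces is split into several words. Python's str.split() only mimics that splitting here; the variable names are made up for illustration.

```python
# The literal text that appears in place of the token in the failing log.
token = "${ secrets.ACCESS_GITHUB_API_TOKEN }"
print(len(token))  # 36, the length reported in the failing log

url = f"https://ghactions:{token}@github.com/ChameleonTartu/buymeacoffee-repo-stats.git"

# Assumption: unquoted expansion splits the URL on whitespace.
url_words = url.split()
clone_args = ["git", "clone", *url_words, "."]

for word in url_words:
    print(word)
```

With the URL split into three words, git clone receives four positional arguments instead of the two it expects (repository and target directory), which is consistent with fatal: Too many arguments.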

jgehrcke commented 3 years ago

@ChameleonTartu would you mind retrying things with the current head of main? I think I've addressed all issues reported to date (maybe have a look at the changelog). Happy to cut a release, but ideally only after getting your confirmation that things indeed work.

ChameleonTartu commented 3 years ago

@jgehrcke I made a run: https://github.com/ChameleonTartu/buymeacoffee-repo-stats/actions/runs/748508227

The only use-case that doesn't work is:

GHRS entrypoint.sh: pwd: /github/workspace
+ git clone 'https://ghactions:${' secrets.ACCESS_GITHUB_API_TOKEN '}@github.com/ChameleonTartu/buymeacoffee-repo-stats.git' .
length of API TOKEN: 36
fatal: Too many arguments.

And all jobs failed with the same message: https://github.com/ChameleonTartu/buymeacoffee-repo-stats/runs/2343584927?check_suite_focus=true

I suspect that the repos may have been created a long time ago and therefore have different API token formats. Could that be the cause? Any idea?

jgehrcke commented 3 years ago

The only use-case that doesn't work is:

OK, your workflow file is bad in a subtle way! Mean trap: https://github.com/ChameleonTartu/buymeacoffee-repo-stats/blob/b6d089f2bc01462e05fe8100ce1f27cfd3a24909/.github/workflows/stats.yml#L138

@ChameleonTartu you have ghtoken: ${ secrets.ACCESS_GITHUB_API_TOKEN }, but the curly braces need to come in pairs: ${{ ... }}. In most jobs, you have that.
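
Since some of the jobs were generated, a quick check over the workflow text can catch this kind of slip. The following is a heuristic sketch, not a tool from this project: the function name find_single_brace_expressions is hypothetical, and the pattern only looks for the secrets, github, and env contexts written with single braces.

```python
import re

# Matches "${ secrets.X }" style expressions but not "${{ secrets.X }}".
# Shell variables such as ${HOME} are ignored because they contain no
# "secrets.", "github." or "env." prefix.
SINGLE_BRACE = re.compile(r"\$\{(?!\{)\s*(?:secrets|github|env)\.[^{}]*\}")


def find_single_brace_expressions(workflow_text: str) -> list:
    """Return (line number, matched text) pairs for suspicious expressions."""
    hits = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        for match in SINGLE_BRACE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = """\
with:
  ghtoken: ${{ secrets.ACCESS_GITHUB_API_TOKEN }}
with:
  ghtoken: ${ secrets.ACCESS_GITHUB_API_TOKEN }
"""

for lineno, text in find_single_brace_expressions(sample):
    print(f"line {lineno}: {text}")
```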

ChameleonTartu commented 3 years ago

@jgehrcke Thank you! I didn't notice these nuances.

I auto-generated some of the jobs, so it looks like I got some of them wrong. Cool-cool-cool!

jgehrcke commented 3 years ago

@ChameleonTartu ok : ) Please leave feedback again when the current head of main worked for all your jobs : )

ChameleonTartu commented 3 years ago

@jgehrcke Everything works smoothly: https://github.com/ChameleonTartu/buymeacoffee-repo-stats/actions/runs/748788117

Amazing!