jrfnl opened 2 years ago
@jrfnl May I re-process the jobs in your parallel build to make sure this is not a one-off issue? This will re-consume each of your coverage reports as they came in and re-perform your aggregate coverage calculation. If the results are the same, then we are on more solid footing trying to determine a root cause. Please let me know when you can. Thanks.
@afinetooth Please feel free, but I have seen the same issue with multiple PRs now, so it definitely isn't a one-off.
Some other examples:
https://github.com/PHPCSStandards/PHPCSUtils/pull/304 - Introduced a new class with 100% code coverage, so it should have increased total code coverage by a very small percentage; instead the GH report shows a decrease. The related PR build shows the same pattern as above: the real total has increased, but the number at the top of the build shows a decrease for the aggregate result: https://coveralls.io/builds/48166327
https://github.com/PHPCSStandards/PHPCSUtils/pull/305 - No code changes, just a change of the test tooling version. That could have had a very small effect on code coverage, but definitely not a decrease of -8.8%. Coveralls build here: https://coveralls.io/builds/48168408
@jrfnl some quick updates:
On this PR build, mentioned above: https://coveralls.io/builds/48166327
I re-ran the jobs in the PR build and got a different result, specifically the small increase in coverage you'd expect from the underlying numbers: https://coveralls.io/builds/48166327
As for the next PR build mentioned above: https://coveralls.io/builds/48168408
I noticed a discrepancy in the parallel jobs listed for the build:
Whereas in the base build: https://coveralls.io/builds/48166746
We have this list of jobs:
In the related PR build: https://coveralls.io/builds/48168408
We have a different set.
At least the first job, `2148532125.1`, appears to relate to a different PHP version (8.1 instead of 7.4): https://coveralls.io/builds/48168408
Finally, a similar, but different, sort of discrepancy is at play in the example PR build from your original post:
Whereas the first job in your list of jobs in the base build is for PHP 8.1: https://coveralls.io/builds/48211375
That build is listed second in the related PR build (and version 7.3 is listed first): https://coveralls.io/builds/48212375
In other words, they've swapped places.
The potential issue: While the order in which jobs are received by the Coveralls API does not matter, the IDs of the jobs should be the same between builds.
In other words, Coveralls expects that Job `<whatever>.1` in your PR build is the new version of Job `<whatever>.1` from your base build.
Again, while those jobs may run in parallel fashion on your CI runner and arrive at coveralls.io in a different sequence due to timing, the index of the job (`.1`, `.2`, etc.) is expected to refer to the same job for the purpose of comparison.
This is achieved by setting up your CI config in a consistent order.
In other words, it's expected that you're running through your matrix build in the same order each time, so that jobs for each individual scenario (i.e. PHP version) are consistently compared between builds.
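For illustration, a hypothetical GitHub Actions matrix (the version numbers here are placeholders, not taken from the actual workflow): as long as the entries are listed in a fixed order, job `.1` always maps to the same scenario across builds:

```yaml
strategy:
  # Keeping this list in a fixed order means job .1 is always the
  # PHP 7.4 scenario, .2 is always 8.0, .3 is always 8.1, every build.
  matrix:
    php: ['7.4', '8.0', '8.1']
```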
Next steps / how to fix: I am concerned about the above, but the latter issue (job sequence) should be easy to fix by altering your CI config.
On the other hand, the former issue (different jobs between builds) is harder for me to understand or suggest a fix.
Can you give me some more insight into what's happening on the CI side?
Well, yes, one of the builds referenced above switched a PHP version (PHP 7.4 to 8.1), but it didn't change the order of the jobs; it was just a straight swap.
As for the numeric suffixes (index): I don't pass those. I pass only the PHP and PHPCS versions, that's it. Could it be that the PHP Coveralls package adds those?
Other than that: I honestly don't understand why it would matter for the aggregate results whether the jobs line up. I mean, the whole point of running several parallel builds is for the coverage of all of those to get merged and calculated as a whole. I honestly don't care about the code coverage of the individual builds as those can never get to 100% anyway as there is code which is specific to certain PHP versions and code specific to certain PHPCS versions. It's only when those reports get merged that you get to see the real code coverage, which is why Coveralls is so useful.
This is the workflow I use (relevant parts highlighted) to generate the base input and upload to coveralls: https://github.com/PHPCSStandards/PHPCSUtils/blob/816ffcc701d6bbc89d3820eeaa3262e6dbbe60f9/.github/workflows/test.yml#L318-L341 and you can see the matrix near the top of the job.
I'll be making some more changes to the matrix soonish, but again, I'd expect that to have a positive impact on the aggregate result, not a negative impact because individual jobs don't line up...
It's the mismatch between the sum-total of code coverage as shown for the class tree vs the aggregate result in the build header/reported to GH which I don't get (the numbers pointed out by the arrows in the above screenshots). Those should always be the same.
@jrfnl, Ok, thanks for the update. I understand your concern better now, so I'll stay focused on the aggregate coverage and let the job-to-job comparison be secondary.
In fact, as I look closer at some of those cross-build jobs, I see that we actually seem to be tracking them by their `flag-name` anyway, which is a capability of the Coveralls Github Action and the node-coveralls integration underlying it. So, in other words, I'm wrong about the job "sequence" numbers needing to match in this case.
So, re: aggregate coverage: As I mentioned, I re-ran the jobs for your original example and that corrected the issue where the aggregate coverage for the PR build seemed to have decreased, diverging from the result at the top of your file tree.
Specifically, this build: https://coveralls.io/builds/48166327 is correct now (using the arrows you referred to above; screenshot omitted).
As for the next two pairs of builds, which showed similar issues to the original (Pair 1 and Pair 2; screenshots omitted):
I am going to re-run the jobs for the PR builds to see if it corrects the issue in the same way as the first example...
And it does, for both pairs (screenshots omitted).
So this convinces me that Coveralls is not making an error in the calculation of your aggregate coverage, just that something is interfering with the way it is receiving the jobs from your original POSTs.
For instance, potentially, the calculation may be starting before the final job is received.
Or something else. At this point I'm really not clear.
I will dig a little deeper for a root cause, starting with your workflow file.
Speaking of which, upon review I'm not seeing the invocation of the Coveralls Github Action (`coverallsapp/github-action@master`) in the steps of your matrix build, even though I do see it in your final step (closing the parallel build).
So, does that mean you're mixing two integrations?
- using php-coveralls to send your jobs,
- then using the Coveralls Github Action to close your build?

(I'm not sure that would be the issue, but it's atypical, which may warrant a closer look.)
To see if it clears the issue, we could run a quick test, without using the Coveralls Github Action.
That would mean sending the Parallel Build Webhook in your final step in the most generic fashion (using `cURL`), like this:

```yaml
steps:
  - name: Coveralls Finished
    env:
      COVERALLS_REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: curl -k https://coveralls.io/api/v1/webhook?repo_token=${{ env.COVERALLS_REPO_TOKEN }} -d "payload[build_num]=${{ env.GITHUB_RUN_ID }}&payload[status]=done"
```
Excuse my syntax, which might not be 100% correct.
_You might be able to pass those vars as `$COVERALLS_REPO_TOKEN` or `${COVERALLS_REPO_TOKEN}`, and `$GITHUB_RUN_ID` or `${GITHUB_RUN_ID}`._
See the php-coveralls notes on parallel builds and sending the parallel build webhook, here.
So, does that mean you're mixing two integrations?
- using php-coveralls to send your jobs,
- then using the Coveralls Github Action to close your build?
I suppose it does. Then again, I had a hell of a time getting parallel build merging to work in the first place when these projects made the move from Travis to GH Actions. The documentation on this is sparse, to say it in the most polite way possible.
I don't exactly remember how I ended up with this set up, but no doubt it was based on a combination of information found in official documentation from php-coveralls, Coveralls itself and the coveralls action runner, and reading up on other people have solved similar issues, like here.
I mean, the short of it is:
- I use the `php-coveralls` package to transform the reports to a format Coveralls can handle.
- The `php-coveralls` package does not offer any way to send the "parallel builds finished" hook.

I did a lot of research and trial and error runs around that time as I was moving the CI from some 40+ projects over to GH Actions. Also see earlier communications between us about this setup.
This set-up works - and this can also be seen in the aggregate of the `develop` branch builds being correct (I haven't seen that being off yet at all). It's just weird that the aggregates for PR builds are regularly off.
Happy to try to send the webhook via Curl, but I'm not sure what difference that would make. I mean: the "finished" hook gets received correctly, and in the GH Actions workflow the `coveralls-finish` job doesn't start until all builds in the `coverage` job have finished - `needs: coverage` -, so I don't see how the timing for sending the webhook could be the problem.
@jrfnl thanks for the update:
[...] The documentation on this is sparse, to say it in the most polite way possible.
Appreciate that. You're right. We've been working on a documentation update that's been long-coming, but we are close to releasing it.
As an aside, one caveat is that, because Coveralls integrations are third-party OSS projects, their docs won't be unified with ours and some confusion like this may persist. Our recent attempts to address this have gone toward a universal coverage reporter (currently in beta) that will allow all integrations to POST coverage to coveralls.io in a consistent manner. (At this time, however, the project only supports LCOV and Simplecov formats. The hope is that future OSS contributions will be toward adapters for new formats.)
I did a lot of research and trial and error [...] Also see earlier communication between us about this setup.
This set-up works - [...] It's just weird that the aggregate for PR builds are regularly off.
Happy to try to send the webhook via Curl, but I'm not sure what difference that would make.
Thanks for the background. I did just review our earlier communication and I recall what a challenge you faced in migrating your projects and I'm sorry about that. I understand you're at the end of a long effort that actually applies to many repos, so I didn't mean to be flip by suggesting a different way of sending the webhook.
TBH, I was reaching for a root cause and seizing on anything off-pattern that could be isolated and removed as a possible factor.
Having seen this type of behavior before, it's been my experience that, even if they're not technically "one-offs," they are usually temporary. The fact that we can re-run your jobs and get correct aggregate calculations indicates some sort of interference with that regular process that happens in real-time when you send your initial POSTs.
Without absolute proof, my best theory involves temporary environmental factors, like severe traffic or an API outage, where something like this is more likely to happen: a particular coverage job POST fails to succeed on its first try and is sent to a retry queue, but the webhook call succeeds before it and closes the build. Then the retry succeeds and adds a job to the parallel build, but the total results are not re-calculated because the close webhook (trigger) does not re-fire.
I wish I had not re-run all of your example builds, because then I could test this theory by checking timestamps, etc.
If you have any more instances, please do send them so I can do more investigation before re-running to test the nature of the issue.
In the meantime, to address the observation that this is happening for PR builds and not for push builds (on `develop`): I'm not aware of how any differences in the ways each type of build is processed would lead PR builds to "fail" like this. But I will re-consider that when I get the chance to trace through the code with new data.
I am seeing this issue in all of my repos that are using Coveralls. All of them are open source projects, so you can easily see the issues.
Here is the last one https://coveralls.io/builds/48906664
The decrease in the individual test variants is due to branches added that only get run on certain versions of ActiveRecord but the "merged" coverage is 100%, yet it is reporting 97.35%.
Also, the latest branches in the SchemaPlus repos have the issue ( https://coveralls.io/github/SchemaPlus ). An example: https://coveralls.io/builds/48902416 - the aggregate is 91.89%, yet the reported is 91.071%.
@jrfnl @urkle
While we've not yet identified a fix for this issue, we released a workaround today that may resolve it for you: the Rerun Build Webhook.
Since the nature of the issue appears to be that, for some repos with parallel builds, the aggregate coverage is calculated before all jobs have been received and is not re-calculated afterward:
A Rerun Build Webhook, similar to the (Close) Parallel Build Webhook, fixes the issue by triggering your build to re-calculate itself.
Call this at the end of your CI config, after calling the (Close) Parallel Build Webhook.
Call it like this:

```shell
curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=<YOUR REPO TOKEN>&build_num=<YOUR BUILD NUMBER>'
```

But substitute your `repo_token` and your `build_num` (the same value you used for `build_num` in your (Close) Parallel Build Webhook).
Please note a few differences between the Rerun Build Webhook and the (Close) Parallel Build Webhook:
- The `/rerun_build` endpoint will accept a `GET` or a `POST`, and
- The only required params are the `repo_token` and the `build_num`, and
- `build_num` is a regular URL param and not part of a JSON body called "payload" as required by the (Close) Parallel Build Webhook:

```shell
curl -k https://coveralls.io/webhook?repo_token=<YOUR REPO_TOKEN> -d "payload[build_num]=<YOUR BUILD NUMBER>&payload[status]=done"
```
@afinetooth Hi James, thanks for getting back to us.
I'll implement this tomorrow in a few repos for which I expect a fair number of builds over the next few weeks. I will report back on how well it works.
One question before I implement this - in GH Actions, would you recommend implementing this as a separate `step` in the same `job` which sends the "parallel finished" signal? Or should this be implemented in a separate `job` with a `needs` for the job sending the "parallel finished" signal?
(Or should I implement it one way in one repo, the other way in another, and see what works best?)
Looking your instructions over again - I actually have another question:
But substitute your `repo_token`, and your `build_num` (the same value you used for `build_num` in your (Close) Parallel Build Webhook).
The `repo_token` is straightforward as that's `${{ secrets.GITHUB_TOKEN }}` in most workflows, but what should be used for the `build_num`?
Neither the upload of the test results via PHP Coveralls, nor the finish hook via the Coverallsapp GH Action requires me to set that manually, so I have no clue what to fill in for `build_num`.
The workflows in which I encounter the issue typically look like this (shortened to the relevant steps):
```yaml
coverage:
  runs-on: ubuntu-latest

  steps:
    # <snip>Multiple steps to run the tests....</snip>

    - name: Install Coveralls
      if: ${{ success() }}
      run: composer global require php-coveralls/php-coveralls:"^2.4.2" --no-interaction

    - name: Upload coverage results to Coveralls
      if: ${{ success() }}
      env:
        COVERALLS_REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        COVERALLS_PARALLEL: true
        COVERALLS_FLAG_NAME: php-${{ matrix.php }}
      run: php-coveralls -v -x build/logs/clover.xml

coveralls-finish:
  needs: coverage
  runs-on: ubuntu-latest

  steps:
    - name: Coveralls Finished
      uses: coverallsapp/github-action@master
      with:
        github-token: ${{ secrets.GITHUB_TOKEN }}
        parallel-finished: true
```
Hi @jrfnl. You're so right. I forgot about the (Close) Parallel Build Webhook being a special feature in some integrations, not requiring you to build up that request.
So what we're looking for, for `build_num`, is the same as the value being passed for the `service_job_id` parameter, as described here in our API Reference.
That value will be different for different CI service / integration combinations, but for the Coveralls Github Action, the correct environment variable will be `${{ env.GITHUB_RUN_ID }}`. Here's where I got that.
You're right about `repo_token`, it's `${{ secrets.GITHUB_TOKEN }}`. That's specific to Github Actions only. With other CI services, it's `$COVERALLS_REPO_TOKEN`.
One question before I implement this - in GH Actions would you recommend implementing this as a separate step in the same job which sends the "parallel finished" signal ? Or should this be implemented in a separate job with a needs for the job sending the "parallel finished" signal ? (or should I implement it one way in one repo, the other way in another and see what works best ?)
That's actually a great question. I have only tested the endpoint itself, not particular CI configs, so you'll be one of the first.
You may be doing us all a favor by testing both ways, but my recommendation is to start with a new `step` in the `job` that also calls the (Close) Parallel Build Webhook. As long as it's called after that first webhook call, it should enqueue after it and run after it.
At this point, I don't expect race conditions (where rerun build completes before close build (and calculate coverage)), because I believe those jobs run in transactions, but I suppose it's a possibility if the former fails and has to retry.
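A sketch of that recommendation as a GH Actions job (illustrative only, not a tested config; the `repo_token` value to use is revisited later in this thread, and `${{ github.run_id }}` is used here as the expression-context form of the suggested run-id variable):

```yaml
coveralls-finish:
  needs: coverage
  runs-on: ubuntu-latest
  steps:
    - name: Coveralls Finished
      uses: coverallsapp/github-action@master
      with:
        github-token: ${{ secrets.GITHUB_TOKEN }}
        parallel-finished: true

    # Extra step: ask Coveralls to re-calculate the (now closed) build.
    - name: Rerun Coveralls build
      run: curl --location --request GET "https://coveralls.io/rerun_build?repo_token=${{ secrets.GITHUB_TOKEN }}&build_num=${{ github.run_id }}"
```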
NOTE: If you're using a different Coveralls integration and/or are still having trouble determining the correct values for either `build_num` or `repo_token`, let me know here, or in the context of your issue, or at support@coveralls.io.
Thanks for getting back to me @afinetooth!
That value will be different for different CI service / integration combinations, but for the Coveralls Github Action, the correct environment variable will be ${{ env.GITHUB_RUN_ID }}.
Thanks for that!
I've also confirmed that PHP Coveralls uses the same variable when submitting the test run results: https://github.com/php-coveralls/php-coveralls/blob/master/src/Bundle/CoverallsBundle/Collector/CiEnvVarsCollector.php#L145
You may be doing us all a favor by testing both ways, but my recommendation is to start with a new step in the job that also calls the (Close) Parallel Build Webhook. As long as it's called after that first webhook call, it should enqueue after it and run after it.
I've set up PRs for a couple of repos to try this out:
- In one repo, as a separate job with a `needs` requiring the `finish` job to have succeeded.
- In another, as an extra step in the `finish` job.

In all cases, the "Rerun" step looks to succeed, but actually fails.
I've tried with both `$GITHUB_RUN_ID` (as within workflows, the env variables can be accessed directly), as well as with `${{ env.GITHUB_RUN_ID }}`.
In the first case - `$GITHUB_RUN_ID` -, it results in the below - note that the build number doesn't expand:
```
Run curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=***&build_num=$GITHUB_RUN_ID'
  curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=***&build_num=$GITHUB_RUN_ID'
  shell: /usr/bin/bash -e {0}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    31    0    31    0     0    116      0 --:--:-- --:--:-- --:--:--   116
{"error":"Invalid repo token."}
```
Src: https://github.com/PHPCSStandards/PHPCSExtra/actions/runs/3090849883/jobs/5000239773#step:2:11
In the second - `${{ env.GITHUB_RUN_ID }}` -, it fails with this - note that the variable does expand, but doesn't get added:
```
Run curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=***&build_num='
  curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=***&build_num='
  shell: /usr/bin/bash -e {0}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    31    0    31    0     0     49      0 --:--:-- --:--:-- --:--:--    49
{"error":"Invalid repo token."}
```
Src: https://github.com/PHPCSStandards/PHPCSExtra/actions/runs/3090983008/jobs/5000542473#step:2:10
Next, I tried using `${{ github.run_id }}`, which does expand and gets filled in, but still no luck:
```
Run curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=***&build_num=3091059721'
  curl --location --request GET 'https://coveralls.io/rerun_build?repo_token=***&build_num=3091059721'
  shell: /usr/bin/bash -e {0}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    31    0    31    0     0    103      0 --:--:-- --:--:-- --:--:--   103
{"error":"Invalid repo token."}
```
Src: https://github.com/PHPCSStandards/PHPCSExtra/actions/runs/3091059721/jobs/5000714503#step:2:10
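A note on the first two failure modes, for anyone hitting the same thing: `${{ … }}` expressions are substituted by the Actions runner before the shell ever runs, while a plain `$GITHUB_RUN_ID` inside single quotes reaches the shell literally and is never expanded. (And `${{ env.GITHUB_RUN_ID }}` comes out empty because the run id lives in the `github` context, not the `env` context.) A minimal shell illustration of the quoting side, using a placeholder value:

```shell
GITHUB_RUN_ID=3091059721  # placeholder; in a real job the runner sets this itself

single='build_num=$GITHUB_RUN_ID'   # single quotes: the shell does NOT expand
double="build_num=$GITHUB_RUN_ID"   # double quotes: the shell DOES expand

echo "$single"   # build_num=$GITHUB_RUN_ID (literal)
echo "$double"   # build_num=3091059721
```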
All PR builds for the above mentioned PRs show a difference in code coverage, while each PR only touches the GH Actions workflow, not the code, so the difference should be (and in reality is) 0%.
I presume the rerun fails due to the "Invalid repo token" error, even though each of them sends the `${{ secrets.GITHUB_TOKEN }}` token as instructed.
Personally, I also find the silent failing of the step, even though there is an error, problematic.
Hi @jrfnl thanks so much for running those tests.
I think I know what the issue may be and will try to fix it and re-deploy today. I will let you know here.
In the meantime, I'm not sure I agree about the "silent failing" of the job for two reasons:
I will definitely consider trying to change that, but for now I would prefer to address it in two ways: (1) first, fix the bug, so that, after completing the config once, users have a reasonably good expectation that the same config will always work; and (2) we'll be adding a Rerun Build button to the build page UI, so this can serve as a backup measure if, as a user, you go to the build page and suspect that the coverage is wrong / that the rerun failed.
Hopefully this can suffice for the time being.
Hi, shouldn't `repo_token` be the value which is displayed on the Coveralls repository homepage (when signed in), instead of `GITHUB_TOKEN`?
Using the project token, the rerun payload gets accepted: `{"rerun":true,"url":"https://coveralls.io/builds/...","jobs":14}`.
(However, the coverage is still not calculated correctly; the value in the header is still inconsistent and much lower than the coverage displayed below the "Source files" header.)
@OndraM Good question, though in that case, I wonder if that should be changed in more places (and in the documentation in various places too...)
See this workflow example: https://github.com/lemurheavy/coveralls-public/issues/1631#issuecomment-1251649121
Hi, shouldn't `repo_token` be the value which is displayed on the Coveralls repository homepage (when signed in), instead of `GITHUB_TOKEN`? Using the project token, the rerun payload gets accepted: `{"rerun":true,"url":"https://coveralls.io/builds/...","jobs":14}`.
But then how is Github Action supposed to find that repo-token?
But then how is Github Action supposed to find that repo-token?
In that case, each user should add the repo token from Coveralls to their repo/organisation "secrets" (Settings -> Secrets -> Actions) and then refer to the secret in the GH Actions workflow using something like `${{ secrets.COVERALLS_REPO_TOKEN }}`.
In that case, each user should add the repo token from Coveralls to their repo/organisation "secrets" (Settings -> Secrets -> Actions) and then refer to the secret in the GH Actions workflow using something like `${{ secrets.COVERALLS_REPO_TOKEN }}`.
There is already a link between GitHub Actions and Coveralls. It is much easier for them to keep the path using `${{ secrets.GITHUB_TOKEN }}` as documented for a patch that will only be needed for a short while, instead of forcing new secrets on everyone.
Agreed, I was just answering your question on the "how", not advocating for making this change, which would mean a lot of work for every single user of Coveralls.
I think I know what the issue may be and will try to fix it and re-deploy today. I will let you know here.
@afinetooth Any news ?
@afinetooth, any updates on this issue? This issue makes Coveralls pretty misleading in our teams PR's
@OndraM @jrfnl @ylavoie @guy-yogev-lmnd Sorry for the delay in getting back to this issue. Thanks for your patience.
I am checking each referenced build and will reply in kind momentarily.
@OndraM,
Hi, shouldn't `repo_token` be the value which is displayed on the Coveralls repository homepage (when signed in), instead of `GITHUB_TOKEN`?
Yes. That's correct. Sorry if I misled you or @jrfnl on that point.
(However, the coverage is still not calculated correctly; the value in the header is still inconsistent and much lower than the coverage displayed below the "Source files" header.)
Can you please share the Coveralls URL for the Pull Request build that's showing the error.
One important point is that the aggregate coverage (in the header) will not always match the source file coverage at the top of the source files tree.
That's because the SOURCE FILES table displays line coverage, and the header may be displaying both line coverage and branch coverage.
Branch coverage is included in or excluded from the aggregate coverage calculation with a repo setting. (By default, branch coverage is ENABLED, so if it's present in your LCOV report, it will be applied in the calculation.)
Here is the aggregate coverage calculation formula when branch coverage is included:

```
aggregate coverage w/ branches = (lines hit + branches hit) / (relevant lines + relevant branches)
```

Without branch coverage, it's just:

```
aggregate coverage = lines hit / relevant lines
```
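As a worked example with made-up numbers (not taken from any build in this thread), the two formulas can legitimately disagree, which is one reason the header number may differ from the source-tree number when branch coverage is present:

```shell
# Hypothetical counts: 950 of 1000 relevant lines hit, 180 of 200 branches hit.
lines_hit=950; relevant_lines=1000
branches_hit=180; relevant_branches=200

# With branch coverage: (950 + 180) * 100 / (1000 + 200)
with_branches=$(awk -v lh="$lines_hit" -v rl="$relevant_lines" \
                    -v bh="$branches_hit" -v rb="$relevant_branches" \
                    'BEGIN { printf "%.3f", (lh + bh) * 100 / (rl + rb) }')

# Lines only: 950 * 100 / 1000
lines_only=$(awk -v lh="$lines_hit" -v rl="$relevant_lines" \
                 'BEGIN { printf "%.3f", lh * 100 / rl }')

echo "$with_branches"   # 94.167
echo "$lines_only"      # 95.000
```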
Another thing to consider is that, even if the aggregate coverage seems wrong, it should be accurate per the data in the RUN DETAILS section.
If you share your build URL, we can look at those details together.
Thanks.
@jrfnl:
But then how is Github Action supposed to find that repo-token?
In that case, each user should add the repo token from Coveralls to their repo/organisation "secrets" (Settings -> Secrets -> Actions) and then refer to the secret in the GH actions workflow using something like ${{ secrets.COVERALLS_REPO_TOKEN }}.
Exactly, thank you.
As a workaround, it's not the most ergonomic solution. It's much simpler in other CI contexts, due to the fact that Github Actions use the `${{ secrets.GITHUB_TOKEN }}` to identify the repo at Coveralls.
@ylavoie:
In that case, each user should add the repo token from Coveralls to their repo/organisation "secrets" (Settings -> Secrets -> Actions) and then refer to the secret in the GH Actions workflow using something like `${{ secrets.COVERALLS_REPO_TOKEN }}`.
There is already a link between GitHub Actions and Coveralls. It is much easier for them to keep the path using `${{ secrets.GITHUB_TOKEN }}` as documented for a patch that will only be needed for a short while, instead of forcing new secrets on everyone.
This is true. As I say above, as this is a workaround, it's not a very ergonomic solution. It's much more straightforward to implement in the context of another CI, like CircleCI or Travis, just because only the `COVERALLS_REPO_TOKEN` is used to identify the repo through the Coveralls API.
@jrfnl:
I think I know what the issue may be and will try to fix it and re-deploy today. I will let you know here.
@afinetooth Any news ?
I'm sorry, I failed to report here that I deployed the fix I had in mind.
In that case, I'm afraid it did not work for you, at least in the build you tested. I would like to look at that one again, if you wouldn't mind sharing the Coveralls URL for it.
@guy-yogev-lmnd:
@afinetooth, any updates on this issue? This issue makes Coveralls pretty misleading in our teams PR's
I understand completely.
Can you please share the Coveralls URL(s) for the latest build(s) this workaround did not work for.
The verbose build log for the same would be great, but not necessary.
Anecdotally, this workaround is not working for about 2/10 users. While the issue itself is being worked on, I'd like to make sure that the workaround, at least, is 100%.
@jrfnl as I missed the opportunity before, I will use your last PRs mentioned above as cases to look into right now:
About which, you explained:
[...] each PR only touches the GH Actions workflow, not the code, so the difference should be (and in reality is) 0%.
Makes sense.
I will show them side-by-side with their base builds (LEFT), since you indicated the pull_request build (RIGHT) should not have changed in coverage.
Since aggregate coverage should match the underlying data in the RUN DETAILS section of each build, I'll highlight the differences there in each case.
(As I said earlier in this thread, the aggregate coverage in the header of the build page should reflect the data in the RUN DETAILS section, per the formula for coverage. Whereas it won't always match the coverage in the source tree, based on whether or not branch coverage is being included in aggregate coverage.):
Before:
In this case, I suspect the `/rerun_build` didn't work, but it would have fixed the coverage if it had:
Result of (manually) re-running the build:
After: COVERAGE CORRECTED ITSELF:
Same situation as the previous example.
After (manually rerunning build): COVERAGE CORRECTED ITSELF:
Again, after forcibly re-running the build: COVERAGE CORRECTED ITSELF:
The issue that concerns me right now is that the workaround not working for you.
Again, my context is the workaround, since I'm not currently working on the fix for the root cause. (Someone else is.)
I will try to understand and fix why the `/rerun_build` webhook is not triggering an actual re-build / re-calculation for you.
@afinetooth
Hi, shouldn't repo_token be the value which is displayed on coveralls repository homepage (when signed in), instead of GITHUB_TOKEN?
Yes. That's correct. Sorry if I misled you or @jrfnl on that point.
This seems to be in direct contradiction to what you said about this earlier in this thread when I asked the same question:
You're right about `repo_token`, it's `${{ secrets.GITHUB_TOKEN }}`. That's specific to Github Actions only. With other CI services, it's `$COVERALLS_REPO_TOKEN`.
In that case, I'm afraid it did not work for you, at least in the build you tested. I would like to look at that one again, if you wouldn't mind sharing the Coveralls URL for it.
Open PR with the work-around: https://github.com/PHPCSStandards/PHPCSUtils/pull/342
@jrfnl
Hi, shouldn't repo_token be the value which is displayed on coveralls repository homepage (when signed in), instead of GITHUB_TOKEN?
Yes. That's correct. Sorry if I misled you or @jrfnl on that point.
This seems to be in direct contradiction to what you said about this earlier in this thread when I asked the same question:
You're right about `repo_token`, it's `${{ secrets.GITHUB_TOKEN }}`. That's specific to Github Actions only. With other CI services, it's `$COVERALLS_REPO_TOKEN`.
You're right. I can't think of a reason I would have said that unless I thought you meant that's the right token to use for the Github Action, rather than for the Rerun Build Webhook.
I am truly sorry about that.
Work-around PR
Thank you. Digging in now...
@jrfnl
Let me address these example builds, first:
Work-around PR
in that case, then I'm afraid it did not work for you, at least in the build you tested. I would like to look at that one again, if you wouldn't mind sharing the Coveralls URL for it.
Open PR with the work-around: https://github.com/PHPCSStandards/PHPCSUtils/pull/342
- Transcript of the original build: https://github.com/PHPCSStandards/PHPCSUtils/actions/runs/3091134636/jobs/5000983697
- Original Coveralls build for that PR: https://coveralls.io/builds/52618431
- Transcript of a new build after rebasing: https://github.com/PHPCSStandards/PHPCSUtils/actions/runs/3389808333/jobs/5633345080 (still says invalid token):
- Coveralls build after rebasing: https://coveralls.io/builds/53897956 <= The numbers seem to be correct, even though the re-run didn't work.
So, the first example build, https://coveralls.io/builds/52618431, is one of the ones I manually re-ran. It now matches the coverage of its base build, as expected: https://coveralls.io/builds/52194076 (98.139%)
But it's no surprise to me that it didn't fix itself after the original CI run, because I see that the Rerun Build call failed with Invalid repo token
(as you later point out).
In any case, so much for that one.
As for the second build you mention: https://coveralls.io/builds/53897956
That is not one of the builds I manually re-ran earlier today.
But the coverage does match that of its base build: https://coveralls.io/builds/53731376 (99.524%)
Since we know from the transcript you provided for that build that the Rerun Build Webhook also failed there, with Invalid repo token, we know that this build wasn't run a second time, indicating it wasn't affected by the original issue.
As you said:
<= The numbers seem to be correct, even though the re-run didn't work.
That's good news.
It would be even better news if that behavior were consistent now.
We have been tackling a lot of issues lately, so I suppose it's possible the key factor in this issue's root cause was fixed.
@jrfnl
Notes:
- This repo (and the others I've mentioned) does not send branch coverage, only line coverage, so there should be no difference between the numbers.
Right. I saw that. Did it once include branch coverage?
If so, I wonder if that indicates branch coverage as a factor in this issue's root cause. I will def share that internally. Thanks for pointing it out.
- I have noticed that the numbers do seem to be less erratic over the last week or so, but they are often still off.
That's good. As I say, please let me know if it's consistent or not. 🤞
- I've also noticed, mostly this week and in another repo, though the repos have the same GH actions setup, that Coveralls seems to be not recording some of the coverage (even though it has been submitted). Not sure if it is related, but I figure it might be [...]
Ok, let me take that one separately...
I'm not sure what's going on there, but what I noticed right away is that those files are duplicates:
I assume that's not how it looks in your Github repo?
@afinetooth Thanks for looking into this further.
But it's no surprise to me that it didn't fix itself after the original CI run, because I see that the Rerun Build call failed with Invalid repo token (as you later point out).
I'll try and find the time over the next week to figure out setting the repo token via secrets and will update the PR to use the repo token. If nothing else, at least we can then see if it makes a difference for the re-run.
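For what it's worth, a minimal sketch of what that could look like. The secret name `COVERALLS_TOKEN` is an assumption (any secret name works), and the curl call mirrors Coveralls' documented parallel-builds "done" webhook, which authenticates with the Coveralls repo token rather than `GITHUB_TOKEN`; the exact endpoint/payload the re-run step needs may differ:

```yaml
# Hypothetical workflow step: pass the Coveralls repo token (stored as a
# repository secret, here assumed to be named COVERALLS_TOKEN) to the
# webhook call instead of the GitHub-provided GITHUB_TOKEN.
- name: Notify Coveralls with the repo token
  env:
    COVERALLS_REPO_TOKEN: ${{ secrets.COVERALLS_TOKEN }}
  run: |
    curl --fail -X POST "https://coveralls.io/webhook" \
      --data "repo_token=${COVERALLS_REPO_TOKEN}" \
      --data "payload[build_num]=${GITHUB_RUN_ID}" \
      --data "payload[status]=done"
```

If the token is wrong, this call should fail loudly (`--fail`) instead of silently reporting Invalid repo token, which would at least make the problem visible in the CI transcript.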
We have been tackling a lot of issues lately, so I suppose it's possible the key factor in this issue's root cause was fixed.
❤️ That would be great news, but let's wait and see for a bit to see if the results hold.
This repo (and the others I've mentioned) does not send branch coverage, only line coverage, so there should be no difference between the numbers.
Right. I saw that. Did it once include branch coverage?
If so, I wonder if that indicates branch coverage as a factor in this issue's root cause. I will def share that internally. Thanks for pointing it out.
None of the repos I'm involved with have ever sent branch coverage to Coveralls.
I do use it locally, but with a separate config file for that, as branch coverage is only available in recent versions of the test tooling I use (PHPUnit 9.3+), while the tests in CI are run on a wide range of PHPUnit versions (and older PHPUnit versions would not even run if the config included the settings to enable branch coverage).
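For context, in PHPUnit 9.3+ branch/path coverage is enabled via the `pathCoverage` attribute on the `<coverage>` element, which is exactly the kind of setting that has to live in a separate, local-only config file. A sketch (the file name and directory path are placeholders, not the repo's actual config):

```xml
<!-- Sketch of a local-only PHPUnit config (e.g. phpunit-local.xml). -->
<!-- pathCoverage requires PHPUnit >= 9.3 with the Xdebug driver; older -->
<!-- PHPUnit versions used in CI do not understand the <coverage> element -->
<!-- at all, hence the separate file. -->
<coverage pathCoverage="true">
  <include>
    <!-- Placeholder path; adjust to the project's source directory. -->
    <directory suffix=".php">./Src/</directory>
  </include>
</coverage>
```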
I'm not sure what's going on there, but what I noticed right away is that those files are duplicates:
I assume that's not how it looks in your Github repo?
Correct, the repo doesn't contain duplicate files. You can see for yourself: https://github.com/PHPCSStandards/PHPCSExtra/tree/develop/Universal/Sniffs
P.S.: oh and just to be clear - those "blank" results with duplicate files being recorded wasn't a one off, it's been happening regularly:
@jrfnl came across this one from way-back while performing a google search!
Regarding your last message, I had a look at one of the builds and confirmed the duplicate files: https://coveralls.io/builds/53861438
That's an old build, but is this still happening anywhere for you?
I think you may have since migrated from php-coveralls to the Coveralls GitHub Action (though perhaps not for all projects). Whether that's the case or not, I'm pretty sure I understand the root cause of those duplicates, both back then and now, so I can probably help if it's still an issue anywhere.
Please let me know.
@jrfnl came across this one from way-back while performing a google search!
@afinetooth Blast from the past ;-) Though reading back some of the above discussion does go a long way to explain why there was so much discussion and confusion about whether to use the `GITHUB_TOKEN` vs the `COVERALLS_TOKEN` in #1721... (and why I previously switched to the `COVERALLS_TOKEN`).
That's an old build, but is this still happening anywhere for you?
I don't recall seeing these issues anymore in the past six months or so. A few per mille (tenths of a percent) off once in a while, but nothing like what I was seeing when I opened this issue, and no duplicate files either.
I think you may have since migrated from php-coveralls to the Coveralls GitHub Action (though perhaps not for all projects),
Correct, and for all projects, though there may still be some open PRs from that series. The vast majority of PRs have gone through and been merged, though.
so, whether that's the case or not, I'm pretty sure I understand the root cause of those duplicates, both back then and now, so I can probably help if it's still an issue anywhere.
At this time, I think this issue can be closed, at least for those things I reported. I can't speak for others who reported similar issues in this thread.
I'm seeing some very weird and wildly incorrect results coming in on PR builds, which is making Coveralls unreliable and disruptive as it means merges are either blocked because the Coveralls build is unjustly failing/merging with confidence is no longer possible.
To illustrate:
Last build against the default `develop` branch for a repo:
Link: https://coveralls.io/builds/48211375
Build for a PR against `develop` which is intended to increase code coverage:
Link: https://coveralls.io/builds/48212375
Report in the PR:
Link: https://github.com/PHPCSStandards/PHPCSUtils/pull/308
What's wrong
As you can see from the above screenshots, the PR in actual fact makes the code coverage go UP (from 96.71% to 97.3%). The number at the top of the PR build is incorrect (88.95%), but that is the number which is being reported to GitHub, causing the build to fail.
Not sure what's going on - I haven't changed any repo settings on Coveralls recently or anything.
Help?