Closed StrikerRUS closed 2 weeks ago
I'm investigating this.
Since this is not happening on PRs but is happening reliably on merges to master, I'm going to investigate by trying to answer the question "what is different about Windows bdist jobs triggered by merges?".
One idea I'm considering is that maybe some filepath is slightly different. https://github.com/sarugaku/shellingham/issues/8 suggests that maybe this error can show up when a too-long filepath is passed through to something using the Windows API to interact with the filesystem.
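One rough way to test the long-path idea is to walk the build directory and list any file whose absolute path reaches the classic Win32 `MAX_PATH` limit of 260 characters (which includes the terminating NUL). The sketch below is only an assumption about how one might check this; `find_long_paths` is a hypothetical helper, not something from LightGBM or `pydistcheck`.

```python
import os

# Classic Win32 MAX_PATH limit; it counts the terminating NUL character,
# so a path of 260 or more characters is already too long.
WINDOWS_MAX_PATH = 260


def find_long_paths(root, limit=WINDOWS_MAX_PATH):
    """Return sorted absolute file paths under `root` with length >= `limit`."""
    long_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) >= limit:
                long_paths.append(full)
    return sorted(long_paths)
```

Running this against the working directory of a PR build and of a merge build would show whether the merge build produces longer paths.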
I compared the logs between a recent successful PR build (build link) and master build (build link).
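For anyone wanting to repeat that kind of comparison, here is a minimal sketch using Python's standard `difflib`. It assumes both logs have been downloaded as text; `lines_only_in_second` is a hypothetical helper name. Real CI logs carry per-line timestamps, which would need to be stripped first or nearly every line will look different.

```python
import difflib


def lines_only_in_second(first_log, second_log):
    """Return lines present in `second_log` but not in `first_log`, in order."""
    first = first_log.splitlines()
    second = second_log.splitlines()
    # ndiff prefixes lines unique to the second sequence with "+ ".
    diff = difflib.ndiff(first, second)
    return [line[2:] for line in diff if line.startswith("+ ")]
```

Passing the PR log as `first_log` and the master log as `second_log` lists the steps that only run on merges.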
The most significant difference I see is that on merges to master, these CodeQL scanning tasks are being injected: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=16693&view=logs&j=ea56812e-e7ae-55d0-6abc-4a217857fa9f&t=39805323-ea77-5bd7-9d32-4263a4c166c3.
From the logs there, it looks like those are setting up some kind of tracing.
I wonder if maybe the mechanism there is somehow interfering with system tools in a way that pydistcheck cannot handle? The logs sort of make it look like core parts of the Windows API are being replaced with instrumented versions (but I'm very unsure about this).
I'm going to put up a PR trying to turn those jobs off.
I think that #6563 fixed this.
https://github.com/microsoft/LightGBM/pull/6563#issuecomment-2241928232
But I think we should leave it open until we see a few more successful CI runs on merges. I'm putting the awaiting response label on it so it'll be automatically closed after 30 days (just in case we forget to come back and close it).
@jameslamb Ah, thank you very much for finding this hidden rogue! I remember we switched off this check in the past: #5175.
I have no idea what it is, but we haven't requested these checks. 😕 They just increase our overall CI time and are potential places for project-unrelated CI failures.
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
It seems that Azure Pipelines builds are constantly failing for the master branch. For example, the latest one: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=16597&view=results
One of the failing jobs is Windows bdist