Open root-hbx opened 3 weeks ago
In fact, employing the --all
option to check all files introduces certain drawbacks, as it increases the workload significantly, resulting in longer inspection times.
Consequently, we must strike a balance between "comprehensive checks" and "manageable workload." This is why I did not directly modify .github/workflows/pylint.yml
in the code of this PR, but instead made a brief mention of it in the aforementioned discussion.
Looking forward to any insights and guidance provided :)
The motivation for this PR arose tonight when I ran bash format.sh
as usual and encountered errors reported by pylint:
Success: no issues found in 231 source files
Sky Pylint:
************* Module setup
sky/setup_files/setup.py:161:0: C0301: Line too long (88/80) (line-too-long)
sky/setup_files/setup.py:176:0: C0301: Line too long (100/80) (line-too-long)
sky/setup_files/setup.py:177:0: C0301: Line too long (96/80) (line-too-long)
sky/setup_files/setup.py:180:0: C0301: Line too long (112/80) (line-too-long)
sky/setup_files/setup.py:181:0: C0301: Line too long (113/80) (line-too-long)
sky/setup_files/setup.py:183:0: C0301: Line too long (112/80) (line-too-long)
sky/setup_files/setup.py:184:0: C0301: Line too long (113/80) (line-too-long)
sky/setup_files/setup.py:189:0: C0301: Line too long (86/80) (line-too-long)
sky/setup_files/setup.py:196:0: C0301: Line too long (107/80) (line-too-long)
sky/setup_files/setup.py:225:0: C0301: Line too long (119/80) (line-too-long)
sky/setup_files/setup.py:248:0: C0301: Line too long (109/80) (line-too-long)
sky/setup_files/setup.py:180:0: C4001: Invalid string quote ", should be ' (invalid-string-quote)
sky/setup_files/setup.py:181:0: C4001: Invalid string quote ", should be ' (invalid-string-quote)
sky/setup_files/setup.py:183:0: C4001: Invalid string quote ", should be ' (invalid-string-quote)
sky/setup_files/setup.py:184:0: C4001: Invalid string quote ", should be ' (invalid-string-quote)
These errors are not related to the code I modified, prompting me to examine the logic in format.sh
.
I discovered that under the initial setting of pylint --load-plugins pylint_quotes sky
, these issues were not detected. However, following the implementation of #4184, they became identifiable.
Upon further investigation, I found that pylint seems not to recurse into directories that do not have an __init__.py
file (i.e., namespace packages) and are not included in sys.path
or PYTHONPATH
.
Thus, neither the original code nor the --all
option in #4184 can check all files, which has been validated.
Therefore, we need to provide users with three distinct checking formats:
--all
in this PR.--legacy
, ensuring backward compatibility.These are my preliminary thoughts, and I would truly appreciate your guidance.
Instead of adding another option, how about we just fix those lint issues when sky/
is used, and make the CI use sky/
as well?
Yes, I think ideally we can use your ls-files
approach for --all
, and hopefully we don't need to use --legacy
at all.
It does require fixing the existing lint issues that were not being caught before.
@root-hbx What is the performance of --all
vs --legacy
with your changes? You mentioned that it is slower, but by how much?
There are a couple of other issues with format.sh
that I'm now noticing.
origin/master
, then changed_files
will be empty. But we probably just want to default to the --all
behavior in this case.MERGEBASE
is defined inside format_changed
, but we assume it is defined outside of that function. We should clean this up somehow.@root-hbx If you're interested in working on these, happy to support these fixes as well.
Thank you for your suggestions. @Michaelvll
It appears that addressing lint issues solely when sky/
is used and integrating this into the CI might be challenging due to certain limitations within pylint
itself, which may not align closely with our primary objectives at the moment. Maybe we could wait for pylint
to resolve this issue and then make further optimizations. What do you think?
Additionally, another consideration is that running full file checks in the CI could potentially lead to some confusion for current developers. For example, although their code may previously have received a perfect score of 10/10 in local tests, they could now encounter new issues under the revised CI checks, as highlighted in my previous points.
Thanks for the fix :) @cg505
Current reason for using --all
being slower than --legacy
is that the former inspects a broader range of files that the latter does not cover. For instance, regarding my current commit (be0e1a0), the data shows only a minimal performance discrepancy between the two approaches, which is listed as follows:
I believe your perspective aligns perfectly with @Michaelvll, and I think this is a super fancy idea. The optimal approach might be a manual revision of all files flagged by pylint
under --all
. Subsequently, we could adopt ls-files
approach for --all
, making --legacy
obsolete.
However, a key drawback is that numerous files, previously unflagged by pylint
, would require extensive manual updates. Given the substantial labor involved, do we need to proceed in this manner?
There are a couple of other issues with
format.sh
that I'm now noticing.
- If we are up-to-date with
origin/master
, thenchanged_files
will be empty. But we probably just want to default to the--all
behavior in this case.MERGEBASE
is defined insideformat_changed
, but we assume it is defined outside of that function. We should clean this up somehow.@root-hbx If you're interested in working on these, happy to support these fixes as well.
LGTM! If you can resolve these issues, it would make this check much more robust (especially the first point you mentioned). Looking forward to your fix! If there’s anything I can help with, feel free to reach out :D
@cg505 After my testing and verification, yapf
does not raise any issues. Both --all
and --legacy
work well in current format tests(yapf
/ mypy
/ pylint
). PTAL!
Thank you for your suggestions. @Michaelvll
It appears that addressing lint issues solely when
sky/
is used and integrating this into the CI might be challenging due to certain limitations withinpylint
itself, which may not align closely with our primary objectives at the moment. Maybe we could wait forpylint
to resolve this issue and then make further optimizations. What do you think?Additionally, another consideration is that running full file checks in the CI could potentially lead to some confusion for current developers. For example, although their code may previously have received a perfect score of 10/10 in local tests, they could now encounter new issues under the revised CI checks, as highlighted in my previous points.
I am confused. What issue are you referring to for pylint? Why can't we just fix those new errors showing by the new way we call pylint? I don't understand why we have to add a new argument --legacy
.
Thank you for your suggestions. @Michaelvll It appears that addressing lint issues solely when
sky/
is used and integrating this into the CI might be challenging due to certain limitations withinpylint
itself, which may not align closely with our primary objectives at the moment. Maybe we could wait forpylint
to resolve this issue and then make further optimizations. What do you think? Additionally, another consideration is that running full file checks in the CI could potentially lead to some confusion for current developers. For example, although their code may previously have received a perfect score of 10/10 in local tests, they could now encounter new issues under the revised CI checks, as highlighted in my previous points.I am confused. What issue are you referring to for pylint? Why can't we just fix those new errors showing by the new way we call pylint? I don't understand why we have to add a new argument
--legacy
.
In the original check, we used pylint --load-plugins pylint_quotes sky/
with the --all
option. However, this command only checks subfolders containing an __init__.py
file, rather than examining all files within sky/
. This limitation is due to pylint's fault, as documented here.
In this pull request, we address this by running pylint --load-plugins pylint_quotes $(git ls-files 'sky/*.py' 'sky/*.pyi')
when the --all
option is selected. This approach ensures that all files under sky/
are checked. Additionally, we provide a --legacy
option, which has the same behavior like previous --all
option.
BTW, we can also change the commands in CI from pylint --load-plugins pylint_quotes sky/
into pylint --load-plugins pylint_quotes $(git ls-files 'sky/*.py' 'sky/*.pyi')
, which will truly check every files under sky/
.
As your previous suggestion, we can remove the --legacy
option and retain only the enhanced --all
functionality if you agree with this approach :)
Yes, I think this works @cg505. Once all pylint
errors are resolved, we can merge everything together (--legacy
option will be removed then).
This way, we address all historical issues within this PR and plan to introduce the new features you mentioned in a subsequent PR.
Here, we introduce an enhancement for the --all
option:
With this update, the --all
option allows users to specify file or directory paths for format.sh
to ignore manually.
--all
option, we can use EXCLUDE_PATHS
in format.sh
to specify files or directories to exclude.--legacy
option, exclusions will continue to be handled by ignore=
in .pylintrc
, the same as previous --all
option.Why is this enhancement necessary?
As previously highlighted by @cg505, a huge number of historical pylint issues remain unresolved, which are not pertinent to our primary objectives (last time they were modified was a looooooong time ago). I think manually addressing these nits would be time-consuming and offer little meaningful benefit.
This functionality will enable us to bypass these legacy issues temporarily, focusing the --all
option on enforcing code quality in the current and upcoming development.
Simply put:
Let the past be the past; we’ll use
--all
to enforce code standards moving forward.
Do you think this way of "bypassing the existing excess issues" could work? I would greatly appreciate any suggestions or guidance you could offer. @Michaelvll @cg505
This PR aims to enhance the functionality of
format.sh
to provide more comprehensive file checks.The origin of this improvement is as follows:
1) The initial functionality (which included machine checks in workflows) was:
With
sky
as the only parameter, according to pylint’s check guidelines, it only checked folders withinsky/
that contained an__init__.py
file, along with their internal files.2) In PR4184, @cg505 proposed the idea of “limiting pylint to only files that have changed from master,” which led to the current version of
format.sh
as shown below:3) Based on the above, I believe the
--all
option should check all subdirectories and internal files withinsky/
. However, due to pylint’s limitations, it can only check subdirectories that contain an__init__.py
file, which does not meet our requirements. Therefore, I implemented the following functionality in my PR:We use
--legacy
to signify the original checking method, and we introduce--all
to check all files (usinggit ls-files 'sky/*.py' 'sky/*.pyi'
as the target files, avoiding pylint’s inherent limitations).TL;DR After this change, we can use
format.sh
like:bash format --all
: check all the files insky/
bash format --legacy
: perform checks as originally designedbash format
: check only files that have changed from masterFurthermore, if this PR is deemed appropriate, we may need to consider updating
.github/workflows/pylint.yml
to align with the enhanced functionality informat.sh
.Currently, it uses
pylint --load-plugins pylint_quotes sky
. If we wanna check all, we need to replace it withpylint --load-plugins pylint_quotes $(git ls-files 'sky/*.py' 'sky/*.pyi')
.Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh