sourcegraph / scip-python

SCIP indexer for Python
Other
46 stars 21 forks source link

Baseline snapshot test failures on macOS #91

Open varungandhi-src opened 10 months ago

varungandhi-src commented 10 months ago

When running npm run check-snapshots on macOS off 0834d92bcb4f6e1b64f8dd6a580b5068262d8bc4, I get the following failure:

Index: /Users/varun/Code/scip-python/packages/pyright-scip/snapshots/output/nested_items/src/importer.py
===================================================================
--- /Users/varun/Code/scip-python/packages/pyright-scip/snapshots/output/nested_items/src/importer.py        (what the snapshot tests expect)
+++ /Users/varun/Code/scip-python/packages/pyright-scip/snapshots/output/nested_items/src/importer.py        (what the current code produces). Run the command "npm run update-snapshots" to accept the new behavior.
@@ -2,13 +2,13 @@
 #documentation (module) src.importer

 from foo.bar import InitClass
 #    ^^^^^^^ reference  snapshot-util 0.1 `foo.bar`/__init__:
-#                   ^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar`/InitClass#
+#                   ^^^^^^^^^ reference  snapshot-util 0.1 /InitClass#
 from foo.bar.baz.mod import SuchNestedMuchWow, AnotherNestedMuchWow
 #    ^^^^^^^^^^^^^^^ reference  snapshot-util 0.1 `foo.bar.baz.mod`/__init__:
-#                           ^^^^^^^^^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar.baz.mod`/SuchNestedMuchWow#
-#                                              ^^^^^^^^^^^^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar.baz.mod`/AnotherNestedMuchWow#
+#                           ^^^^^^^^^^^^^^^^^ reference  snapshot-util 0.1 /SuchNestedMuchWow#
+#                                              ^^^^^^^^^^^^^^^^^^^^ reference  snapshot-util 0.1 /AnotherNestedMuchWow#

 print(SuchNestedMuchWow().class_item)
 #^^^^ reference  python-stdlib 3.11 builtins/print().
 #external documentation ```python
@@ -19,15 +19,15 @@
 #            >     file: SupportsWrite[str] | None = No...
 #            >     flush: Literal[False] = False
 #            > ) -> None
 #            > ```
-#     ^^^^^^^^^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar.baz.mod`/SuchNestedMuchWow#
-#                         ^^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar.baz.mod`/SuchNestedMuchWow#class_item.
+#     ^^^^^^^^^^^^^^^^^ reference  snapshot-util 0.1 /SuchNestedMuchWow#
+#                         ^^^^^^^^^^ reference  snapshot-util 0.1 /SuchNestedMuchWow#class_item.
 print(AnotherNestedMuchWow().other_item)
 #^^^^ reference  python-stdlib 3.11 builtins/print().
-#     ^^^^^^^^^^^^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar.baz.mod`/AnotherNestedMuchWow#
-#                            ^^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar.baz.mod`/AnotherNestedMuchWow#other_item.
+#     ^^^^^^^^^^^^^^^^^^^^ reference  snapshot-util 0.1 /AnotherNestedMuchWow#
+#                            ^^^^^^^^^^ reference  snapshot-util 0.1 /AnotherNestedMuchWow#other_item.
 print(InitClass().init_item)
 #^^^^ reference  python-stdlib 3.11 builtins/print().
-#     ^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar`/InitClass#
-#                 ^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar`/InitClass#init_item.
+#     ^^^^^^^^^ reference  snapshot-util 0.1 /InitClass#
+#                 ^^^^^^^^^ reference  snapshot-util 0.1 /InitClass#init_item.

This is not reproducing on Ubuntu.

Titou325 commented 7 months ago

Hi @varungandhi-src,

We are taking a look at this issue to get scip-python to work for our use case. We have been tinkering with this issue prior to tackling #86.

However, there are some things that are unclear even in the expected test outputs. Most notably, why is reference

reference  snapshot-util 0.1 foo.bar`/__init__:`

Rather than

reference  snapshot-util 0.1 src.foo.bar`/__init__:`

at line https://github.com/sourcegraph/scip-python/blob/scip/packages/pyright-scip/snapshots/output/nested_items/src/importer.py#L5C13-L5C13 in the expected output?

I would expect a full identifier given that we resolve in src, having itself a __init__.py file.


Reproducing the macOs error

The macOs error is easily reproduced, and we have pushed the output to https://github.com/Titou325/scip-python/blob/fix-91/packages/pyright-scip/snapshots/output/nested_item__macos/src/importer.py

Deep dive

To get the test suite coherent on macOs (but it may be erroneous on other platforms?) we need to switch the importer.py syntax to either qualify as a local import or a fully-qualified import.

Here are the things that we have noted:

Incoherent init.py resolution for subfolders

On macOs and given the current test input snapshots, the output leaves each and every __init.py__ file from the nested_items folder to have the signature

# < definition scip-python python snapshot-util 0.1 /__init__:
#documentation (module) 

regardless of their position in the file tree. The only __init__.py file that resolves itself correctly is src/__init__.py.

Correct resolution if we remove the imports

When we remove importer.py and long_importer.py, the __init__.py files resolve correctly, as well as related functions. However, this is not a valid fix. The indexing of either of the two importers seems to override the correct indexing.

Correct resolution given relative import paths, wrong import

If we replace the imports contained in the importer.py and long_importer.py by relative imports, resolution is fixed and each and every file is seemingly indexed properly (although with preceding src., contrary to what is in the baseline output).

There is still an issue with the imported reference, I.E.

from .foo.bar import InitClass
#    ^^^^^^^^ reference  snapshot-util 0.1 `foo.bar`/__init__:
#                    ^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar`/InitClass#

We would expect (note the src. for import reference):

from .foo.bar import InitClass
#    ^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar`/__init__:
#                    ^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar`/InitClass#

Correct resolution given fully qualified import paths

If we replace the imports contained in the importer.py and long_importer.py by fully-qualified imports, relative to src, the resolution is fixed as well and every file is indexed properly.

The imports are also correctly qualified (NB: however this still doesn't match the expected snapshot output due to src. qualification)

from src.foo.bar import InitClass
#    ^^^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar`/__init__:
#                       ^^^^^^^^^ reference  snapshot-util 0.1 `src.foo.bar`/InitClass#

Moving forward

With that out of the way, it seems that the relative imports are not properly resolving to the root namespace and would be the cause of this issue? We will continue to troubleshoot this but any input or guidance on what could be going wrong would be much appreciated.

Thanks for the hard work and sorry for such a long read!

Titou325 commented 7 months ago

It seems to affect any relative imports, as also seen in the very simple demo at https://github.com/Titou325/scip-python/blob/fix-91/packages/pyright-scip/snapshots/output/relative_import_nested/src/bar.py

From what I can tell, the ModuleName resolution here relies on the node.nameParts property and the current package. However, pyright only passes moduleParts based on the file import declaration rather than the qualified imported target.

For example, this:

# Current module is src, package is snapshot-utils

from .foo.bar import Something

Pyright will pass nameParts: ["foo", "bar"] although it has resolved to the proper src.foo.bar module. For a relative import, node.importInfo.isRelative is set to true, however this is not the case for "absolute-relative" imports such as from foo.bar import Something.

A possible workaround for relative imports is to fetch back the current module name to resolve the relative import, with something similar to (pseudo-code):

resolveRelative(this.fileInfo!.moduleName as string, node.nameParts as string[]) as string;

resolveRelative("src.importer", ["foo", "bar", "mod"]); // src.foo.bar.mod

Which doesn't seem like the best thing to do. For "absolute-relative" imports, there is no way to detect that the node nameParts need to be requalified other than fetching the module name of the destination file.

I'm keen on hearing your inputs on this and if anything has already been considered regarding this issue.

Have a nice day!

varungandhi-src commented 7 months ago

Thanks for your diligence here, and apologies for not responding sooner.

Most notably, why is reference

reference  snapshot-util 0.1 foo.bar`/__init__:`

[...]

I agree with your assessment, that seems like a bug. I've filed a new issue here: https://github.com/sourcegraph/scip-python/issues/133

With that out of the way, it seems that the relative imports are not properly resolving to the root namespace and would be the cause of this issue?

We will continue to troubleshoot this but any input or guidance on what could be going wrong would be much appreciated.

I looked into this briefly. I suspect it's probably a bug in Pyright itself (or at least, an inconsistency across platforms, not technically a bug if it isn't affecting user-facing behavior in Pyright). I'm assuming Pyright isn't relying on the exact strings that we're relying on in its own cross-platform tests, causing this divergence.

One potential way debug this would be to:

  1. Which field(s) we're relying on to produce the differing output between macOS & Ubuntu
  2. Add a logging statement with __file__ and __line__ information, whenever that field is set inside Pyright (for the version we're using, we haven't been able to update it in a while unfortunately).
  3. Run the modified Pyright on Linux (or Docker inside macOS) and macOS and diff the logs to see where the difference is first coming up.

It seems to affect any relative imports [...] (your second comment)

I think there are at least two different problems here:

  1. Some of the snapshot outputs are just wrong (incorrect symbol names)
  2. There is a difference in behavior between macOS and Ubuntu.

This GitHub issue is about problem 2. I'm not 100% sure, but if I understand correctly, your second comment is describing part of the cause behind problem 1. Is that correct? If so, could you file a new issue for it (if it's not covered by https://github.com/sourcegraph/scip-python/issues/133) or mention it in https://github.com/sourcegraph/scip-python/issues/133?

Let's keep this GitHub issue scoped to cross-platform differences, not to whether the output is correct or not.

Titou325 commented 7 months ago

Hi @varungandhi-src, thanks for circling back on this.

133 indeed covers the seemingly only issue that affects all platforms.

For the scope of #91, the question really boils down to why aren't imported members correctly qualified, if qualified at all, on macOS.

Relative import Absolute-relative import Fully-qualified import
from .a.b import C  from a.b import C  from src.a.b import C
 a.b resolves On all platforms, but with a partially qualified symbol Not on macOS, maybe others? On all platforms, with a fully qualified symbol
C resolves On all platforms, with a fully qualified symbol Not on macOS, with a fully qualified symbol on other platforms On all platforms, with a fully qualified symbol
class C(): ... in the source file resolves On all platforms, with a fully qualified symbol On macOS if not imported from another file, otherwise not. Resolves with a fully qualified symbol on other platforms  On all platforms, with a fully qualified symbol

I think resolution of #133 may collaterally fix #91, but can't really be sure at this time, hence the report on this thread. I will focus on #133 for now.

Titou325 commented 7 months ago

I have spent quite some time this afternoon debugging the existing code, and the root cause seems to be a wrong moduleName property for the module node. This info is used when calculating the module name.

Here is a debugger screenshot, macOS runner on the left, Debian on the right.

image

This comes from pyright-internal. Are there any plans to update the dependency to see if it has been resolved upstream?

Thanks

varungandhi-src commented 7 months ago

This comes from pyright-internal. Are there any plans to update the dependency to see if it has been resolved upstream?

Yep, I got this far too, and I attempted to update the sync the changes from upstream earlier in an attempt to fix that. However, merging the changes didn't turn out to be as simple (I've pushed the status I had, with some unfixed merge conflicts in this PR: https://github.com/sourcegraph/scip-python/pull/137)

Given the challenge there, my first instinct was to attempt to verify if this problem has been fixed upstream or not (and only go through the effort for fixing the merge conflicts if it was fixed). However, there doesn't seem to be any easy CLI flag to extract the AST with module names to verify if this was fixed.

So I was going to add some extra logging to base Pyright to check if there was a difference in behavior between the one we're on vs latest. But I didn't have the time to figure out which parts of the code to instrument before I switched to focus on another project.

If you have the time to look into merging the changes from upstream (regardless of whether that fixes the test issues or not), I'd be happy to review a PR. Even if the latest version of upstream still has this issue, submitting a patch upstream (and switching to that in scip-python) will be easier if we have already synced all the other recent changes.

Titou325 commented 7 months ago

Digged a bit further, it seems to all boil down to path casing.

If the import cannot be found in strict resolution (not relative nor absolute, which is our case where we import without leading dot although it is not the fully qualified path), pyright normalizes the path casing. The import is then resolved, but the import return path is in lowercase. On macOS, the initial path and python lookup path may contain uppercase characters (even in a lower case repo, we often have /Users/ as a prefix). As such, when pyright tries to resolve the import again in subsequent import calls, most notably in getModuleNameInfoFromPath, the execEnv root still contains uppercase characters, while the filePath doesn't. This function fails and returns the empty moduleName.

This seems to have been changed upstream to realCasePath, however I don't seem to find that in your PR? (pylance merge)

I have added a quick (dirty) patch for this, but merging the upstream would be good. I am short on time now, but I will take a look at the merges.

varungandhi-src commented 7 months ago

Thanks for digging! It might just make sense to abandon my PR and attempt to merge the latest changes directly. I'd tried that merge a while back (Aug 22 is the commit date). I just put it up to show that there were some non-trivial merge conflicts due to code changes.