tweag / nixpkgs-graph-explorer

Explore the nixpkgs dependency graph
MIT License
15 stars 0 forks source link

Logic for pname conflicts in nixpkgs #84

Closed dorranh closed 1 year ago

dorranh commented 1 year ago

While browsing the graph we generate for nixpkgs today, I noticed some confusing behavior where a package, numpydoc, had a direct dependency on emacs. I noticed this for some other packages such as python as well, and after further investigation it looks like this is a case where multiple packages in nixpkgs define the same pname. For example, here are a few numpydocs packages extracted from nixpkgs:

[
    Package(name='emacs-numpydoc-20230320.1439', output_paths=[OutputPath(name='out', path='/nix/store/xknpc90dzq1mv3csh836fafrj8bnj6a8-emacs-numpydoc-20230320.1439')], nixpkgs_metadata=NixpkgsMetadata(pname='numpydoc', version='20230320.1439', broken=False, license=''), build_inputs=[BuildInput(build_input_type='build_input', output_path='/nix/store/0zmcjkjfs5hlx5g57m8gp6x8vg99bvyx-emacs-28.2'), BuildInput(build_input_type='build_input', output_path='/nix/store/84cz7vyj2r4v6hbhzybm6d0jqzmg9nwb-texinfo-7.0.3'), BuildInput(build_input_type='build_input', output_path='/nix/store/9xdvyn3q8jhvcnnwwlsdblp0shxlnv98-emacs-dash-20230415.2324'), BuildInput(build_input_type='build_input', output_path='/nix/store/79xn55wsadvzczk2dvxjczmygifxl711-emacs-s-20220902.1511'), BuildInput(build_input_type='propagated_build_input', output_path='/nix/store/9xdvyn3q8jhvcnnwwlsdblp0shxlnv98-emacs-dash-20230415.2324'), BuildInput(build_input_type='propagated_build_input', output_path='/nix/store/79xn55wsadvzczk2dvxjczmygifxl711-emacs-s-20220902.1511')]),

    Package(name='python3.10-numpydoc-1.5.0', output_paths=[OutputPath(name='out', path='/nix/store/ka5h40r9gn59cvclv5rmwkysmcb2fv3d-python3.10-numpydoc-1.5.0'), OutputPath(name='dist', path='/nix/store/2garb80y8h8wbl6w252ppx270lrzcwm5-python3.10-numpydoc-1.5.0-dist')], nixpkgs_metadata=NixpkgsMetadata(pname='numpydoc', version='1.5.0', broken=False, license='Unspecified free software license'), build_inputs=[BuildInput(build_input_type='propagated_build_input', output_path='/nix/store/rhbswyz3qw0zsz7ha337hxxmjk5zf8v9-python3.10-Jinja2-3.1.2'), BuildInput(build_input_type='propagated_build_input', output_path='/nix/store/c3qafrsdcwmbiybpcq9rljvn3cz6r92s-python3.10-sphinx-5.3.0'), BuildInput(build_input_type='propagated_build_input', output_path='/nix/store/6jx4nc20v3cfphlbypqid30q41i5vh5x-python3-3.10.11')]),

    Package(name='python3.11-numpydoc-1.5.0', output_paths=[OutputPath(name='out', path='/nix/store/2dsqmjamqq205zafc3x2y1akjrb15gj0-python3.11-numpydoc-1.5.0'), OutputPath(name='dist', path='/nix/store/y4l71lxgdghs1bihjqafvaqhad4awy0v-python3.11-numpydoc-1.5.0-dist')], nixpkgs_metadata=NixpkgsMetadata(pname='numpydoc', version='1.5.0', broken=False, license='Unspecified free software license'), build_inputs=[BuildInput(build_input_type='propagated_build_input', output_path='/nix/store/1xa6wzi3c8vpff5r4smiidiw2s1w8aqx-python3.11-Jinja2-3.1.2'), BuildInput(build_input_type='propagated_build_input', output_path='/nix/store/gfy77ac8ad0gcsy1padxm0vdy9nsmm9x-python3.11-sphinx-5.3.0'), BuildInput(build_input_type='propagated_build_input', output_path='/nix/store/2nzwxmvb9n18iahgszzc3lfafg3rjzmz-python3-3.11.3')])
]

Since we currently only ingest the outputPath and pname into Gremlin, this leads to confusing behavior such as the case described above.

Is this intended in the design, or should we perhaps rely more heavily on the name attribute instead to avoid such conflicts?

GuillaumeDesforges commented 1 year ago

If I understand correctly, in the example, package with name=emacs-numpydoc-20230320.1439 has a pname=numpydocs.

Note this is to be expected for any build*Package (e.g. buildPythonPackage).

Now I guess your point is that the UI displays a "numpydoc" node, but it should really be "emacs-numpydoc".

To fix this confusion I propose to

Additionally, we should consider a UX to communicate node attributes to the user. I can think of two ways:

zz1874 commented 1 year ago

I agree with @GuillaumeDesforges's suggestion. After examining the JSON file we received, I confirmed that the parsedName field is emacs-numpydoc instead of numpydoc for the first package. Therefore, it seems reasonable to use parsedName instead.

We can easily make this change by replacing the name field in Nixpkgs_metadata with parsedName while keeping the pname outside of Nixpkgs_metadata. This approach ensures that the data model accurately captures the intended name.

dorranh commented 1 year ago

Thanks for the input @GuillaumeDesforges and @zz1874. That also makes sense to me as long as there aren't any key use cases for pname we are going to miss out on by removing it from the schema.

GuillaumeDesforges commented 1 year ago

@dorranh I don't think we need to remove it from the schema if the graph database can handle node attributes. Isn't pname a node attribute already anyway? I hope it's not the node id 😄

dorranh commented 1 year ago

@GuillaumeDesforges, that is also possible. I might have misunderstood the suggestion above. Was you idea to add an additional attribute (e.g. name or parsedName)?

By the way, not to worry - the node id is the output path: https://github.com/tweag/nixpkgs-graph-explorer/blob/05f831f12c171c42de818a6479a35acfc239a6d5/api/explorer/api/graph.py#L96-L98