mesonbuild / meson-python

Meson PEP 517 Python build backend
https://mesonbuild.com/meson-python/
MIT License
125 stars 65 forks source link

BUG: empty __spec__.submodule_search_locations and __path__ values for editable packages #562

Closed ogrisel closed 7 months ago

ogrisel commented 8 months ago

This should solve the problem of empty __path__ attributes for packages imported from an editable install.

Fixes https://github.com/mesonbuild/meson-python/issues/557.

ogrisel commented 8 months ago

/cc @lesteve @rgommers @dnicolodi.

rgommers commented 8 months ago

Thanks @ogrisel! That's a nicely concise fix. I'm not too familiar with this code, but the description at https://docs.python.org/3/library/importlib.html#importlib.machinery.ModuleSpec.submodule_search_locations seems clear enough about this being correct.

I can confirm that it fixes the issue with the SciPy test - it now correctly checks all submodules rather than the test silently passing without doing anything.

dnicolodi commented 8 months ago

NAK. This does not work. There is a reason why I decided to do not populate the __path__ module attribute.

ogrisel commented 8 months ago

What is the reason? There is no inline comment , nor any failing test when i changed that line of code.

dnicolodi commented 8 months ago

__path__ is defined to be "the list of locations where the package’s submodules will be found", where "locations" is interpreted by most code to be a list of directories. However, in editable installations there is not a correspondence between the filesystem location of sources and their place in the module structure. Not even the file names need to match the module names. Therefore, setting __path__ in the way this patch does violates the definition of what __path__ is. You omitted testing __path__ for the extension module in the test package. If you would have tested it, you would have discovered that it points to a different directory than for the other modules in the same package.

ogrisel commented 8 months ago

You omitted testing path for the extension module in the test package. If you would have tested it, you would have discovered that it points to a different directory than for the other modules in the same package.

I don't understand. The extension module (I assume you mean complex.test which stems from complex/test.pyx) is not a package, hence it should not have a __path__ attribute at all, right?

dnicolodi commented 8 months ago

Yes, you are right. However, you cannot find the complex.test module in complex.__path__. This breaks the definition of path.

ogrisel commented 8 months ago

However, you cannot find the complex.test module in complex.path. This breaks the definition of path.

We could populate the __path__ list with the two folders: the regular one with the Python source tree and the one with the compiled extensions, right?

dnicolodi commented 8 months ago

We could populate the __path__ list with the two folders: the regular one with the Python source tree and the one with the compiled extensions, right?

There is no defined relation between the organization of the modules on the filesystem and the logical structure in the package. The files can be rename on installation, moved from one folder to another, etc. Code that assumes it can use the location of the modules on the filesystem or their name to infer the structure of the package is broken.

rgommers commented 8 months ago

The files can be rename on installation, moved from one folder to another, etc. Code that assumes it can use the location of the modules on the filesystem or their name to infer the structure of the package is broken.

It's possible indeed to rename/move on install, but it's rare. So this does fix things for the common case, without downsides I think (?). It's worrying that code in SciPy that I thought was working was actually silently broken before with an empty list, so this does seem like an improvement. Whether it works or not depends on the package's build definitions right, as far as I can tell it isn't possible to break it via some build/install setting.

dnicolodi commented 8 months ago

It does not work for extension modules. I assume that scipy wants pkgutil.walk_packages() to return extension modules too.

rgommers commented 8 months ago

It does not work for extension modules. I assume that scipy wants pkgutil.walk_packages() to return extension modules too.

Hmm yeah, you are correct. My comment about it being rare doesn't hold when walking the build dir to pick up extension modules - it's quite a bit more common for those to land in a different install location.

So this really won't work, and we're back to the iter_modules conversation on gh-557 I guess.

ogrisel commented 8 months ago

@dnicolodi I iterated on this PR to address the problem of the discoverability of compiled submodules in editable mode. Let me know what you think. In particular, I am not sure if submodules of namespace "packages" should be made discoverable by pkgutil.walk_packages one way or another but I am not sure it's possible in general.

dnicolodi commented 8 months ago

All extension modules are built into the root of the build directory, thus this approach still does not work if there are extension modules installed in different sub-packages. There also can be other files that have an extension that matches the one used to load python modules or python extension modules in the build directory. With this approach all these would be interpreted as being part of the package. As I mentioned in #557, there is a solution to make pkgutil.walk_packages() work, but I haven't hat time yet to implement it. Listing filesystems locations in __path__ is not a good starting point.

ogrisel commented 8 months ago

All extension modules are built into the root of the build directory, thus this approach still does not work if there are extension modules installed in different sub-packages.

This is not true. The source folder tree structure is still present to avoid name collisions in case compiled modules with the same local name exist in two different subpackages of the same top level package. For instance, here is the filesystem structure of scikit-learn when built in editable mode with meson:

``` $ find build -name "*.so" build/cp311/sklearn/tree/_utils.cpython-311-darwin.so build/cp311/sklearn/tree/_splitter.cpython-311-darwin.so build/cp311/sklearn/tree/_tree.cpython-311-darwin.so build/cp311/sklearn/tree/_criterion.cpython-311-darwin.so build/cp311/sklearn/metrics/cluster/_expected_mutual_info_fast.cpython-311-darwin.so build/cp311/sklearn/metrics/_dist_metrics.cpython-311-darwin.so build/cp311/sklearn/metrics/_pairwise_distances_reduction/_argkmin_classmode.cpython-311-darwin.so build/cp311/sklearn/metrics/_pairwise_distances_reduction/_base.cpython-311-darwin.so build/cp311/sklearn/metrics/_pairwise_distances_reduction/_datasets_pair.cpython-311-darwin.so build/cp311/sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.cpython-311-darwin.so build/cp311/sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors_classmode.cpython-311-darwin.so build/cp311/sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.cpython-311-darwin.so build/cp311/sklearn/metrics/_pairwise_distances_reduction/_argkmin.cpython-311-darwin.so build/cp311/sklearn/metrics/_pairwise_fast.cpython-311-darwin.so build/cp311/sklearn/ensemble/_hist_gradient_boosting/_bitset.cpython-311-darwin.so build/cp311/sklearn/ensemble/_hist_gradient_boosting/histogram.cpython-311-darwin.so build/cp311/sklearn/ensemble/_hist_gradient_boosting/_binning.cpython-311-darwin.so build/cp311/sklearn/ensemble/_hist_gradient_boosting/common.cpython-311-darwin.so build/cp311/sklearn/ensemble/_hist_gradient_boosting/_predictor.cpython-311-darwin.so build/cp311/sklearn/ensemble/_hist_gradient_boosting/_gradient_boosting.cpython-311-darwin.so build/cp311/sklearn/ensemble/_hist_gradient_boosting/utils.cpython-311-darwin.so build/cp311/sklearn/ensemble/_hist_gradient_boosting/splitting.cpython-311-darwin.so build/cp311/sklearn/ensemble/_gradient_boosting.cpython-311-darwin.so build/cp311/sklearn/cluster/_k_means_elkan.cpython-311-darwin.so build/cp311/sklearn/cluster/_k_means_common.cpython-311-darwin.so build/cp311/sklearn/cluster/_k_means_minibatch.cpython-311-darwin.so build/cp311/sklearn/cluster/_k_means_lloyd.cpython-311-darwin.so build/cp311/sklearn/cluster/_dbscan_inner.cpython-311-darwin.so build/cp311/sklearn/cluster/_hdbscan/_reachability.cpython-311-darwin.so build/cp311/sklearn/cluster/_hdbscan/_linkage.cpython-311-darwin.so build/cp311/sklearn/cluster/_hdbscan/_tree.cpython-311-darwin.so build/cp311/sklearn/cluster/_hierarchical_fast.cpython-311-darwin.so build/cp311/sklearn/feature_extraction/_hashing_fast.cpython-311-darwin.so build/cp311/sklearn/__check_build/_check_build.cpython-311-darwin.so build/cp311/sklearn/_loss/_loss.cpython-311-darwin.so build/cp311/sklearn/datasets/_svmlight_format_fast.cpython-311-darwin.so build/cp311/sklearn/linear_model/_sag_fast.cpython-311-darwin.so build/cp311/sklearn/linear_model/_sgd_fast.cpython-311-darwin.so build/cp311/sklearn/linear_model/_cd_fast.cpython-311-darwin.so build/cp311/sklearn/utils/_openmp_helpers.cpython-311-darwin.so build/cp311/sklearn/utils/_random.cpython-311-darwin.so build/cp311/sklearn/utils/_vector_sentinel.cpython-311-darwin.so build/cp311/sklearn/utils/_isfinite.cpython-311-darwin.so build/cp311/sklearn/utils/_heap.cpython-311-darwin.so build/cp311/sklearn/utils/_sorting.cpython-311-darwin.so build/cp311/sklearn/utils/_weight_vector.cpython-311-darwin.so build/cp311/sklearn/utils/_cython_blas.cpython-311-darwin.so build/cp311/sklearn/utils/sparsefuncs_fast.cpython-311-darwin.so build/cp311/sklearn/utils/_fast_dict.cpython-311-darwin.so build/cp311/sklearn/utils/arrayfuncs.cpython-311-darwin.so build/cp311/sklearn/utils/murmurhash.cpython-311-darwin.so build/cp311/sklearn/utils/_seq_dataset.cpython-311-darwin.so build/cp311/sklearn/utils/_typedefs.cpython-311-darwin.so build/cp311/sklearn/svm/_newrand.cpython-311-darwin.so build/cp311/sklearn/svm/_libsvm.cpython-311-darwin.so build/cp311/sklearn/svm/_liblinear.cpython-311-darwin.so build/cp311/sklearn/svm/_libsvm_sparse.cpython-311-darwin.so build/cp311/sklearn/manifold/_utils.cpython-311-darwin.so build/cp311/sklearn/manifold/_barnes_hut_tsne.cpython-311-darwin.so build/cp311/sklearn/_isotonic.cpython-311-darwin.so build/cp311/sklearn/preprocessing/_target_encoder_fast.cpython-311-darwin.so build/cp311/sklearn/preprocessing/_csr_polynomial_expansion.cpython-311-darwin.so build/cp311/sklearn/decomposition/_cdnmf_fast.cpython-311-darwin.so build/cp311/sklearn/decomposition/_online_lda_fast.cpython-311-darwin.so build/cp311/sklearn/neighbors/_ball_tree.cpython-311-darwin.so build/cp311/sklearn/neighbors/_kd_tree.cpython-311-darwin.so build/cp311/sklearn/neighbors/_partition_nodes.cpython-311-darwin.so build/cp311/sklearn/neighbors/_quad_tree.cpython-311-darwin.so ```

And here is the output of walk_packages on the sklearn top level package:

```python >>> from pkgutil import walk_packages >>> for info in walk_packages(sklearn.__path__, prefix="sklearn."): ... print(info.name) ... sklearn.__check_build sklearn.__check_build._check_build sklearn._build_utils sklearn._build_utils.openmp_helpers sklearn._build_utils.pre_build_helpers sklearn._build_utils.tempita sklearn._build_utils.version sklearn._config sklearn._distributor_init sklearn._loss sklearn._loss.link sklearn._loss.loss sklearn._loss.tests sklearn._loss.tests.test_link sklearn._loss.tests.test_loss sklearn._loss._loss sklearn._min_dependencies sklearn.base sklearn.calibration sklearn.cluster sklearn.cluster._affinity_propagation sklearn.cluster._agglomerative sklearn.cluster._bicluster sklearn.cluster._birch sklearn.cluster._bisect_k_means sklearn.cluster._dbscan sklearn.cluster._feature_agglomeration sklearn.cluster._hdbscan sklearn.cluster._hdbscan.hdbscan sklearn.cluster._hdbscan.tests sklearn.cluster._hdbscan.tests.test_reachibility sklearn.cluster._hdbscan._linkage sklearn.cluster._hdbscan._reachability sklearn.cluster._hdbscan._tree sklearn.cluster._kmeans sklearn.cluster._mean_shift sklearn.cluster._optics sklearn.cluster._spectral sklearn.cluster.tests sklearn.cluster.tests.common sklearn.cluster.tests.test_affinity_propagation sklearn.cluster.tests.test_bicluster sklearn.cluster.tests.test_birch sklearn.cluster.tests.test_bisect_k_means sklearn.cluster.tests.test_dbscan sklearn.cluster.tests.test_feature_agglomeration sklearn.cluster.tests.test_hdbscan sklearn.cluster.tests.test_hierarchical sklearn.cluster.tests.test_k_means sklearn.cluster.tests.test_mean_shift sklearn.cluster.tests.test_optics sklearn.cluster.tests.test_spectral sklearn.cluster._dbscan_inner sklearn.cluster._hierarchical_fast sklearn.cluster._k_means_common sklearn.cluster._k_means_elkan sklearn.cluster._k_means_lloyd sklearn.cluster._k_means_minibatch sklearn.compose sklearn.compose._column_transformer sklearn.compose._target sklearn.compose.tests sklearn.compose.tests.test_column_transformer sklearn.compose.tests.test_target sklearn.conftest sklearn.covariance sklearn.covariance._elliptic_envelope sklearn.covariance._empirical_covariance sklearn.covariance._graph_lasso sklearn.covariance._robust_covariance sklearn.covariance._shrunk_covariance sklearn.covariance.tests sklearn.covariance.tests.test_covariance sklearn.covariance.tests.test_elliptic_envelope sklearn.covariance.tests.test_graphical_lasso sklearn.covariance.tests.test_robust_covariance sklearn.cross_decomposition sklearn.cross_decomposition._pls sklearn.cross_decomposition.tests sklearn.cross_decomposition.tests.test_pls sklearn.datasets sklearn.datasets._arff_parser sklearn.datasets._base sklearn.datasets._california_housing sklearn.datasets._covtype sklearn.datasets._kddcup99 sklearn.datasets._lfw sklearn.datasets._olivetti_faces sklearn.datasets._openml sklearn.datasets._rcv1 sklearn.datasets._samples_generator sklearn.datasets._species_distributions sklearn.datasets._svmlight_format_io sklearn.datasets._twenty_newsgroups sklearn.datasets.data sklearn.datasets.descr sklearn.datasets.images sklearn.datasets.tests sklearn.datasets.tests.data sklearn.datasets.tests.data.openml sklearn.datasets.tests.data.openml.id_1 sklearn.datasets.tests.data.openml.id_1119 sklearn.datasets.tests.data.openml.id_1590 sklearn.datasets.tests.data.openml.id_2 sklearn.datasets.tests.data.openml.id_292 sklearn.datasets.tests.data.openml.id_3 sklearn.datasets.tests.data.openml.id_40589 sklearn.datasets.tests.data.openml.id_40675 sklearn.datasets.tests.data.openml.id_40945 sklearn.datasets.tests.data.openml.id_40966 sklearn.datasets.tests.data.openml.id_42074 sklearn.datasets.tests.data.openml.id_42585 sklearn.datasets.tests.data.openml.id_561 sklearn.datasets.tests.data.openml.id_61 sklearn.datasets.tests.data.openml.id_62 sklearn.datasets.tests.test_20news sklearn.datasets.tests.test_arff_parser sklearn.datasets.tests.test_base sklearn.datasets.tests.test_california_housing sklearn.datasets.tests.test_common sklearn.datasets.tests.test_covtype sklearn.datasets.tests.test_kddcup99 sklearn.datasets.tests.test_lfw sklearn.datasets.tests.test_olivetti_faces sklearn.datasets.tests.test_openml sklearn.datasets.tests.test_rcv1 sklearn.datasets.tests.test_samples_generator sklearn.datasets.tests.test_svmlight_format sklearn.datasets._svmlight_format_fast sklearn.decomposition sklearn.decomposition._base sklearn.decomposition._dict_learning sklearn.decomposition._factor_analysis sklearn.decomposition._fastica sklearn.decomposition._incremental_pca sklearn.decomposition._kernel_pca sklearn.decomposition._lda sklearn.decomposition._nmf sklearn.decomposition._pca sklearn.decomposition._sparse_pca sklearn.decomposition._truncated_svd sklearn.decomposition.tests sklearn.decomposition.tests.test_dict_learning sklearn.decomposition.tests.test_factor_analysis sklearn.decomposition.tests.test_fastica sklearn.decomposition.tests.test_incremental_pca sklearn.decomposition.tests.test_kernel_pca sklearn.decomposition.tests.test_nmf sklearn.decomposition.tests.test_online_lda sklearn.decomposition.tests.test_pca sklearn.decomposition.tests.test_sparse_pca sklearn.decomposition.tests.test_truncated_svd sklearn.decomposition._cdnmf_fast sklearn.decomposition._online_lda_fast sklearn.discriminant_analysis sklearn.dummy sklearn.ensemble sklearn.ensemble._bagging sklearn.ensemble._base sklearn.ensemble._forest sklearn.ensemble._gb sklearn.ensemble._hist_gradient_boosting sklearn.ensemble._hist_gradient_boosting.binning sklearn.ensemble._hist_gradient_boosting.gradient_boosting sklearn.ensemble._hist_gradient_boosting.grower sklearn.ensemble._hist_gradient_boosting.predictor sklearn.ensemble._hist_gradient_boosting.tests sklearn.ensemble._hist_gradient_boosting.tests.test_binning sklearn.ensemble._hist_gradient_boosting.tests.test_bitset sklearn.ensemble._hist_gradient_boosting.tests.test_compare_lightgbm sklearn.ensemble._hist_gradient_boosting.tests.test_gradient_boosting sklearn.ensemble._hist_gradient_boosting.tests.test_grower sklearn.ensemble._hist_gradient_boosting.tests.test_histogram sklearn.ensemble._hist_gradient_boosting.tests.test_monotonic_contraints sklearn.ensemble._hist_gradient_boosting.tests.test_predictor sklearn.ensemble._hist_gradient_boosting.tests.test_splitting sklearn.ensemble._hist_gradient_boosting.tests.test_warm_start sklearn.ensemble._hist_gradient_boosting._binning sklearn.ensemble._hist_gradient_boosting._bitset sklearn.ensemble._hist_gradient_boosting._gradient_boosting sklearn.ensemble._hist_gradient_boosting._predictor sklearn.ensemble._hist_gradient_boosting.common sklearn.ensemble._hist_gradient_boosting.histogram sklearn.ensemble._hist_gradient_boosting.splitting sklearn.ensemble._hist_gradient_boosting.utils sklearn.ensemble._iforest sklearn.ensemble._stacking sklearn.ensemble._voting sklearn.ensemble._weight_boosting sklearn.ensemble.tests sklearn.ensemble.tests.test_bagging sklearn.ensemble.tests.test_base sklearn.ensemble.tests.test_common sklearn.ensemble.tests.test_forest sklearn.ensemble.tests.test_gradient_boosting sklearn.ensemble.tests.test_iforest sklearn.ensemble.tests.test_stacking sklearn.ensemble.tests.test_voting sklearn.ensemble.tests.test_weight_boosting sklearn.ensemble._gradient_boosting sklearn.exceptions sklearn.experimental sklearn.experimental.enable_halving_search_cv sklearn.experimental.enable_hist_gradient_boosting sklearn.experimental.enable_iterative_imputer sklearn.experimental.tests sklearn.experimental.tests.test_enable_hist_gradient_boosting sklearn.experimental.tests.test_enable_iterative_imputer sklearn.experimental.tests.test_enable_successive_halving sklearn.externals sklearn.externals._arff sklearn.externals._packaging sklearn.externals._packaging._structures sklearn.externals._packaging.version sklearn.externals._scipy sklearn.externals._scipy.sparse sklearn.externals._scipy.sparse.csgraph sklearn.externals._scipy.sparse.csgraph._laplacian sklearn.externals.conftest sklearn.feature_extraction sklearn.feature_extraction._dict_vectorizer sklearn.feature_extraction._hash sklearn.feature_extraction._stop_words sklearn.feature_extraction.image sklearn.feature_extraction.tests sklearn.feature_extraction.tests.test_dict_vectorizer sklearn.feature_extraction.tests.test_feature_hasher sklearn.feature_extraction.tests.test_image sklearn.feature_extraction.tests.test_text sklearn.feature_extraction.text sklearn.feature_extraction._hashing_fast sklearn.feature_selection sklearn.feature_selection._base sklearn.feature_selection._from_model sklearn.feature_selection._mutual_info sklearn.feature_selection._rfe sklearn.feature_selection._sequential sklearn.feature_selection._univariate_selection sklearn.feature_selection._variance_threshold sklearn.feature_selection.tests sklearn.feature_selection.tests.test_base sklearn.feature_selection.tests.test_chi2 sklearn.feature_selection.tests.test_feature_select sklearn.feature_selection.tests.test_from_model sklearn.feature_selection.tests.test_mutual_info sklearn.feature_selection.tests.test_rfe sklearn.feature_selection.tests.test_sequential sklearn.feature_selection.tests.test_variance_threshold sklearn.gaussian_process sklearn.gaussian_process._gpc sklearn.gaussian_process._gpr sklearn.gaussian_process.kernels sklearn.gaussian_process.tests sklearn.gaussian_process.tests._mini_sequence_kernel sklearn.gaussian_process.tests.test_gpc sklearn.gaussian_process.tests.test_gpr sklearn.gaussian_process.tests.test_kernels sklearn.impute sklearn.impute._base sklearn.impute._iterative sklearn.impute._knn sklearn.impute.tests sklearn.impute.tests.test_base sklearn.impute.tests.test_common sklearn.impute.tests.test_impute sklearn.impute.tests.test_knn sklearn.inspection sklearn.inspection._partial_dependence sklearn.inspection._pd_utils sklearn.inspection._permutation_importance sklearn.inspection._plot sklearn.inspection._plot.decision_boundary sklearn.inspection._plot.partial_dependence sklearn.inspection._plot.tests sklearn.inspection._plot.tests.test_boundary_decision_display sklearn.inspection._plot.tests.test_plot_partial_dependence sklearn.inspection.tests sklearn.inspection.tests.test_partial_dependence sklearn.inspection.tests.test_pd_utils sklearn.inspection.tests.test_permutation_importance sklearn.isotonic sklearn.kernel_approximation sklearn.kernel_ridge sklearn.linear_model sklearn.linear_model._base sklearn.linear_model._bayes sklearn.linear_model._coordinate_descent sklearn.linear_model._glm sklearn.linear_model._glm._newton_solver sklearn.linear_model._glm.glm sklearn.linear_model._glm.tests sklearn.linear_model._glm.tests.test_glm sklearn.linear_model._huber sklearn.linear_model._least_angle sklearn.linear_model._linear_loss sklearn.linear_model._logistic sklearn.linear_model._omp sklearn.linear_model._passive_aggressive sklearn.linear_model._perceptron sklearn.linear_model._quantile sklearn.linear_model._ransac sklearn.linear_model._ridge sklearn.linear_model._sag sklearn.linear_model._stochastic_gradient sklearn.linear_model._theil_sen sklearn.linear_model.tests sklearn.linear_model.tests.test_base sklearn.linear_model.tests.test_bayes sklearn.linear_model.tests.test_common sklearn.linear_model.tests.test_coordinate_descent sklearn.linear_model.tests.test_huber sklearn.linear_model.tests.test_least_angle sklearn.linear_model.tests.test_linear_loss sklearn.linear_model.tests.test_logistic sklearn.linear_model.tests.test_omp sklearn.linear_model.tests.test_passive_aggressive sklearn.linear_model.tests.test_perceptron sklearn.linear_model.tests.test_quantile sklearn.linear_model.tests.test_ransac sklearn.linear_model.tests.test_ridge sklearn.linear_model.tests.test_sag sklearn.linear_model.tests.test_sgd sklearn.linear_model.tests.test_sparse_coordinate_descent sklearn.linear_model.tests.test_theil_sen sklearn.linear_model._cd_fast sklearn.linear_model._sag_fast sklearn.linear_model._sgd_fast sklearn.manifold sklearn.manifold._isomap sklearn.manifold._locally_linear sklearn.manifold._mds sklearn.manifold._spectral_embedding sklearn.manifold._t_sne sklearn.manifold.tests sklearn.manifold.tests.test_isomap sklearn.manifold.tests.test_locally_linear sklearn.manifold.tests.test_mds sklearn.manifold.tests.test_spectral_embedding sklearn.manifold.tests.test_t_sne sklearn.manifold._barnes_hut_tsne sklearn.manifold._utils sklearn.metrics sklearn.metrics._base sklearn.metrics._classification sklearn.metrics._pairwise_distances_reduction sklearn.metrics._pairwise_distances_reduction._dispatcher sklearn.metrics._pairwise_distances_reduction._argkmin sklearn.metrics._pairwise_distances_reduction._argkmin_classmode sklearn.metrics._pairwise_distances_reduction._base sklearn.metrics._pairwise_distances_reduction._datasets_pair sklearn.metrics._pairwise_distances_reduction._middle_term_computer sklearn.metrics._pairwise_distances_reduction._radius_neighbors sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode sklearn.metrics._plot sklearn.metrics._plot.confusion_matrix sklearn.metrics._plot.det_curve sklearn.metrics._plot.precision_recall_curve sklearn.metrics._plot.regression sklearn.metrics._plot.roc_curve sklearn.metrics._plot.tests sklearn.metrics._plot.tests.test_common_curve_display sklearn.metrics._plot.tests.test_confusion_matrix_display sklearn.metrics._plot.tests.test_det_curve_display sklearn.metrics._plot.tests.test_precision_recall_display sklearn.metrics._plot.tests.test_predict_error_display sklearn.metrics._plot.tests.test_roc_curve_display sklearn.metrics._ranking sklearn.metrics._regression sklearn.metrics._scorer sklearn.metrics.cluster sklearn.metrics.cluster._bicluster sklearn.metrics.cluster._supervised sklearn.metrics.cluster._unsupervised sklearn.metrics.cluster.tests sklearn.metrics.cluster.tests.test_bicluster sklearn.metrics.cluster.tests.test_common sklearn.metrics.cluster.tests.test_supervised sklearn.metrics.cluster.tests.test_unsupervised sklearn.metrics.cluster._expected_mutual_info_fast sklearn.metrics.pairwise sklearn.metrics.tests sklearn.metrics.tests.test_classification sklearn.metrics.tests.test_common sklearn.metrics.tests.test_dist_metrics sklearn.metrics.tests.test_pairwise sklearn.metrics.tests.test_pairwise_distances_reduction sklearn.metrics.tests.test_ranking sklearn.metrics.tests.test_regression sklearn.metrics.tests.test_score_objects sklearn.metrics._dist_metrics sklearn.metrics._pairwise_fast sklearn.mixture sklearn.mixture._base sklearn.mixture._bayesian_mixture sklearn.mixture._gaussian_mixture sklearn.mixture.tests sklearn.mixture.tests.test_bayesian_mixture sklearn.mixture.tests.test_gaussian_mixture sklearn.mixture.tests.test_mixture sklearn.model_selection sklearn.model_selection._plot sklearn.model_selection._search sklearn.model_selection._search_successive_halving sklearn.model_selection._split sklearn.model_selection._validation sklearn.model_selection.tests sklearn.model_selection.tests.common sklearn.model_selection.tests.test_plot sklearn.model_selection.tests.test_search sklearn.model_selection.tests.test_split sklearn.model_selection.tests.test_successive_halving sklearn.model_selection.tests.test_validation sklearn.multiclass sklearn.multioutput sklearn.naive_bayes sklearn.neighbors sklearn.neighbors._base sklearn.neighbors._classification sklearn.neighbors._graph sklearn.neighbors._kde sklearn.neighbors._lof sklearn.neighbors._nca sklearn.neighbors._nearest_centroid sklearn.neighbors._regression sklearn.neighbors._unsupervised sklearn.neighbors.tests sklearn.neighbors.tests.test_ball_tree sklearn.neighbors.tests.test_graph sklearn.neighbors.tests.test_kd_tree sklearn.neighbors.tests.test_kde sklearn.neighbors.tests.test_lof sklearn.neighbors.tests.test_nca sklearn.neighbors.tests.test_nearest_centroid sklearn.neighbors.tests.test_neighbors sklearn.neighbors.tests.test_neighbors_pipeline sklearn.neighbors.tests.test_neighbors_tree sklearn.neighbors.tests.test_quad_tree sklearn.neighbors._ball_tree sklearn.neighbors._kd_tree sklearn.neighbors._partition_nodes sklearn.neighbors._quad_tree sklearn.neural_network sklearn.neural_network._base sklearn.neural_network._multilayer_perceptron sklearn.neural_network._rbm sklearn.neural_network._stochastic_optimizers sklearn.neural_network.tests sklearn.neural_network.tests.test_base sklearn.neural_network.tests.test_mlp sklearn.neural_network.tests.test_rbm sklearn.neural_network.tests.test_stochastic_optimizers sklearn.pipeline sklearn.preprocessing sklearn.preprocessing._data sklearn.preprocessing._discretization sklearn.preprocessing._encoders sklearn.preprocessing._function_transformer sklearn.preprocessing._label sklearn.preprocessing._polynomial sklearn.preprocessing._target_encoder sklearn.preprocessing.tests sklearn.preprocessing.tests.test_common sklearn.preprocessing.tests.test_data sklearn.preprocessing.tests.test_discretization sklearn.preprocessing.tests.test_encoders sklearn.preprocessing.tests.test_function_transformer sklearn.preprocessing.tests.test_label sklearn.preprocessing.tests.test_polynomial sklearn.preprocessing.tests.test_target_encoder sklearn.preprocessing._csr_polynomial_expansion sklearn.preprocessing._target_encoder_fast sklearn.random_projection sklearn.semi_supervised sklearn.semi_supervised._label_propagation sklearn.semi_supervised._self_training sklearn.semi_supervised.tests sklearn.semi_supervised.tests.test_label_propagation sklearn.semi_supervised.tests.test_self_training sklearn.svm sklearn.svm._base sklearn.svm._bounds sklearn.svm._classes sklearn.svm.tests sklearn.svm.tests.test_bounds sklearn.svm.tests.test_sparse sklearn.svm.tests.test_svm sklearn.svm._liblinear sklearn.svm._libsvm sklearn.svm._libsvm_sparse sklearn.svm._newrand sklearn.tests sklearn.tests.metadata_routing_common sklearn.tests.random_seed sklearn.tests.test_base sklearn.tests.test_build sklearn.tests.test_calibration sklearn.tests.test_check_build sklearn.tests.test_common sklearn.tests.test_config sklearn.tests.test_discriminant_analysis sklearn.tests.test_docstring_parameters sklearn.tests.test_docstrings sklearn.tests.test_dummy sklearn.tests.test_init sklearn.tests.test_isotonic sklearn.tests.test_kernel_approximation sklearn.tests.test_kernel_ridge sklearn.tests.test_metadata_routing sklearn.tests.test_metaestimators sklearn.tests.test_metaestimators_metadata_routing sklearn.tests.test_min_dependencies_readme sklearn.tests.test_multiclass sklearn.tests.test_multioutput sklearn.tests.test_naive_bayes sklearn.tests.test_pipeline sklearn.tests.test_public_functions sklearn.tests.test_random_projection sklearn.tree sklearn.tree._classes sklearn.tree._export sklearn.tree._reingold_tilford sklearn.tree.tests sklearn.tree.tests.test_export sklearn.tree.tests.test_monotonic_tree sklearn.tree.tests.test_reingold_tilford sklearn.tree.tests.test_tree sklearn.tree._criterion sklearn.tree._splitter sklearn.tree._tree sklearn.tree._utils sklearn.utils sklearn.utils._arpack sklearn.utils._array_api sklearn.utils._available_if sklearn.utils._bunch sklearn.utils._encode sklearn.utils._estimator_html_repr sklearn.utils._joblib sklearn.utils._mask sklearn.utils._metadata_requests sklearn.utils._mocking sklearn.utils._param_validation sklearn.utils._plotting sklearn.utils._pprint sklearn.utils._response sklearn.utils._set_output sklearn.utils._show_versions sklearn.utils._tags sklearn.utils._testing sklearn.utils.class_weight sklearn.utils.deprecation sklearn.utils.discovery sklearn.utils.estimator_checks sklearn.utils.extmath sklearn.utils.fixes sklearn.utils.graph sklearn.utils.metadata_routing sklearn.utils.metaestimators sklearn.utils.multiclass sklearn.utils.optimize sklearn.utils.parallel sklearn.utils.random sklearn.utils.sparsefuncs sklearn.utils.stats sklearn.utils.tests sklearn.utils.tests.test_arpack sklearn.utils.tests.test_array_api sklearn.utils.tests.test_arrayfuncs sklearn.utils.tests.test_bunch sklearn.utils.tests.test_class_weight sklearn.utils.tests.test_cython_blas sklearn.utils.tests.test_cython_templating sklearn.utils.tests.test_deprecation sklearn.utils.tests.test_encode sklearn.utils.tests.test_estimator_checks sklearn.utils.tests.test_estimator_html_repr sklearn.utils.tests.test_extmath sklearn.utils.tests.test_fast_dict sklearn.utils.tests.test_fixes sklearn.utils.tests.test_graph sklearn.utils.tests.test_metaestimators sklearn.utils.tests.test_mocking sklearn.utils.tests.test_multiclass sklearn.utils.tests.test_murmurhash sklearn.utils.tests.test_optimize sklearn.utils.tests.test_parallel sklearn.utils.tests.test_param_validation sklearn.utils.tests.test_plotting sklearn.utils.tests.test_pprint sklearn.utils.tests.test_random sklearn.utils.tests.test_response sklearn.utils.tests.test_seq_dataset sklearn.utils.tests.test_set_output sklearn.utils.tests.test_shortest_path sklearn.utils.tests.test_show_versions sklearn.utils.tests.test_sparsefuncs sklearn.utils.tests.test_stats sklearn.utils.tests.test_tags sklearn.utils.tests.test_testing sklearn.utils.tests.test_typedefs sklearn.utils.tests.test_utils sklearn.utils.tests.test_validation sklearn.utils.tests.test_weight_vector sklearn.utils.validation sklearn.utils._cython_blas sklearn.utils._fast_dict sklearn.utils._heap sklearn.utils._isfinite sklearn.utils._openmp_helpers sklearn.utils._random sklearn.utils._seq_dataset sklearn.utils._sorting sklearn.utils._typedefs sklearn.utils._vector_sentinel sklearn.utils._weight_vector sklearn.utils.arrayfuncs sklearn.utils.murmurhash sklearn.utils.sparsefuncs_fast sklearn._built_with_meson sklearn._isotonic ```

so both Python and compiled extensions are recursively discovered successfully using this strategy.

There also can be other files that have an extension that matches the one used to load python modules or python extension modules in the build directory. With this approach all these would be interpreted as being part of the package.

I agree in theory but will that ever be the case under the build/ folder of a meson-python editable build?

ogrisel commented 8 months ago

But I can try to update this PR to add iter_modules method to the MesonpyMetaFinder class instead.

EDIT: the proper way to do it seems to implement this iter_modules method on a path entry finder for mesonpy and register it via path_hook class method in sys.path_hooks similarly to was is done by default for the regular FileFinder class. pkgutil.get_importer (used indirectly by pkgutil.walk_packages/iter_modules) would then be able to look it up for a given package file system path.

Still, I find the definition of empty __path__ package attributes a potential source of surprise for many code-base, beyond the objective of making pkgutil.walk_packages work as intended.

EDIT2: but looking again at the code of pkgutil.walk_packages, customizing iter_modules alone would not be enough, as walk_packages still relies on the __path__ attribute of a subpackage to do its recursive call. But customizing iter_modules should make it possible to filter out files that should not be considered submodules of a given package.

dnicolodi commented 8 months ago

This is not true.

It depends on how the build definition is structured. If all extension modules are defined in a meson.build in the root project, the build files are all created in the root of the build directory. What we need is not something that works for sklearn but a general solution.

It would be polite if you could at least consider that the maintainer of a project to which you are contributing spent a bit more time than you thinking about the problem space a bit more than you. If they tell you that an approach to solve an issue does not work, it would be nice if you could at least try to understand the objections.

dnicolodi commented 8 months ago

Still, I find the definition of empty __path__ package attributes a potential source of surprise for many code-base, beyond the objective of making pkgutil.walk_packages work as intended.

__path__ makes sense only within the python import system implementation. Relaying on it to point to any specific filesystem folder (or to a filesystem folder at all) is bogus and always will be. It breaks zip imports, bundled python applications, and whatnot. I see the fact that meson-python editable installs breaks this wrong expectation as a positive thing that will move developers away from using the __path__ attribute for things for which it is not designed.

dnicolodi commented 8 months ago

I agree in theory but will that ever be the case under the build/ folder of a meson-python editable build?

Sure. For example if a project generates a version.py module with the project version, or a shared library on platforms where both extension modules and shared libraries have the same filename extension.

ogrisel commented 8 months ago

It depends on how the build definition is structured. If all extension modules are defined in a meson.build in the root project, the build files are all created in the root of the build directory. What we need is not something that works for sklearn but a general solution.

Alright, sorry I had not considered it was possible to do otherwise.

It would be polite if you could at least consider that the maintainer of a project to which you are contributing spent a bit more time than you thinking about the problem space a bit more than you. If they tell you that an approach to solve an issue does not work, it would be nice if you could at least try to understand the objections.

Sorry.

eli-schwartz commented 8 months ago

Still, I find the definition of empty __path__ package attributes a potential source of surprise for many code-base, beyond the objective of making pkgutil.walk_packages work as intended.

As a dunder attribute it is formally reserved by the implementation. The only thing you are allowed to do with pure semantics here is to use __path__ according to its documentation, to mark a package vs a module and to optionally have it as an empty list OR use it to instruct the import machinery where to search for submodules.

Any code that takes this dunder attribute and doesn't fully implement the semantics of the import machinery in question is automatically broken. This includes detecting cases where it is empty and recognizing that a different import machinery finder must be used if available.

Of course, using this dunder attribute is a convenient hack, but many things are a convenient hack. Emphasis on "hack".

ogrisel commented 8 months ago

The only thing you are allowed to do with pure semantics here is to use path according to its documentation, to mark a package vs a module and to optionally have it as an empty list OR use it to instruct the import machinery where to search for submodules.

Any code that takes this dunder attribute and doesn't fully implement the semantics of the import machinery in question is automatically broken. This includes detecting cases where it is empty and recognizing that a different import machinery finder must be used if available.

According to the above interpretation pkgutil.walk_packages, which is part of the standard library, would be considered broken: indeed it only relies on __path__ to recursively find n+1-level sub-packages. A custom iter_modules implementation alone would not fix this as iter_modules is meant to return direct submodules, not submodules of subpackages.

As far as I know the import machinery does not specify any other way to discover subpackages transitively.

I checked that pkgutil.walk_packages works as expected on the following test zip package structure:

$ zipinfo level0_package.zip
Archive:  level0_package.zip
Zip file size: 1304 bytes, number of entries: 6
drwxr-xr-x  2.0 unx        0 bx stor 24-Jan-23 18:16 level0_package/
-rw-r--r--  1.0 unx        0 bX stor 24-Jan-23 18:15 level0_package/level1_module.py
-rw-r--r--  1.0 unx        0 bX stor 24-Jan-23 18:15 level0_package/__init__.py
drwxr-xr-x  2.0 unx        0 bx stor 24-Jan-23 18:17 level0_package/level1_package/
-rw-r--r--  1.0 unx        0 bX stor 24-Jan-23 18:16 level0_package/level1_package/__init__.py
-rw-r--r--  1.0 unx        0 bX stor 24-Jan-23 18:17 level0_package/level1_package/level2_module.py
6 files, 0 bytes uncompressed, 0 bytes compressed:  0.0%
>>> import sys
>>> sys.path.insert(0, "level0_package.zip")
>>> import level0_package
>>> level0_package.__path__
['level0_package.zip/level0_package']
>>> from pkgutil import walk_packages
>>> for info in walk_packages(level0_package.__path__, prefix="level0_package."):
...     print(info.name)
... 
level0_package.level1_module
level0_package.level1_package
level0_package.level1_package.level2_module

>>> import level0_package.level1_package
>>> level0_package.level1_package.__path__
['level0_package.zip/level0_package/level1_package']

and furthermore to confirm that iter_modules only iterates on direct descendants:

>>> from pkgutil import iter_modules
>>> for info in iter_modules(level0_package.__path__, prefix="level0_package."):
...     print(info.name)
... 
level0_package.level1_module
level0_package.level1_package

If you agree with my edited comment above (https://github.com/mesonbuild/meson-python/pull/562#issuecomment-1900568972), I think it's possible to compute __path__ attribute for meson-python editable installs as done in this PR to get pkg.walk_packages to work as expected while also registering a custom file entry finder with a specific iter_modules implementation that would avoid introducing false positive submodules in case all compiled extensions of meson-python installed project are located in a shared folder (assuming no name collisions).

I can try to update this PR accordingly.

dnicolodi commented 7 months ago

The proper fix is in #569.