Closed aaronknister closed 1 month ago
I don't love to add this just because someone else has a bug, but I guess it's pretty harmless (and prevents other libraries from introducing the same issue, as unlikely as that may be).
Have you confirmed that this fixes the issue with pyspark?
If it can fix a pyspark issue that's definitely a good thing since it is a relatively small change. I would want to ensure that this all plays nice with numba though, since numba support for named-tuples is performance critical in some places, and may not be happy with all of this (hence the desire to check). All the tests have passed, so that's deifnitely a good sign that numba is failing on the extra keyword argument. Just to be sure, however, I would really appreciate a quick performance benchmark comparison (which the tests don't check) -- so index build time comparisons, and maybe if you have time a comparison of with and without this change on, say, the fashion-mnist ann-benchmarks, or something equivalent that decently exercises the code and records performance measures.
FWIW the underlying pyspark issue was fixed in v3.3.0: https://spark.apache.org/releases/spark-release-3-3-0.html#other-notable-changes-2.
this was suggested by @kmurphy4 on github.com/lmcinnes/umap/issues/477 as a work around to pyspark hijacking collection.namedtuple