Open gabrieldemarmiesse opened 7 months ago
I disagree. These packages cannot reach their full potential as they are if Mojo is merely semantically compatible with Python. Mojo offers much more than being a Python superset, speed for example. Copying Python wastes Mojo's potential as a language and becomes a ball and chain. Mojo would then be yet another compiled Python, and there are already a few other candidates in that area.
The question has to be whether CPython's stdlib is part of Python's specification, and whether Mojo is required to implement it in order to be a superset.
@gryznar deviating from Python syntax where there are no performance penalties will probably create friction points for Python devs trying to adopt Mojo; having to rename your functions and make all kinds of subtle changes is painful. If Mojo would like to become a separate language with Python-like syntax, similar to Nim but with great Python interop, that is great, but the value proposition would change drastically.
I don't think it's ideal or even possible to model Mojo's stdlib over Python's.
Python's stdlib is designed with its own advantages and limitations in mind; most noticeably, it's a stdlib for a dynamic language with reference semantics. The semantic difference is not that prominent only because numbers (also the most developed/tested part of Mojo) happen to have value semantics in Python. Even lists behave dramatically differently in these two languages, so it's somewhat unreasonable (at least for the moment) to expect a codebase that makes use of any custom class to have consistent semantics between Mojo and Python.
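To make the point about lists concrete, here is a minimal Python sketch of reference semantics. The variable names are illustrative only. Under value semantics, as described above for Mojo, the assignment `b = a` would be expected to leave `a` unaffected by later changes to `b`; that contrast is stated here, not demonstrated.

```python
# Reference semantics: assignment binds a second name to the same list object.
a = [1, 2]
b = a              # no copy is made
b.append(3)        # the change is visible through `a` as well

# An explicit copy is required to get an independent value.
c = a.copy()
c.append(4)        # `a` is not affected by this

print(a, b, c)
```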
Given that `vision.md` favours performance over anything else, any design choice that might hinder performance should be made with extra caution. Also, it's not just hypothetical: there are already cases where mimicking Python is actively harmful, e.g. the one-method `Hashable` trait.
If we can't copy Python's API, I see little reason to copy Python's naming exactly. We can (and probably should) take as much inspiration as possible from Python to reduce bikeshedding, but not more than necessary.
All that said, I'm not against having a separate Python-compatible stdlib when we have more support for dynamic features like `class`. The API surface is not gonna be that large, for we only (still a lot of work) need to implement the C-based builtin types. It's kinda unavoidable and is already happening: the stdlib has `Dictionary`, which is not the same as `Dict`.
I want to express my many thanks to @gabrieldemarmiesse for opening this issue despite they themselves not necessarily agreeing with it.
I thought of another reason why we might want to design the stdlib differently (from Python).
Python doesn't support overloading functions, while Mojo does. Well, technically, we have `singledispatch`, which nobody uses, and the optional parameter trick, which can get quite messy. Overloading in Mojo is typically used in two ways:
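1. To support a function which accepts arguments of different types, example:
```python
# .py
def f(a: int | bool): ...
```
```mojo
# .mojo
fn f(a: Int): ...
fn f(a: Bool): ...
```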
2. To support a function which has different usage modes, example: `range`.
```python
# Slideware Python range
def range(start, stop=None, step=None):
    if stop is None and step is None:
        # noticeably, the name is wrong, variable `start` contains the end value
        return range(0, start, 1)
    elif ...:
        ...
```
The two cases overlap, but they are different things from an API design point of view. Having great support for the second category opens up a huge design space. Consider the following design inspired by Swift:
```mojo
var s: String = "..."
_ = s.remove(prefix=".")
_ = s.remove(suffix=".")
_ = s.remove(while_true=lambda ...: ...)
```
The implementation for such an API will look horrible in Python (which is also why Python people typically don't make designs like this), but not so in Mojo. It's not a real proposal, but I definitely don't think we want to say no to such designs outright.
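For comparison, here is a rough sketch of what a single Python function covering all three modes could look like. The free function `remove`, its keyword-only parameters, and the reading of `while_true` as "strip leading characters while the predicate holds" are all assumptions made for illustration, not part of any real API.

```python
from typing import Callable, Optional


def remove(
    s: str,
    *,
    prefix: Optional[str] = None,
    suffix: Optional[str] = None,
    while_true: Optional[Callable[[str], bool]] = None,
) -> str:
    """Hypothetical Python counterpart of the `s.remove(...)` overload set."""
    given = [arg is not None for arg in (prefix, suffix, while_true)]
    # The "exactly one mode" rule can only be enforced at run time.
    if sum(given) != 1:
        raise TypeError("pass exactly one of prefix, suffix, while_true")
    if prefix is not None:
        return s[len(prefix):] if s.startswith(prefix) else s
    if suffix is not None:
        # Slice by length so that an empty suffix leaves `s` unchanged.
        return s[: len(s) - len(suffix)] if s.endswith(suffix) else s
    assert while_true is not None
    i = 0
    # Assumed meaning: drop leading characters while the predicate is true.
    while i < len(s) and while_true(s[i]):
        i += 1
    return s[i:]
```

All three modes share one signature, so the "exactly one of these" rule lives in a run-time check rather than in the type of the function, which is the messiness the comment refers to.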
I think the recent API change from `DTypePointer.simd_store` etc. to `DTypePointer.store[width=]` is a reflection of this trend.
To expand on @soraros's point, the features Mojo has that Python doesn't can enable faster, cleaner, more correct and easier to use APIs for the stdlib:
- It moves certain types of errors from run-time to compile-time, giving programmers faster feedback.
- It interacts nicely with IDEs, which can avoid suggesting operations that are illegal in a certain state.
- It can eliminate run-time checks, making code faster/smaller.
...and type-driven design (1) (2):
> the thoughtful use of types can make a design more transparent and improve correctness at the same time
Extensions would enable fluent interfaces and railway oriented programming/pipeline oriented programming. It also means inconveniences like `str.join()` being a method on `str` instead of iterables can be fixed.
Guaranteed destruction and context managers mean that there's no need to provide public APIs like `.close()`. Resource release can be handled automatically, with no need to check at runtime whether the programmer already called `.close()`.
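As a sketch of the run-time checking mentioned above, this is roughly how a Python resource with a public `.close()` tends to look. The `Connection` class and its `send` method are hypothetical and exist only for illustration.

```python
class Connection:
    """Hypothetical resource used only to illustrate the pattern."""

    def __init__(self) -> None:
        self.closed = False

    def send(self, payload: str) -> int:
        # Every operation has to re-check the state at run time.
        if self.closed:
            raise ValueError("connection is closed")
        return len(payload)

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "Connection":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # The context manager releases the resource automatically.
        self.close()


with Connection() as conn:
    sent = conn.send("hello")
```

After the `with` block the object is still reachable, so using it again can only fail at run time.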
Mutable value semantics with lazy copying means that there's no need for separate `.sort()` and `sorted()` functions depending on whether you need the original value or not. The language cooperates with your intention and gives maximum performance either way. It also means that most functions can be made pure, which aids testability and debuggability.
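For reference, this is the Python split the paragraph refers to; the variable names are illustrative.

```python
original = [3, 1, 2]

# sorted() builds and returns a new list; `original` is untouched at this point.
ordered_copy = sorted(original)
untouched = list(original)

# list.sort() sorts in place and returns None.
result = original.sort()
```

The caller has to pick the right spelling up front, depending on whether the unsorted value is still needed.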
Here's a motivating example - reduce:
In Java, `reduce` is overloaded such that if an initializer is provided, it returns a `T`, but if an initializer is not provided, it returns an `Optional<T>` instead. The return type changes based on the overload.
This avoids surprising behaviour where, in a long stream/iterable processing operation, you expect `reduce` to return a value but it throws a `TypeError` instead:
```python
from functools import reduce
from operator import add

reduce(add, [])  # Raises a TypeError
```
Because of Python's dynamic typing, it cannot provide this kind of safety. Mojo, being a statically typed language, can and should help programmers avoid footguns like these.
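The Java-style split can be approximated in Python with type hints, as in the sketch below. `reduce_or_none` is a hypothetical helper, not a stdlib function, and the `Optional[T]` return type is only checked by an external type checker, never by Python itself at run time.

```python
from functools import reduce
from operator import add
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def reduce_or_none(fn: Callable[[T, T], T], items: Iterable[T]) -> Optional[T]:
    """Hypothetical no-initializer variant: returns None for empty input."""
    it = iter(items)
    try:
        first = next(it)
    except StopIteration:
        return None
    # Fold the remaining items, seeded with the first element.
    return reduce(fn, it, first)


empty_total = reduce_or_none(add, [])         # no exception for empty input
total = reduce_or_none(add, [1, 2, 3])
seeded_total = reduce(add, [], 0)             # the initializer variant always yields a value
```

With the return type spelled out, a caller is nudged to handle the empty case explicitly.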
Review Mojo's priorities
What is your request?
Many people voice their opinion about the need to redo a stdlib from scratch, and to deviate from Python's naming and structure on this. I must say that I do not hold this opinion, but I'm opening this issue so that users have somewhere to discuss this, as pull requests are not the best place to do it. See this PR for some background.
Many people can chime in. My personal opinion is that Mojo must do its best to be a superset of Python (even if that's an impossible goal). This is because it will allow the maintainers of pure-Python packages to write code that will work in both Mojo and CPython, similar to how they write code which works in both CPython and PyPy.
What is your motivation for this change?
I'm myself the maintainer of a pure-Python package which has moderate success.
I want to support Mojo in the future (let's say 2+ years), but none of the following options are reasonable:
The only option I would have is to use a subset of Mojo which is compatible with CPython, which would be very, very restricted if the standard library has different names for functions.
Any other details?
Pure Python packages represent around 90% of the top 5000 packages on PyPI. Those are packages that do not benefit from all the Mojo features, as performance isn't critical for their use case. It's going to be really hard to sell them any of the options I described above. The only option which is reasonable for them is to use the subset of Mojo which is compatible with CPython, as it requires very few changes and can be easily tested in the CI.
That means we inherit the CPython design mistakes in the stdlib, but we get a reasonable path to use 90% of the PyPI packages without the CPython interpreter.