gabrieldemarmiesse commented 7 months ago

Review Mojo's priorities

[X] I have read the roadmap and priorities and I believe this request falls within the priorities.

What is your request?

Many people voice their opinion about the need to redo a stdlib from scratch, and to deviate from Python's naming and structure on this. I must say that I do not hold this opinion, but I'm opening this issue so that users have somewhere to discuss this, as pull requests are not the best place to do it. See this PR for some background.

Many people can chime in. My personal opinion is that Mojo must do its best to be a superset of Python (even if that's an impossible goal). This would allow the maintainers of pure-Python packages to write code that works in both Mojo and CPython, similar to how they write code that works in both CPython and PyPy.

What is your motivation for this change?

I am myself the maintainer of a pure-Python package that has had moderate success.

I want to support Mojo in the future (let's say 2+ years), but none of the following options are reasonable:

- Maintain two codebases, one in Mojo and one in Python. That's too much work.
- Write the code for Mojo only, and ship it on PyPI as a Python extension. Working on and deploying CPython extensions is a lot of work, way more complicated than pure Python code, and we also lose some portability.
- Write the code for Mojo only and forget about Python users (hell nah).
- Let the Mojo tool automatically rewrite my Python code in Mojo at every release. That would put an incredible amount of pressure on the makers of this tool to make it perfect. Otherwise we need to put a human in the loop. That would be a very complicated setup, and I would have to ask my contributors not to contribute code that would break the tool.
- Write the code in Python only and do not migrate the codebase to Mojo. This would require Mojo users of my library to run it through the CPython interpreter, which is not good for deployment purposes. Nobody likes to depend on the CPython interpreter for deployment, especially for CLIs, mobile devices, wasm, desktop apps, embedded devices, etc.

The only option I would have is to use a subset of Mojo which is compatible with CPython. That subset would be very, very restricted if the standard library has different names for functions.

Any other details?

Pure Python packages represent around 90% of the top 5000 packages on PyPI. Those are packages that do not benefit from all the Mojo features, as performance isn't critical for their use case. It's going to be really hard to sell them any of the options I described above. The only option which is reasonable for them is to use the subset of Mojo which is compatible with CPython, as it requires very few changes and can be easily tested in CI.

That means we inherit CPython's design mistakes in the stdlib, but we get a reasonable path to using 90% of the PyPI packages without the CPython interpreter.

gryznar commented 7 months ago

I disagree. These packages cannot reach their full potential as they are if Mojo is only semantically compatible with Python. Mojo offers much more than being just a Python superset, in terms of speed for example. Copying Python would waste Mojo's potential as a language and would be a ball and chain. Mojo would then be yet another compiled Python, and there are already a few other candidates in that area.

melodyogonna commented 7 months ago

The question has to be whether CPython's stdlib is part of Python's specification, and whether Mojo is required to implement it in order to stay faithful to being a superset.

MoSafi2 commented 7 months ago

@gryznar deviating from Python syntax where there is no performance penalty will probably create friction points for Python devs trying to adopt Mojo. Having to rename your functions and make all kinds of subtle changes is painful. If Mojo would like to become a separate language with Python-like syntax, similar to Nim but with great Python interop, that is great, but the value proposition would change drastically.

soraros commented 7 months ago

I don't think it's ideal or even possible to model Mojo's stdlib over Python's.

Python's stdlib is designed with its own advantages and limitations in mind. Most notably, it's a stdlib for a dynamic language with reference semantics. The semantic difference is not that prominent only because numbers (also the most developed/tested part of Mojo) happen to have value semantics in Python. Even lists behave dramatically differently in these two languages, so it's somewhat unreasonable (at least for the moment) to expect a codebase that makes use of any custom class to have consistent semantics between Mojo and Python.
Given that vision.md favours performance over anything else, any design choice that might hinder performance should be made with extra caution. Also, it's not just hypothetical: there are already cases where mimicking Python is actively harmful, e.g. the one-method Hashable trait.
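
As a small illustration of the reference-semantics point above, here is how list assignment behaves in Python. This is only a sketch of the Python side and is not taken from the thread. The contrast being drawn is with a language where assignment yields an independent value.

```python
# Python lists have reference semantics: assignment binds a second name
# to the same object instead of copying it.
a = [1, 2, 3]
b = a            # no copy is made; a and b are the same list
b.append(4)      # visible through both names

# An explicit copy is needed to get an independent value.
c = list(a)
c.append(5)      # does not affect a

print(a)  # [1, 2, 3, 4]
print(c)  # [1, 2, 3, 4, 5]
```

Code that relies on the aliasing in the first half would not carry over unchanged to a language where `b = a` produces a separate value.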

If we can't copy Python's API, I see little reason to copy Python's naming exactly. We can (and probably should) take as much inspiration as possible from Python to reduce bikeshedding, but not more than necessary.

All that said, I'm not against having a separate Python-compatible stdlib when we have more support for dynamic features like class. The API surface is not gonna be that large, for we only (still a lot of work) need to implement the C-based builtin types. It's kinda unavoidable and is already happening: stdlib has Dictionary, which is not the same as Dict.

soraros commented 7 months ago

I want to express my many thanks to @gabrieldemarmiesse for opening this issue despite not necessarily agreeing with it themselves.

soraros commented 7 months ago

I thought of another reason why we might want to design the stdlib differently (from Python).

Python doesn't support overloading functions, while Mojo does. Well, technically, we have singledispatch, which nobody uses, and the optional parameter trick, which can get quite messy. Overloading in Mojo is typically used in two ways:

1. To make the Python counterpart statically typed: we can create an overload for each concrete type that the Python function supports. Whether we should design the API this way is debatable, but it is definitely useful and has its place. Example:
```python
# .py
def f(a: int | bool): ...
```
```mojo
# .mojo
fn f(a: Int): ...
fn f(a: Bool): ...
```

2. To support a function which has different usage modes, for example `range`:
```python
# Slideware Python range
def range(start, stop=None, step=None):
  if stop is None and step is None:
    # notably, the name is wrong: the variable `start` contains the end value
    return range(0, start, 1)
  elif ...:
    ...
```

The two cases overlap, but they are different things from an API design point of view. Having great support for the second category opens up a huge design space. Consider the following design inspired by Swift:

```mojo
var s: String = "..."
_ = s.remove(prefix=".")
_ = s.remove(suffix=".")
_ = s.remove(while_true=lambda ...: ...)
```

The implementation for such an API will look horrible in Python (which is also why Python people typically don't make designs like this), but not so in Mojo. It's not a real proposal, but I definitely don't think we want to say no to such designs outright.
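
To make the "horrible in Python" remark concrete, here is a rough sketch of how one function might emulate the three modes with keyword-only optional parameters. The free function `remove` and its behaviour are assumptions made for illustration. In particular, `while_true` is assumed to strip leading characters while the predicate holds, which the thread does not specify.

```python
from typing import Callable, Optional


def remove(
    s: str,
    *,
    prefix: Optional[str] = None,
    suffix: Optional[str] = None,
    while_true: Optional[Callable[[str], bool]] = None,
) -> str:
    """Hypothetical helper: exactly one keyword selects the mode."""
    # The "overload resolution" has to be done by hand at run time.
    given = [arg for arg in (prefix, suffix, while_true) if arg is not None]
    if len(given) != 1:
        raise TypeError("pass exactly one of prefix, suffix, while_true")

    if prefix is not None:
        return s[len(prefix):] if s.startswith(prefix) else s
    if suffix is not None:
        # len(s) - len(suffix) avoids the s[:-0] pitfall for an empty suffix.
        return s[: len(s) - len(suffix)] if s.endswith(suffix) else s

    # Remaining mode: drop leading characters while the predicate holds.
    i = 0
    while i < len(s) and while_true(s[i]):
        i += 1
    return s[i:]


print(remove("..a.", prefix="."))                         # .a.
print(remove("..a.", suffix="."))                         # ..a
print(remove("..a.", while_true=lambda ch: ch == "."))    # a.
```

All three modes share one signature, one docstring and one body, and an invalid combination is only caught when the call runs. With real overloads each mode could be its own function.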

I think the recent API change from DTypePointer.simd_store etc. to DTypePointer.store[width=] is a reflection of this trend.
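
For readers who have not seen the `singledispatch` workaround mentioned earlier in this comment, a minimal sketch follows. The function name `f` mirrors the `int | bool` example above, and the return strings are placeholders chosen for illustration.

```python
from functools import singledispatch


@singledispatch
def f(a) -> str:
    # Fallback for types with no registered implementation.
    raise TypeError(f"unsupported type: {type(a).__name__}")


@f.register
def _(a: int) -> str:
    return "int overload"


@f.register
def _(a: bool) -> str:
    # bool is a subclass of int; the more specific registration wins.
    return "bool overload"


print(f(1))     # int overload
print(f(True))  # bool overload
```

Dispatch happens at run time and only on the type of the first argument, so it is not a full substitute for compile-time overloading.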

Brian-M-J commented 7 months ago

To expand on @soraros's point, the features Mojo has that Python doesn't can enable faster, cleaner, more correct and easier-to-use APIs for the stdlib:

The typestate pattern...
- It moves certain types of errors from run-time to compile-time, giving programmers faster feedback.
- It interacts nicely with IDEs, which can avoid suggesting operations that are illegal in a certain state.
- It can eliminate run-time checks, making code faster/smaller.

...and type-driven design (1) (2):

the thoughtful use of types can make a design more transparent and improve correctness at the same time

- Extensions would enable fluent interfaces and railway oriented programming/pipeline oriented programming. It also means inconveniences like str.join() being a method on str instead of iterables can be fixed.
- Guaranteed destruction and context managers mean that there's no need to provide public APIs like .close(). Resource release can be handled automatically, with no need of checking at runtime whether the programmer already called .close().
- Mutable value semantics with lazy copying means that there's no need for separate .sort() and sorted() functions depending on whether you need the original value or not. The language cooperates with your intention and gives maximum performance either way. It also means that most functions can be made pure, which aids testability and debuggability.
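
For reference, the Python split between `.sort()` and `sorted()` mentioned in the last point looks like this. It is a small sketch with made-up data, shown only to make the two entry points concrete.

```python
data = [3, 1, 2]

# sorted() returns a new list and leaves the original untouched.
ordered = sorted(data)
print(data)     # [3, 1, 2]
print(ordered)  # [1, 2, 3]

# list.sort() reorders the list in place and returns None.
result = data.sort()
print(data)     # [1, 2, 3]
print(result)   # None
```

The caller has to pick the right one up front, and mixing them up (for example writing `data = data.sort()`) silently replaces the list with `None`.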

Here's a motivating example - reduce:

In Java, reduce is overloaded such that if an initializer is provided, it returns a T but if an initializer is not provided, it returns an Optional<T> instead. The return type changes based on the overload.

This avoids surprising behaviour where in a long stream/iterable processing operation, you expect reduce to return a value but it throws a TypeError instead:

```python
from functools import reduce
from operator import add

reduce(add, [])  # Raises a TypeError
```

Because of Python's dynamic typing, it cannot provide this kind of safety. Mojo, being a statically typed language, can and should help programmers avoid footguns like these.
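
As a rough approximation of the Java design in Python, the sketch below uses `typing.overload` to describe the two return types. `safe_reduce` and `_MISSING` are hypothetical names introduced here. The overloads are only hints for an external type checker and are not enforced by the interpreter, which is the limitation described above.

```python
from functools import reduce
from operator import add
from typing import Callable, Iterable, Optional, TypeVar, overload

T = TypeVar("T")
_MISSING = object()  # sentinel: distinguishes "no initial" from initial=None


@overload
def safe_reduce(fn: Callable[[T, T], T], items: Iterable[T]) -> Optional[T]: ...
@overload
def safe_reduce(fn: Callable[[T, T], T], items: Iterable[T], initial: T) -> T: ...


def safe_reduce(fn, items, initial=_MISSING):
    """Hypothetical wrapper: returns None instead of raising on empty input."""
    if initial is not _MISSING:
        return reduce(fn, items, initial)
    it = iter(items)
    try:
        first = next(it)
    except StopIteration:
        return None  # the "empty Optional" case
    return reduce(fn, it, first)


print(safe_reduce(add, []))         # None
print(safe_reduce(add, [], 0))      # 0
print(safe_reduce(add, [1, 2, 3]))  # 6
```

A checker such as mypy can flag code that uses the first form without handling `None`, but nothing stops the program from running if that check is skipped.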

modularml / mojo

[Feature Request] Consider deviating from Python's naming & structure for Mojo's stdlib #2113
