pyscript / pyscript

Try PyScript: https://pyscript.com Examples: https://tinyurl.com/pyscript-examples Community: https://discord.gg/HxvBtukrg2
https://pyscript.net/
Apache License 2.0

[DISCUSSION] Behavior of <py-script src=...> #895

Closed antocuni closed 8 months ago

antocuni commented 1 year ago

The current behavior of <py-script src=...> is confusing and suboptimal IMHO. It is implemented in this way: https://github.com/pyscript/pyscript/blob/214e39537bf18e1bec65153fdaa2fce355999693/pyscriptjs/src/components/pyscript.ts#L19-L31

i.e., it fetches the URL and just executes it in the global namespace. But it has many problems:

  1. Since we are using await fetch(), the code might be executed out of order w.r.t. the inline <py-script> tags. Consider e.g. this test:

    def test_src_vs_inline(self):
        import textwrap
        foo_src = textwrap.dedent("""
            print('hello from foo')
        """)
        # foo_src += '#' * (1024*1024) # make the file artificially bigger by 1MB
        self.writefile("foo.py", foo_src)
    
        self.pyscript_run(
            """
            <py-script src="foo.py"></py-script>
    
            <py-script>
                print('hello from py-script')
            </py-script>
            """
        )

If I run the test on my machine, I get the two prints in a random order:

--- first run ---
[  5.85 console.log     ] hello from py-script
[  5.85 console.log     ] hello from foo

--- second run ---
[  5.47 console.log     ] hello from foo
[  5.47 console.log     ] hello from py-script

But if I uncomment the line which makes foo artificially bigger, I get this consistently, because the file takes longer to download:

[  5.65 console.log     ] hello from py-script
[  5.76 console.log     ] hello from foo

This is the latest entry in the list of problems caused by the fact that we use runPythonAsync instead of runPython.

Related: https://github.com/pyscript/pyscript/issues/878 and https://github.com/pyscript/pyscript/issues/879
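The ordering effect can be reproduced outside the browser. The following is a minimal sketch, assuming that each tag behaves like an independent coroutine and that `asyncio.sleep` is an acceptable stand-in for the download time; `run_src_tag` and `run_inline_tag` are hypothetical names, not PyScript functions.

```python
import asyncio

log = []

async def run_src_tag(name, download_seconds):
    # Stand-in for `await fetch(src)`: the tag yields to the event loop
    # for as long as the (simulated) download takes.
    await asyncio.sleep(download_seconds)
    log.append(f"hello from {name}")

async def run_inline_tag(name):
    # Inline code needs no download, but it is still scheduled as a coroutine.
    await asyncio.sleep(0)
    log.append(f"hello from {name}")

async def main():
    # Both tags start in document order, but neither waits for the other.
    await asyncio.gather(
        run_src_tag("foo", download_seconds=0.05),
        run_inline_tag("py-script"),
    )

asyncio.run(main())
print(log)  # ['hello from py-script', 'hello from foo']
```

In this sketch the src tag comes first in "document order" but finishes last, because completion order follows latency rather than position.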

  2. It plays very badly with config.paths. For example, consider the following test (the asyncio.sleep() calls are needed to work around the previous problem):

    def test_src_vs_import(self):
        import textwrap
        self.writefile("foo.py", textwrap.dedent("""
            X = 0
            def say_hello():
                global X
                print('hello from foo, X =', X)
                X += 1
        """))

        self.pyscript_run(
            """
            <py-config>
                paths=['foo.py']
            </py-config>

            <py-script src="foo.py"></py-script>

            <py-script>
                import asyncio
                await asyncio.sleep(0.2) # XXX
                say_hello()
            </py-script>

            <py-script>
                await asyncio.sleep(0.3) # XXX
                say_hello()
            </py-script>

            <py-script>
                await asyncio.sleep(0.4) # XXX
                import foo
                foo.say_hello()
            </py-script>
            """
        )

This is what you get:

[  6.29 console.log     ] hello from foo, X = 0
[  6.29 console.log     ] hello from foo, X = 1
[  6.29 console.log     ] hello from foo, X = 0

This happens because the code inside foo.py is actually executed twice: with src="foo.py" we execute it in the global namespace, then with import we execute it again in the proper module. So we get two copies of X and say_hello(), each working independently of each other.
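The double execution can be shown with plain CPython. This is a sketch under stated assumptions: `exec()` into a dict stands in for what src does, a temporary directory on `sys.path` stands in for a file listed in paths, and `foo_demo` is a hypothetical module name used in place of foo to avoid clashing with anything installed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

FOO_SRC = """
X = 0
def say_hello():
    global X
    X += 1
    return X
"""

# Make the file importable, roughly like a file listed in `paths`.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "foo_demo.py").write_text(FOO_SRC)
sys.path.insert(0, tmp_dir)

# 1. Roughly what src does: execute the source in a shared global namespace.
shared_globals = {}
exec(FOO_SRC, shared_globals)

# 2. What `import` does: execute the same source again, in a module namespace.
foo_demo = importlib.import_module("foo_demo")

# Calling the "global" copy only bumps the X that lives in shared_globals.
shared_globals["say_hello"]()
shared_globals["say_hello"]()

print(shared_globals["X"], foo_demo.X)  # 2 0
```

Each `say_hello` is bound to the namespace it was defined in, so the two counters never see each other.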

This behavior is very confusing unless you know very well the internals of Python, and we should avoid it at all costs.

To underline how confusing it is, we even made a mistake in our own docs 😱 https://github.com/pyscript/pyscript/blob/214e39537bf18e1bec65153fdaa2fce355999693/docs/reference/elements/py-script.md#L37-L53

In the docs above, the author felt the need to add compute_pi.py to config.paths, but that is not really needed, and the file will actually be downloaded twice.


Proposals for solution

Proposal 1: just kill src

We don't really need it, it is possible to achieve the desired result in this way:

<py-config>
  paths=['foo.py']
</py-config>

<py-script>
  import foo
</py-script>

Simple, effective, very explicit, works out of the box.

Proposal 2: kill src but add an import (or py-import?) attribute

This would be the equivalent to the previous example:

<py-config>
  paths=['foo.py']
</py-config>

<py-script import="foo"></py-script>

Proposal 3: automatically add imports to paths

Similar to proposal (2) but you don't need to explicitly add paths=[...]

<py-script import="./foo.py"></py-script>

This is by far my least favorite, because it opens many questions (e.g., if I do import="foo.py", does it mean that I want to download and import ./foo.py, or that I want to import the already-installed py module from the foo package?).

Moreover it complicates the implementation because we would need to search for import attributes when we download the other paths, etc.

fpliger commented 1 year ago

@antocuni, good news.... we disagree! 🤣

Ok, with my serious face on, I do agree that the current design/implementation is prone to confusing edge cases but, to properly tackle it, I think we need to separate the topics listed above: 1. how we handle execution flow (and the related time to fetch and load files) 2. execution as modules or in the global namespace

On 1, the problem here is the choice between serial and parallel execution. Basically, should we execute <py-script> tags in order and wait for each tag to load and run before we go on to the next one, or can we run all <py-script> tags as if they were independent and decoupled from each other? This is actually also very relevant to the work that needs to be done as we move towards supporting web workers, btw...

On the above, I vote to give users an option and have an explicit attribute that tells us how to execute their code.

On the topic 2, I do acknowledge that we do have a corner case in the situation presented above, when using a module both as module (loaded in paths) and as src in the <py-script> tag but I think it's a problem on better defining the current API rather than a full redesign. In fact, I'm -10 on all the proposals above. It really feels unnatural and over complicated.

In my mind, the very simple solution is that we do not allow users to both <py-config>paths=['foo.py']</py-config> and <py-script src="foo.py"></py-script> and we raise an error if they do. In fact, in that case they should have done <py-script src="bar.py"></py-script> (where bar contains import foo) or <py-script >import foo .... </py-script>
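As a rough illustration of such a check, here is a minimal sketch. It assumes the configured paths and the src attributes are available as plain lists of relative paths; `find_conflicts` and `check_page` are hypothetical helpers, not part of PyScript.

```python
from pathlib import PurePosixPath

def _normalize(path):
    # "./foo.py" and "foo.py" should count as the same file.
    return str(PurePosixPath(path))

def find_conflicts(config_paths, script_srcs):
    """Return the files that appear both in paths and as a src attribute."""
    fetched = {_normalize(p) for p in config_paths}
    return sorted({_normalize(s) for s in script_srcs} & fetched)

def check_page(config_paths, script_srcs):
    """Raise if any file is used both as a module and as a src script."""
    conflicts = find_conflicts(config_paths, script_srcs)
    if conflicts:
        raise ValueError(
            "listed in paths and also used as src: " + ", ".join(conflicts)
        )

try:
    check_page(["foo.py"], ["./foo.py", "bar.py"])
except ValueError as exc:
    print(exc)
```

Note that a check like this only catches the literal overlap; it cannot tell whether bar.py itself redefines names that another tag relies on.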

If we really want, we could also support <py-script import="foo.py"></py-script> but I'd find it confusing and I can't find a reason users would be doing that instead of <py-config>paths=['foo.py']</py-config>. With that said, if there's a use case for it and our users need something like that, I'm more than happy to change my mind.

antocuni commented 1 year ago

@antocuni, good news.... we disagree!

🎉 😂

Ok, with my serious face on, I do agree that the current design/implementation is prone to confusing edge cases but, to properly tackle it, I think we need to separate the topics listed above: 1. how we handle execution flow (and the related time to fetch and load files) 2. execution as modules or in the global namespace

Yes, sorry for having mixed the two. I started to write the issue to explain (2) and when I tried to write the example I noticed (1). I couldn't explain (2) without explaining (1), that's why they are together. My take on (1) is that the approach "let's happily allow top-level await and use runPythonAsync without really understanding all the implications" is causing way too many problems and should be rethought. But that's a topic for another discussion, let's not have it here. [cut]

On the topic 2, I do acknowledge that we do have a corner case in the situation presented above, when using a module both as module (loaded in paths) and as src in the <py-script> tag but I think it's a problem on better defining the current API rather than a full redesign. In fact, I'm -10 on all the proposals above. It really feels unnatural and over complicated.

The difference in our views is that what you consider a "corner case" is really a fundamental property of the current semantics of src, which doesn't play well with the Python semantics. We cannot change Python semantics but we can change src.

Before going into the details, a quick note on the "design methodology" that we should follow IMHO.

Design methodology

We are trying to run Python in a new environment which is very different from the one it was designed for and that people are used to. Every feature that we add, especially the low-level ones such as "where the code lives and how/when it's executed", can potentially interact in unexpected ways with the pre-existing semantics of Python.

Because of this, we should be really careful when adding features: the default approach should be conservative and "deny by default" until we are sure we understand all the implications of our choice. The approach "let's add random features because they are nice" is doomed IMHO.

Semi-related: the fact that a certain feature managed to sneak into PyScript Alpha should not give it a special status. From my point of view, for this specific issue the question is not "we have src, and maybe we should remove it". The question is: "we do not have src; is it a good idea to add it?"

End of digression

In my mind, the very simple solution is that we do not allow users to both <py-config>paths=['foo.py']</py-config> and <py-script src="foo.py"></py-script> and we raise an error if they do.

This is not a simple solution; it is a workaround to fix one specific problem of this approach, and it doesn't solve the more fundamental problems. Let me try to explain better.

The current semantics of <py-script src=...> is roughly the following:

import __main__
pysrc = open(src).read()
exec(pysrc, __main__.__dict__)

I.e., you are taking the python code and executing it in the global namespace (which is just the namespace of the __main__ module -- that's how pyodide works). This opens a can of worms because it leads to all sorts of confusing behaviors.

The first random example which comes to my mind:

# rectangle.py
import js
from pyodide.ffi import create_proxy

def draw():
    print('drawing a rectangle')

def on_click(event):
    print('clicked on btn-rect')
    draw()

on_click = create_proxy(on_click)
js.document.getElementById('btn-rect').addEventListener('click', on_click)

# circle.py
import js
from pyodide.ffi import create_proxy

def draw():
    print('drawing a circle')

def on_click(event):
    print('clicked on btn-circle')
    draw()

on_click = create_proxy(on_click)
js.document.getElementById('btn-circle').addEventListener('click', on_click)

<py-script src="rectangle.py"></py-script>
<py-script src="circle.py"></py-script>

Live example

If you click on the button "Draw rectangle", you see this in the console:

clicked on btn-rect
drawing a circle

(the explanation of why is left to the reader).
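For readers who would rather see the mechanism than derive it, here is a minimal sketch that runs without a browser. It assumes, as described earlier, that src amounts to `exec()` into one shared namespace; the `rect_handler` and `circle_handler` names are hypothetical stand-ins for the listeners registered with addEventListener.

```python
RECTANGLE_SRC = """
def draw():
    return 'drawing a rectangle'

def on_click():
    return draw()          # `draw` is looked up at call time, not at definition time

rect_handler = on_click    # stand-in for the listener registered on btn-rect
"""

CIRCLE_SRC = """
def draw():
    return 'drawing a circle'

def on_click():
    return draw()

circle_handler = on_click  # stand-in for the listener registered on btn-circle
"""

shared_globals = {}
exec(RECTANGLE_SRC, shared_globals)
exec(CIRCLE_SRC, shared_globals)   # rebinds the global name `draw`

# The rectangle handler is still the rectangle's on_click function,
# but the `draw` it finds in the shared namespace is now the circle's.
print(shared_globals["rect_handler"]())  # drawing a circle
```

The handler object registered for the rectangle button is unchanged; what changed is the global name it resolves when it finally runs.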

Tell me, isn't this very confusing behavior?

And note that it's not even completely invented: this is a pattern which I saw in real world examples of people using <py-script src=...>, for example here: https://github.com/nmstoker/ChessMatchViewer https://github.com/nmstoker/ChessMatchViewer/blob/main/chess_script.py#L143-L151

And here: https://github.com/kolibril13/pyscript-emoji-skimage https://github.com/kolibril13/pyscript-emoji-skimage/blob/main/emoji_playground.py#L147-L153


"Computer Science 101" explanation

The current model completely breaks composability of different .py files. The behavior of the file completely depends on which other files and <py-script> tags were executed before, and it's basically impossible to avoid breaking stuff accidentally.

The first big problem is that we are mixing everything in a single global namespace. The most notable example of a language with this behavior is C, and all modern languages recognized that it's a problem and implemented namespaces in one way or another.

The second big problem is that with two notable exceptions (see below) all languages which I am aware of treat files as a distinct unit of compilation which can be analyzed individually.

Even in C, with its global namespace, you can analyze a single file and know which function calls are resolved internally and which ones depend on external symbols. But apparently PyScript decided that it's smarter than everyone else and that concatenating multiple files into a big unique chunk of code is a good idea 🤷‍♂️.

The two notable exceptions (that I am aware of) to this rule are:

Ironically, the JS world quickly recognized that this was a huge problem and came up with multiple solutions to overcome it (e.g., every file contains a big anonymous function which defines its own scope and is called immediately). And then they invented 10 different ways of defining modules. And now they are even in the HTML so you can say <script type="module" src=...>.

In the JS world it took years to fix the original sin of "everything is global". Let's avoid doing the same mistake again.

NOTE: I am 100% aware that we need a way to split behavior into multiple files. But not with this semantics.

ntoll commented 1 year ago

This is a fascinating discussion. Thank you so far for such a thoughtful read.

What follows is only a small fragment of what I originally wrote. I had to write a whole bunch of stuff, only to be able to refine my thinking to get to the (much shorter) contribution offered below.

@fpliger's distinction between the problems is spot on: order of execution vs scope of execution.

Let me briefly deal with the first from MicroPyScript's point of view:

Put simply, it's all handled async via messages. When the source code of a script is obtained, either from the innerHTML or by fetching from the URL in src (incidentally, if the fetch doesn't work... you get a non-200 response..., it raises an exception and everything grinds to a halt, so fundamental is this problematic state of affairs), it dispatches a py-script-loaded event. If this happens before the runtime is finished loading, the script is put on a pending queue, which is evaluated in order when the py-runtime-ready event is dispatched. If the runtime is ready and the script's source becomes available afterwards, it's evaluated immediately. It's "just" simple coordination via events.
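The coordination described above can be sketched in a few lines. This is an illustration only, assuming two events that mirror py-script-loaded and py-runtime-ready; the `ScriptCoordinator` class and its method names are hypothetical and are not taken from MicroPyScript's source.

```python
from collections import deque

class ScriptCoordinator:
    """Queue scripts until the runtime is ready, then evaluate them in arrival order."""

    def __init__(self, evaluate):
        self.evaluate = evaluate      # callable that runs one script's source
        self.runtime_ready = False
        self.pending = deque()

    def on_script_loaded(self, source):
        # Mirrors the py-script-loaded event.
        if self.runtime_ready:
            self.evaluate(source)
        else:
            self.pending.append(source)

    def on_runtime_ready(self):
        # Mirrors the py-runtime-ready event: drain the queue in order.
        self.runtime_ready = True
        while self.pending:
            self.evaluate(self.pending.popleft())

executed = []
coordinator = ScriptCoordinator(executed.append)
coordinator.on_script_loaded("inline script")
coordinator.on_script_loaded("fetched script")
coordinator.on_runtime_ready()
coordinator.on_script_loaded("late script")   # evaluated immediately
print(executed)  # ['inline script', 'fetched script', 'late script']
```

The queue preserves the order in which sources became available, which is not necessarily the order of the tags in the document.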

But I've encountered doubts about the <py-script> tag itself. At least, how it currently is.

Before exploring these doubts, it's important to acknowledge that the <py-script> tag is of fundamental importance to PyScript. It's the "101" first encounter folks have with PyScript and has two very important benefits:

BUT..!

I think @antocuni eloquently describes an outlook similar to my doubts. In summary:

Multiple scripts in the __main__ scope is a problem. Therefore (bear with me) we should only allow a single <py-script> tag on any page. Put simply, it is the equivalent of the main function... the single entry point.

Should you require other code, then do the Pythonic thing: stick it into a module, then reference it in the <py-config> so you can import my_code in the code contained in the single <py-script> tag (incidentally, this is why I'm working hard to get file-system support in MicroPython...).

But what about Python fragments in a web page? If you mean you want to scatter your code in different tags, I'd say, you're doing it wrong... scatter it in different modules and do what I suggest via <py-config>. Alternatively, if you want multiple Python fragments on a web page for "it's a notebook" type reasons, then we should do ourselves a favour and name things properly via a <py-notebook> tag that is an "or" with <py-script> (you can have one or the other, but not both on the same page).

To recap, only a single (main) <py-script> tag. You can't have <py-script> and <py-notebook> in the same web page.

How does <py-notebook> work..? Well, it's basically a REPL session with non-code prose interspersed (and, incidentally, if all you want is a REPL, just use <py-repl>):

<py-notebook>
  <h1>A header</h1>
  <p>Some prose</p>
  <code>
  # The <code> block is rendered with a "run" button, to evaluate the (content-editable=true)
  # content of the block.
  print("This output is automatically put into a div inserted as a sibling node immediately below this one.")
  foo = "bar"
  </code>
  ... etc...
  <p>Here's more arbitrary HTML</p>
  <code>
  # This code is still in the same scope as the previous <code> block.
  print(foo)
  </code>
</py-notebook>

In this case, the special, and completely understandable, aspect of running multiple <code> child nodes of a <py-notebook> tag is that they're all in the same scope (just like a notebook should be). Worth pointing out, just like <py-script> you can't have more than one <py-notebook> tag on a page.

These are raw suggestions, and they definitely need refining. However, they solve both the out-of-order execution problem along with the "it's all in __main__ scope" problem (except where this is explicitly a feature of the <py-notebook> tag).

Also, I realise these are "opinionated" solutions, but this is OK. I think we can all agree the reasons for such "opinionated" ways of working are good ones. For instance, there's a good reason Guido opined that Python should have functions and not GOTO. 😝

As always, I'm not precious about anything I write, and I'm interested to hear your thoughts and refine ideas so we get to the "good place" ™.

Thoughts...?

fpliger commented 1 year ago

Mmmm... @antocuni @ntoll I feel like we are all thinking about this at different levels and with different use cases in mind.

I honestly don't think the core of the issue itself is related to src or to the <py-script> but rather to the namespace. Let me comment on specific points

Tell me, isn't this very confusing behavior?

Yes, it's confusing, but I don't think it's due to the <py-script> tags themselves but a misuse of them. Basically, they should have been loaded as modules and been accessed as

on_click = create_proxy(circle.on_click)
js.document.getElementById('btn-circle').addEventListener('click', on_click)

and

on_click = create_proxy(rectangle.on_click)
js.document.getElementById('btn-rect').addEventListener('click', on_click)

In 1, 2 or N <py-script> tags.

A PyScript tag without output can be used also to couple small pieces of logic with where results will be displayed. Yes, you can also do that by explicitly passing the output to render but using different pyscript tags makes it more explicit and my preference for certain users.

More in a bit...

I think @antocuni eloquently describes an outlook similar to my doubts. In summary:

How isolated should we run the code in a tag? Right now, and perhaps because the runtimes don't support any other way, it's all in the global main scope. This is not a good idea for all the reasons Antonio states. The order in which different tags are loaded, in both main PyScript and MicroPyScript, currently depends upon how fast we can extract the source. Inline code will run before anything fetched from the network, and anything fetched from the network will be dependent on the vagaries of latency and availability.

That's my thinking as well. The problem is related to scopes/namespaces and execution flow, not the src or the <py-script> tag itself.

Multiple scripts in the __main__ scope is a problem. Therefore (bear with me) we should only allow a single <py-script> tag on any page. Put simply, it is the equivalent of the main function... the single entry point.

Yup!

But what about Python fragments in a web page? If you mean you want to scatter your code in different tags, I'd say, you're doing it wrong... scatter it in different modules and do what I suggest via <py-config>. Alternatively, if you want multiple Python fragments on a web page for "it's a notebook" type reasons, then we should do ourselves a favour and name things properly via a <py-notebook> tag that is an "or" with <py-script> (you can have one or the other, but not both on the same page).

To recap, only a single (main) <py-script> tag. You can't have <py-script> and <py-notebook> in the same web page.

😵‍💫 Ok, you lost me here. I am +1 on the first of your proposals about best practices [that we should promote and try to enforce as much as possible] and all, but I totally miss the rest. It's definitely not a secret that the vision for PyScript includes being able to run more than one runtime (and eventually different languages) on the same page. Making <py-script> a single instance is a non-solution.

I'd much rather introduce namespaces and define defaults and possibility for customizations that reduce the surface for confusion but at the same time allow users freedom if they know what they are doing.

In this case, the special, and completely understandable, aspect of running multiple <code> child nodes of a <py-notebook> tag is that they're all in the same scope (just like a notebook should be). Worth pointing out, just like <py-script> you can't have more than one <py-notebook> tag on a page.

Ok, let's imagine the scenario where a user has a Scientific dashboard with different simulations/data analysis/models in the same page and as part of their "exploration app" they want to add a "notebook" (which is basically a REPL or collection of REPLs) to allow their users to explore each single viz/model/dataset by allowing them to run snippets that can interact with the data. How should these users do that in a world where there can only be one?


ntoll commented 1 year ago

Ok, let's imagine the scenario where a user has a Scientific dashboard with different simulations/data analysis/models in the same page and as part of their "exploration app" they want to add a "notebook" (which is basically a REPL or collection of REPLs) to allow their users to explore each single viz/model/dataset by allowing them to run snippets that can interact with the data. How should these users do that in a world where there can only be one?

That's exactly what I was getting at with my (rough and ready - it needs refining) suggestion for a <py-notebook> tag. :-)

Great minds think alike...

I may try to bodge something together next week in MicroPyScript to show what I mean.

antocuni commented 1 year ago

I honestly don't think the core of the issue itself is related to src or to the <py-script> but rather to the namespace. Let me comment on specific points

The problems I'm complaining about are the result of a combination of src + global namespaces, because it means that if you look at a single .py file you cannot reason about it in isolation.

Let's see all possibilities:

  • shared global namespace + src: the current status
  • shared global namespace - src: this is better: each tag depends on the others but they are all in the same physical HTML file, so it's easier to reason about
  • per-tag namespace + src: this solves the issue, but if we load a file and execute it in its own namespace, it's essentially a module. Then let's call it with its proper name
  • per-tag namespace - src: my preferred solution.
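To make the trade-off between the shared and per-tag options concrete, here is a small sketch. It assumes each tag's code is simply passed to `exec()`; `run_tags` is a hypothetical helper, and a fresh dict stands in for a per-tag namespace.

```python
def run_tags(sources, per_tag_namespace):
    """Execute each tag's source and report whether it ran or hit a NameError."""
    shared_globals = {}
    results = []
    for source in sources:
        # Either every tag gets a fresh namespace, or all tags share one.
        namespace = {} if per_tag_namespace else shared_globals
        try:
            exec(source, namespace)
            results.append("ok")
        except NameError as exc:
            results.append(f"NameError: {exc}")
    return results

TAGS = ["x = 1", "y = x + 1"]   # the second tag silently depends on the first

print(run_tags(TAGS, per_tag_namespace=False))
print(run_tags(TAGS, per_tag_namespace=True))
```

With a shared namespace the hidden dependency works by accident; with per-tag namespaces it fails loudly, which forces the dependency to be expressed through an explicit import.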

Tell me, isn't this very confusing behavior?

Yes, it's confusing, but I don't think it's due to the <py-script> tags themselves but a misuse of them. Basically, they should have been loaded as modules and been accessed as

THANK YOU for explaining the solution to me, I knew that 😅. But the resulting behavior is confusing, impossible to debug, and against everything that people learn in Python courses. We are giving people a gun and asking them not to shoot themselves in the foot.

A PyScript tag without output can be used also to couple small pieces of logic with where results will be displayed. Yes, you can also do that by explicitly passing the output to render but using different pyscript tags makes it more explicit and my preference for certain users.

Serious question: do we have any real use case in mind? I have tried to search a bit around for real apps that people have written, and they always use a single <py-script> (or multiple <py-script src=...>, but probably without realizing how dangerous it is).

Also, this model has a fundamental drawback:

  • the position of the <py-script> in the HTML depends on how you want to visualize stuff
  • but at the same time, the position of tags also influences the order of execution
  • the result is that if you want to change your UI, you might easily break the execution logic

I am a bit skeptical that it's actually useful in practice, but I'm happy to be convinced otherwise. @JeffersGlass you have written a lot of examples, so I think that your experience is valuable here.

Ok, you lost me here. I am +1 on the first of your proposal about best practices [that we should promote and try to enforce as much as possible] and all but totally miss the rest. It's definitely not a secret that the vision for PyScript includes being able to run more than one runtime (and eventually different languages) on the same page. Making <py-script> a single instance is a non-solution.

This is a good point in favor of having multiple <py-script> tags. But at the same time, it's also a big point in favor of having each tag in its own namespace, because if you have different runtimes/languages you cannot share them "implicitly" as is happening now.

JeffersGlass commented 1 year ago

@antocuni I should really set up an alert for the word 'namespace' in this repo 😅

To be honest, I haven't been following this conversation closely to this point - I'll happily digest it and weigh in presently.

JeffersGlass commented 1 year ago

Two Paths to the Same End

I know this is well understood by this group and has already been said, but to spell it out because it will help make my point: there's ongoing confusion for end users, I think, because we have two ways of doing (essentially) the same thing (and because URLs and local file paths look so similar).

The src attribute as written loads content from a URL:

<!-- copy the contents of a file at the relative URL (and execute it in global namespace) -->
<py-script src="./foo.py"></py-script>

but using import relies on the module being present in the (Emscripten) file system, so we (currently) use paths to fetch() the file and dump it there:

<!-- copy the contents of a file at the relative URL "./foo.py" to an Emscripten local file called "./foo.py" -->
<py-config>
    paths=['./foo.py']
</py-config>

<!-- import the MODULE called foo (as found by Python's importers) -->
<py-script>
    import foo
</py-script>

Getting the files from the network to the file system is what the <py-config> (previously <py-env>) paths list is meant to accomplish: fetching remote network resources and turning them into "local" files that Python's importers know what to do with. We've mused elsewhere that "paths" is probably not the clearest name for this feature. Perhaps fetch or web_resources_to_fetch_to_EM_filesystem or something.

In these two snippets, './foo.py' means something entirely different. And in a sense, we have two different ways to achieve (almost) the same result:

  • The first fetches a resource from the network immediately and executes its contents in the global namespace
  • The second fetches the resource to the local filesystem and allows importing it with Python's usual mechanisms
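The second route can be imitated with plain CPython. The sketch below is only an approximation: `FAKE_NETWORK` and `fetch_to_filesystem` are hypothetical stand-ins for the real download, a temporary directory plays the role of the Emscripten file system, and `remote_foo` is a made-up module name.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the network: URL -> file contents.
FAKE_NETWORK = {"./remote_foo.py": "GREETING = 'hello from remote_foo'\n"}

def fetch_to_filesystem(url, target_dir):
    """Copy a 'remote' resource into a local directory, keeping its file name."""
    target = Path(target_dir, Path(url).name)
    target.write_text(FAKE_NETWORK[url])
    return target

local_dir = tempfile.mkdtemp()
fetch_to_filesystem("./remote_foo.py", local_dir)
sys.path.insert(0, local_dir)   # let Python's importers find the local copy

first = importlib.import_module("remote_foo")
second = importlib.import_module("remote_foo")

print(first.GREETING)
print(first is second)  # True: the module is executed once and cached in sys.modules
```

Because the import system caches modules in `sys.modules`, importing the file from several tags does not re-run it, unlike executing its source in the global namespace each time.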

I think having both is unnecessary, and so...

Let's Kill src = "..."

This is not at all what I thought when I started writing this, but I’ve been convinced - given the two options above, I think the second (paths+import) is better. Fetching the contents of a web resource, ignoring its file and running the code in the global namespace isn't really something that maps to Python in a clear way.

Personally, I've viewed src= as a convenience feature - "My code is too long for this spot in my HTML, let me move it elsewhere," since code from src functioned the same as inline code. The workflow I see for folks making things in PyScript (both folks in the Discord, and myself) is:

  • Start writing Python inline in a <py-script> tag, since it's fast and easy (and fast and easy is a great feature)
  • Realize the code's getting longer and could be helped by code-completion/linting/not cluttering up the HTML file
  • Move it to its own '.py' file and add a src="..." reference in the <py-script> tag

Without src="...", the last step gets just a little longer - adding the file name to <py-config> paths and replacing the inline code with import foo.

We could consider adding some kind of attribute to the <py-script> tag itself to accomplish what paths does, but let me not get bogged down there. I want to get to multiple tags and namespacing:

Multiple <py-script> Tags

If you mean you want to scatter your code in different tags, I'd say, you're doing it wrong...

For the sake of projects like The 7 Guis, the asyncio post, or the rich demos, I'd push back on the idea, @ntoll, that only one <py-script> or <py-notebook> tag be allowed on a page. Not that those are even particularly complex projects, as far as multi-component web apps go. The point is: locating scripts near the place where they output to the page is really valuable when dealing with a larger document.

Even more so: allowing multiple <py-script> tags on a page allows including <py-script> tags in other components or templates. For example:

  • What if one wants to put a PyScript tag inside a react/vue/other component? These break if we're limited to a single script tag on the page.
  • I use a single Hugo shortcode (template) to both load Python code and run it in the browser, which I imagine Sphinx or mkdocs could do as well with some extension.
  • The rich demo creates a new <py-repl> tag for each demo, and destroys it when the next demo starts.

These ideas break if we are only allowed one script tag on the page

A PyScript tag without output can be used also to couple small pieces of logic with where results will be displayed. Yes, you can also do that by explicitly passing the output to render but using different pyscript tags makes it more explicit and my preference for certain users.

Do we have any real use case in mind? I have tried to search a bit around for real apps that people have written, and they always use a single <py-script> (or multiple <py-script src=...>, but probably without realizing how dangerous it is).

It me, the use case 😁. The Hugo Shortcode demo I linked above relies on <py-script> tags outputting in place. When you're composing <py-script> tags into components, in a way that you may not know if you can have a unique ID for outputting to a separate location, the ability to output in-place and locate tags at their output is key.

And yeah, src= is dangerous and it's easy to have variable name collisions. Hence, let's kill src="...". (I also wrote the original version of that Emoji Playground example you listed, back before I knew better... that way there be dragons.)

@ntoll Not having a single-entry-point for code is weird, I agree, and it would sure simplify things if we did. But it limits page composition, and the amount to which we can compose PyScript tags into other frameworks.

Additional Proposal: Tag Namespace Attribute

Allowing multiple tags to have the option to share a non-global namespace would be great - especially when they're outputting "in-place" via display(). Consider:

<!-- Near the top of a component -->
<py-script namespace="math">
    # This is the key to this whole component
    def do_magic_math(x):
        return (3.14 * x) + (1/234987 + 2**x)
</py-script>
....
<!-- Near a dependent html component -->
<input id="my_input">
<py-script namespace="math">
    from js import document
    from pyodide.ffi.wrappers import add_event_listener

    def magic_update(*args):
        # Read the input's current value and convert it to a number
        val = do_magic_math(float(document.getElementById("my_input").value))
        display(f"<h1>{val}</h1>")

    add_event_listener(document.getElementById("my_input"), "change", magic_update)
</py-script>
...
<input type="checkbox">
<py-script namespace="math">
    # If do_magic_math() < 5 check the checkbox
</py-script>
... <!-- etc -->

So, I'd propose to revive an idea from a previous discussion on Namespaces, and give \<py-script> an attribute (which only applies to inline code, since src= is dead) which executes the enclosed code in the named namespace. If such a name already exists in sys.modules it is executed in the global namespace of that module; if it doesn't, a module is created by that name, and the code is executed in a new dictionary for that namespace.
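For illustration, the lookup rule described above can be sketched in plain Python, outside of PyScript. This is a minimal sketch, not the project's implementation: `run_in_namespace` and the namespace name `math_demo` are hypothetical, and the sketch assumes "create a module by that name" means registering a fresh module object in sys.modules.

```python
import sys
import types


def run_in_namespace(code: str, name: str) -> dict:
    """Run code in the globals of the module called name (hypothetical helper)."""
    module = sys.modules.get(name)
    if module is None:
        # No module by that name yet: create one and register it.
        module = types.ModuleType(name)
        sys.modules[name] = module
    # Execute the code with the module's dictionary as its globals.
    exec(code, module.__dict__)
    return module.__dict__


# Two "tags" sharing the hypothetical namespace "math_demo".
run_in_namespace("def do_magic_math(x):\n    return 2 * x", "math_demo")
ns = run_in_namespace("val = do_magic_math(21)", "math_demo")
print(ns["val"])  # 42
```

One consequence of this rule is worth noting: a name such as "math" would resolve to the standard library module if it has already been imported, so the tag's code would run inside that module's globals.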

This brings us back to an issue we crashed into last time, which is needing/wanting the contents of pyscript.py and all its useful PyScript methods to be available in each of these namespaces. We considered adding it to builtins, but I believe where we landed is that the cleanest thing would be to wrap everything in that file into a proper module and allow users to import it (or import it for them @fpliger).

BUT! This additional namespacing for me is a nice to have, and an extension to the above. Anything you can do here, you can do with a little more effort by placing code in external files and importing it or parts of it; this is just a convenience/cleanliness feature, I think. So given that it would require reorganizing pyscript.py, let's not wait on it.

fpliger commented 1 year ago

Thanks, @JeffersGlass for the thoughtful comment (as usual). I'm +1 on most of it so I'll comment on the things I'm not really aligned with.

Personally, I've viewed src= as a convenience feature - "My code is too long for this spot in my HTML, let me move it elsewhere," since code from src functioned the same as inline code. The workflow I see for folks making things in PyScript (both folks in the Discord, and myself) is:

  • Start writing Python inline in a tag, since it's fast and easy (and fast and easy is a great feature)
  • Realize the code's getting longer and could be helped by code-completion/linting/not cluttering up the HTML file
  • Move it to its own '.py' file and add a src="..." reference in the tag

I don't think that captures the full essence of why one would use src. In addition to the above, I'd add:

  • testability: it's hard and ugly to test any code that is inlined. In fact, I'd say it's not really testable right now
  • readability: reading inline code (especially if long) is not a great experience. In most cases, editors don't have good support for it (syntax highlighting, autocomplete, etc..) but even with that support, it's not great to read code in between HTML tags

(I feel there's probably more though)

The idea, since the beginning, was that inline would be an entry point to get users to working code quickly, but then to support and encourage moving their code to src and external files.

And yeah, src= is dangerous and it's easy to have variable name collisions. Hence, let's kill src="...". (I also wrote the original version of that Emoji Playground example you listed, back before I knew better... that way there be dragons.)

I'm not sure I get the idea behind it being considered dangerous. Or, to better put, I think I do get why but think the problem is not src on its own but the fact that:

  1. We don't have the concept of namespaces, so more than 1 <py-script> tag will effectively result in something comparable to splitting a main entrypoint file into multiple chunks
  2. The current execution flow has bugs that make the situation worse because most times we run things async and currently there's no guarantee on the order of execution of the <py-script> tags
  3. Because of 1 and 2, it's really easy to create collisions and silent errors that make it hard to debug and understand what's going on

Now, with that said....


> <!-- copy the contents of a file at the relative URL "./foo.py" to a Emscripten local file called "./foo.py" -->
> <py-config>
>     paths=['./foo.py']
> </py-config>
> 
> <!-- import the MODULE called foo (as found by Python's importers) -->
> <py-script>
>     import foo
> </py-script>
> 

This is not the solution... cannot be the solution. Really, it feels so unnatural and verbose. We went from 1 line to 6 with that horrible pattern of importing a module just to execute code in a namespace. It'd be (a bit) different if we were actually invoking something after an import but, even then, why? We are adding an anti-pattern just to solve for a (bunch of) bug(s).

My firm belief here is that we won't fix the problem by creating an awkward API but rather by fixing the issue at the root by fixing the execution flow (ensuring we can guarantee the order of execution), deciding if we need to put limitations on the number of py-script tags per page/namespace/etc.. and by supporting namespaces.

This brings us back to https://github.com/pyscript/pyscript/pull/503#issuecomment-1204435027, which is needing/wanting the contents of pyscript.py and all its useful PyScript methods to be available in each of these namespaces. We considered adding it to builtins, but I believe where we landed is that the cleanest thing would be to wrap everything in that file into a proper module and allow users to import it (or import it for them @fpliger).

Yes but, again, I think that supporting namespaces on py-script tags makes the API (and UX) slightly different than just using Python modules as namespaces.

IIRC, some things might have changed enough since we had the namespaces discussion so that we might find convergence now. (One can hope :) )

BUT! This additional namespacing for me is a nice to have, and an extension to the above. Anything you can do here, you can do with a little more effort by placing code in external files and importing it or parts of it; this is just a convenience/cleanliness feature, I think. So given that it would require reorganizing pyscript.py, let's not wait on it.

I both agree and disagree here... While it's true for "developer" users (because they have the tools to understand the difference and do more complex things), convenience/cleanliness is often what makes a Platform/Framework succeed or fail compared to others (especially for non-expert users). (By saying this I'm not saying we should prioritize namespaces before everything else, but I think it's part of a success story.)

antocuni commented 1 year ago

My firm belief here is that we won't fix the problem by creating an awkward API but rather by fixing the issue at the root by fixing the execution flow (ensuring we can guarantee the order of execution), deciding if we need to put limitations on the number of py-script tags per page/namespace/etc.. and by supporting namespaces.

the order of execution is not the core issue here. It's incidental and I had to mention it only to explain why the example was so awkward. Let's forget about it.

The real core issue is that multiple <py-script> tags share the same scope. You cannot have a shared scope and src without causing endless confusions and corner cases. I don't know if I have any veto power, but in case I do I will exercise it to forbid this solution at all costs.

You can have src if every <py-script> tag has its own isolated scope. I think this should be the solution, and you also get namespaces almost for free.

(Then, incidentally: the concept of "let's execute this .py file in its own scope" already exists in Python and it's called "module". But I'm fine to call it differently if you are really allergic to this name :man_shrugging: )

  • testability: it's hard and ugly to test any code that is inlined. In fact, I'd say it's not really testable right now

+1 for this, but note that in order to test them outside a pyscript app, you need to import them as modules. Another hint that probably they are modules ;)

fpliger commented 1 year ago

the order of execution is not the core issue here. It's incidental and I had to mention it only to explain why the example was so awkward. Let's forget about it.

Yes and no... not the core issue but it definitely exacerbates it. Let's agree to park it.

The real core issue is that multiple <py-script> tags share the same scope. You cannot have a shared scope and src without causing endless confusions and corner cases. I don't know if I have any veto power, but in case I do I will exercise it to forbid this solution at all costs.

I'm glad you are proposing a compromise solution next, because I think we'd be vetoing each other to death on this one lol

You can have src if every <py-script> tag has its own isolated scope. I think this should be the solution, and you also get namespaces almost for free.

+1 on this. I think it's probably the only way we can converge. I'd also suggest that, if we all agree on this and start on this direction, we start simple and small (the feature is just that: Each one in their own namespace, namespaces can't access each other, 1 per tag, etc..), and add features on top of it as we go and find need for it.

(Then, incidentally: the concept of "let's execute this .py file in its own scope" already exists in Python and it's called "module". But I'm fine to call it differently if you are really allergic to this name 🤷‍♂️ )

We had this conversation before. I really believe this is not true (it is from a technical point of view but not from a UX one). It's like saying "everything is a dict in Python": while it's true and you can do a lot once you realize this, it's not something users need to know and use, nor is it the reason people love Python.

I see it more like different entry points (to different processes) that are part of the same application and [in some occasions] that may need to access each other/share data. They serve 2 different purposes. That's why I don't like the name, it's misleading, imho.

JeffersGlass commented 1 year ago

You can have src if every <py-script> tag has its own isolated scope. I think this should be the solution, and you also get namespaces almost for free.

+1 on this. I think it's probably the only way we can converge. I'd also suggest that, if we all agree on this and start on this direction, we start simple and small (the feature is just that: Each one in their own namespace, namespaces can't access each other, 1 per tag, etc..), and add features on top of it as we go and find need for it.

+1 on this as well - as you say @fpliger - it's probably right that we give it a try and build on it as needed.

And as we noted in #503, there's likely to always be some way to access global scope in a pinch (my_var = js.pyscript.runtimes.globals.get('a_global_var') springs to mind), so truly daring devs who are willing to accept the risks can push the envelope if need be.

antocuni commented 1 year ago

+1 on this. I think it's probably the only way we can converge. I'd also suggest that, if we all agree on this and start on this direction, we start simple and small (the feature is just that: Each one in their own namespace, namespaces can't access each other, 1 per tag, etc..), and add features on top of it as we go and find need for it.

ok, works for me! To be 100% sure that we are on the same page, this is different than the original "namespace proposal" which we discussed a long time ago in #503. The old proposal was "global namespace by default, opt-in for private namespace". Here we are going for "private namespace by default, no way to access other namespaces". In other words:

<py-script>
x = 42
</py-script>

<py-script>
print(x)   # NameError
</py-script>

Did I understand correctly?
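The "private namespace by default" behavior in the example above can be mimicked in plain Python by giving every tag a fresh globals dictionary. This is only a sketch of the idea under that assumption; `run_tag` is a hypothetical helper, not PyScript's API.

```python
def run_tag(code: str) -> dict:
    """Run the code of one tag in a fresh, private globals dict (hypothetical helper)."""
    scope = {"__name__": "__main__"}
    exec(code, scope)
    return scope


# The first tag defines x in its own scope.
first = run_tag("x = 42")

# The second tag gets a brand-new scope, so x is not visible there.
error = None
try:
    run_tag("print(x)")
except NameError as exc:
    error = exc
    print("NameError:", exc)  # NameError: name 'x' is not defined
```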

JeffersGlass commented 1 year ago

That's what I understand we're proposing, yeah.

To be clear, I would still like a way for tags to ultimately share namespaces... but I think for the sake of being able to move forward, that should be a feature that gets added on once the bones of this proposal are in place. I'll say my piece and be done.

The use case for me personally, for tags to be able to share a namespace (whether they use src= or inline code), is explanatory blog posts about Python/PyScript. They are often structured, at least in part, as follows. (The {{}} notation is shorthand for "load the content of a file and display it as code"; I use Hugo personally, but it could be any templating system.)

<p>Start by doing this thing:</p>
{{ content from step1.py }}
<py-script src="step1.py"></py-script>

<p>Now, do this next step</p>
{{ content from step2.py }}
<py-script src="step2.py"></py-script>

<p>And finally, do this</p>
{{ content from step3.py }}
<py-script src="step3.py"></py-script>

<p> As you can see, this process works</p>

My rich demo post works like this, as do parts of this post on JS object creation, and I have two in the works now, on FileIO and on developing package patches for Pyodide.

The ability for tags to share a namespace (currently, the global namespace, which I acknowledge is bad) allows for breaking the code down into files by chunks that make sense in terms of their intended use on the page. And yes, you cannot reason about them individually as Python files, but personally that matters less than the code as displayed and as run being identical because they reference the same source.

As I've said elsewhere, though, my use cases tend to be more "Using PyScript to talk about Python/PyScript" rather than "Use PyScript to do things on a website," so my usage may not be the target one.

antocuni commented 1 year ago

As I've said elsewhere, though, my use cases tend to be more "Using PyScript to talk about Python/PyScript" rather than "Use PyScript to do things on a website," so my usage may not be the target one.

I think you have a point here, and your case is probably perfectly valid: in this use case, you are basically mixing code and text into a single flow, which is something you cannot normally do but which becomes very easy and natural in the context of a web page. I think we should fully support it.

What is the best way to support this use case, I don't know. One possibility, as you suggest, is to make it possible to share the same namespace across multiple py-script tags. Another is more similar to what @ntoll suggested earlier in this conversation, i.e. to have a <py-notebook> tag which does exactly that.

JeffersGlass commented 1 year ago

So, I don't know how far along you might be in implementing these ideas, and I don't want to get in the way of anything. But here's a thought.

What we had last converged on is that by default, each \<py-script> tag has its own scope/namespace, yes? I'm still on board with that. So each time we hit a new \<py-script> tag, we'll need to create a new dict() to use as that namespace's globals, and 'initialize' it. (Right now, probably just run the contents of "pyscript.py" in that namespace; maybe later we import from a module).

Let me assume for a sec these namespaces have unique names, and that we store them in some kind of mapping, either on the TS side or the Python side: namespace_collection = {'first': { global objects from first tag }, 'second': { global objects from second tag }} etc.

What about an attribute <py-script namespace="...">? Here's some pseudocode:

//pyodide.ts

// Maps each namespace name to the Python dict used as that namespace's globals
const namespace_collection = {};

// New function: run code using the given dict as globals
function runInNamespace(code: string, namespace_globals) {
    this.interpreter.runPython(code, {globals: namespace_globals});
}

function run(code: string) {
    const namespace_name = this.getAttribute('namespace');

    if (namespace_name !== null && namespace_name in namespace_collection) {
        // run the code using the existing namespace as globals
        this.runInNamespace(code, namespace_collection[namespace_name]);
    }
    else {
        // need to initialize a new namespace
        let new_namespace_name;
        if (namespace_name !== null) {
            // use the name given by the attribute
            new_namespace_name = namespace_name;
        }
        else {
            new_namespace_name = ??????
        }

        const new_namespace_globals = this.runtime.get('dict')();
        namespace_collection[new_namespace_name] = new_namespace_globals;

        // initialize the namespace with the contents of pyscript.py
        this.runInNamespace(pyscript as string, new_namespace_globals);
        // and any other init steps, like runtime.run('set_version...')
        this.runInNamespace(code, new_namespace_globals);
    }
}

If each tag is to have its own namespace by default, then "?????" is ... a GUID? Or maaaaybe something derived from src, but since src is a URL and not necessarily a file name I'm wary about that...

(If each tag were to default to the same namespace, "?????" would be "__main" or "__default_namespace" or some constant. But I think we've moved away from that.)

So the example above becomes, with extension:

<!--------- First Section --------->
<p>Start by doing this thing:</p>
{{ content from step1.py }}
<py-script src="step1.py" namespace="first_demo"></py-script>

<p>And now do this thing</p>
{{ content from step2.py }}
<py-script src="step2.py" namespace="first_demo"></py-script>

<!--------- Second Section --------->
<p>Let's look at something else now:</p>
{{ content from secondDemo1.py }}
<py-script src="secondDemo1.py" namespace="second_demo"></py-script>

<p>And with that something else, we can also do this:</p>
{{ content from secondDemo2.py }}
<py-script src="secondDemo2.py" namespace="second_demo"></py-script>

Eh? As a place to start? I've made quite a few assumptions here, though, so this could be entirely off base.
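The namespace lookup from the pseudocode can also be sketched in plain Python. This is a hedged illustration only: `namespace_collection` follows the name used above, while `run`, the `__anon_` prefix, and the choice of a UUID for unnamed tags are assumptions made for the sketch, and the pyscript.py initialization step is left out.

```python
import uuid
from typing import Dict, Optional

# Maps each namespace name to the dict used as that namespace's globals.
namespace_collection: Dict[str, dict] = {}


def run(code: str, namespace: Optional[str] = None) -> str:
    """Run code in the named namespace, or in a fresh private one if unnamed."""
    # Assumption: tags without a namespace attribute get a unique generated name.
    name = namespace if namespace is not None else f"__anon_{uuid.uuid4().hex}"
    if name not in namespace_collection:
        # First use of this name: create the globals dict for it.
        namespace_collection[name] = {"__name__": name}
    exec(code, namespace_collection[name])
    return name


# Two "tags" sharing a namespace see each other's names.
run("steps = ['step1']", namespace="first_demo")
run("steps.append('step2')", namespace="first_demo")

# An unnamed tag gets its own private namespace.
anon = run("steps = []")

print(namespace_collection["first_demo"]["steps"])  # ['step1', 'step2']
```

With this shape, sharing is strictly opt-in: only tags that name the same namespace can see each other's globals.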


As you've very helpfully illustrated before, creating a dict with a name and using it as a global dictionary for some code is basically reinventing the idea of modules. Honestly, I'm neutral on whether we actually create modules using this logic or not. If we do create them as modules, though, using module=... as an attribute could make some sense.

WebReflection commented 8 months ago

As there's no label in this and it's from 2022 ... I think we can close this.