jimbaker / tagstr

This repo contains an issue tracker, examples, and early work related to PEP 999: Tag Strings
51 stars 6 forks source link

Overhaul the tutorial #32

Closed pauleveritt closed 3 months ago

pauleveritt commented 1 year ago

A place to collaborate with @patrick91 for thoughts about re-thinking the tutorial.

  1. More like a website. This repo has been somewhat like a code-repo. But there's going to be an important mission of education. We should think less of the tutorial as "supporting the PEP process" and more like "a bunch of tutorials which teach how to use tag strings."

  2. Examples -> tutorials. The examples files in src are currently a bit of a garbage-barge. They exist to show various ideas. We should make this first class by turning them into tutorials which explain the ideas in the code.

  3. Split up tutorial.rst. The main tutorial is BIG. It's multiple tutorials: lazy, SQL, and an enormous HTML templating tutorial. Let's make these into separate tutorials.

  4. Simpler starting tutorial. In the PEP work I did a couple of months ago, I introduced a much-simpler "greeting" example. I propose we do that example as the official "start here" tutorial. It can have more narrative than the PEP, multiple steps, and it could cover each tag string concept in the mega-HTML-tutorial using a MUCH simpler sample app.

  5. Tests. The tutorials can be written in a way where the RST includes working sample code in files which have tests that run in GitHub Actions. I think this is really important during this phase, to make sure what we are teaching people actually matches any changes in implementation.

jimbaker commented 1 year ago

Agreed wholeheartedly with everything above! I think the earlier work has helped us figure out some of the ergonomics of this change, so that at least we personally are satisfied it would work in real code. Now it would be nice to do it for other people so they could learn how to use it, as tag users and as tag developers.

Separately, there is still work needed to be done on formalizing our thinking on Chunk and Thunk types such that they work really work for those ergonomics, in particular with structural pattern matching. So there's still some role in working out these examples/tutorials to ensure this works in practice.

benji-york commented 1 year ago

I've been following this project and seeing this discussion made me think that my Manuel project would be a good fit for testing the documentation, so I started playing around with that idea.

I started by adding code-block annotations to the tutorial...

diff --git tutorial.rst tutorial.rst
index 6b63371..f95183b 100644
--- tutorial.rst
+++ tutorial.rst
@@ -22,11 +22,15 @@ Tutorial
 Tag strings start with the functionality in f-strings, as described in PEP 498.
 Let's take a look first at a simple example with f-strings::

+.. code-block: python
+
     name = 'Bobby'
     s = f"Hello, {name}, it's great to meet you!"

 The above code is the equivalent of writing this code::

+.. code-block: python
+
     name = 'Bobby'
     s = 'Hello, ' + format(name, '') + ", it's great to meet you!"

@@ -43,6 +47,8 @@ But consider this shell example. You want to use ``subprocess.run``, but for
 your scenario you would like to use the full power of the shell, including pipes
 and subprocesses. This means you have to use ``use_shell=True``::

+.. code-block: python
+
     from subprocess import run

     path = 'some/path/to/data'

After adding a few, I ran Manuel—which (among other things) runs the code in the blocks. It generated this output:

======================================================================
ERROR: /Users/benji/Dropbox/Benji/projects/tagstr/tutorial.rst
/Users/benji/Dropbox/Benji/projects/tagstr/tutorial.rst
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/benji/Dropbox/Benji/projects/tagstr/ve/lib/python3.11/site-packages/manuel/testing.py", line 41, in runTest
    self.regions.evaluate_with(self.manuel, self.globs)
  File "/Users/benji/Dropbox/Benji/projects/tagstr/ve/lib/python3.11/site-packages/manuel/__init__.py", line 130, in evaluate_with
    evaluater(region, self, globs)
  File "/Users/benji/Dropbox/Benji/projects/tagstr/ve/lib/python3.11/site-packages/manuel/codeblock.py", line 32, in execute_code_block
    exec(region.parsed.code, globs)
  File "/Users/benji/Dropbox/Benji/projects/tagstr/tutorial.rst:50", line 6, in <module>
  File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Popen.__init__() got an unexpected keyword argument 'use_shell'

Which led to this fix:

--- tutorial.rst
+++ tutorial.rst
@@ -46,7 +46,7 @@ and subprocesses. This means you have to use ``use_shell=True``::
     from subprocess import run

     path = 'some/path/to/data'
-    print(run(f'ls -ls {path} | (echo "First 5 results from ls:"; head -5)', use_shell=True))
+    print(run(f'ls -ls {path} | (echo "First 5 results from ls:"; head -5)', shell=True))

 However, this code as written is broken on any untrusted input. In other words,
 we have a shell injection attack, or from xkcd, a Bobby Tables problem::

So, that seems like a pretty good proof-of-concept. Adding support for Manuel was easy and it found a bug in the documentation almost immediately. Here's the full change set in PR form: https://github.com/benji-york/tagstr/pull/1/files

Does this seem interesting for the documentation testing? I would be happy to help if so.

jimbaker commented 1 year ago

@benji-york sounds good to me!

Finding bugs in docs with automated tools makes sense. (And a nice immediate example of Manuel working to find such bugs!) Note that we are using code-block: python in the draft PEP; a significant fraction are used to highlight Python code demonstrating the tag string itself. So we would need tag string support for whatever pipeline runs Manuel.

I guess the easiest way to do that would be to have a Docker image for Python from a tag string branch. It could be generally useful, and it should be easy enough to create.

pauleveritt commented 1 year ago

Oh neat, a chance to collaborate with @benji-york ... if you're willing to join in, I'll design the writing to make it testable. My plan to date was going to be:

Still, it's a lot more pleasurable than working on code that's only in a RST/MD code block.

benji-york commented 1 year ago

Oh neat, a chance to collaborate with @benji-york

: )

if you're willing to join in, I'll design the writing to make it testable

The biggest constraint is going to be format. Manuel (via doctest) only really supports ReST. (I've been pondering MyST support and a contributor has started on that work.)

My plan to date was going to be:

  • Have the code on disk and use literal include

Manuel doesn't currently support running code inside a literalinclude but it wouldn't be hard to add. There might need to be two flavors, one that executes the code and one that ignores it. Or maybe three flavors: ignore, run, parse-so-as-to-catch-syntax-errors. Amongst the many flavors...

pauleveritt commented 1 year ago

@benji-york I opened a ticket to discuss switching docs from RST to MD but ain't got no engagement. So RST it is for now. 😀

For literalinclude that approach is what I would do if we weren't doing Manuel. If you participated, I'd work less hard on little snippets on disk such as htmlnode01a.py, obscure start-after, and likely some crappy CSS margin-left: -0.5em to remove Python indentation.

I propose that I write the freaking content, the dumb way, with files-on-disk. Then, when you're interested, you do a PR that inlines things?

benji-york commented 1 year ago

@benji-york I opened a ticket to discuss switching docs from RST to MD but ain't got no engagement. So RST it is for now. 😀

+1

I propose that I write the freaking content, the dumb way, with files-on-disk. Then, when you're interested, you do a PR that inlines things?

That's fine with me.

Alternatively: what if we spent an hour pairing over a video call writing the first part of one of these documents with Manuel as-is so that we get a feeling of how painful the code-in-ReST workflow would be for you and to see if there are any unknown-unknowns that are great or awful with the approach?

pauleveritt commented 11 months ago

@benji-york At long last, I'm back to adding in Manuel.

The docs/pep.rst file has this from Python 3.12, which manuel/codeblock.py hates:

  .. code-block:: python

      @dataclass
      class Language:
          linguist: str  # standard language name/alias known to GitHub's Linguist
          cooked: bool = True

      type HTML = Annotated[T, 'language': 'HTML', 'registry': 'linguist']

It's type that is the problem:

document = <manuel.Document object at 0x103d19d70>

    def find_code_blocks(document):
        for region in document.find_regions(CODEBLOCK_START, CODEBLOCK_END):
            start_end = CODEBLOCK_START.search(region.source).end()
            source = textwrap.dedent(region.source[start_end:])
            source_location = '%s:%d' % (document.location, region.lineno)
>           code = compile(source, source_location, 'exec', 0, True)
E             File "/Users/pauleveritt/projects/pauleveritt/tagstr/docs/pep.rst:456", line 8
E               type HTML = Annotated[T, 'language': 'HTML', 'registry': 'linguist']
E                    ^^^^
E           SyntaxError: invalid syntax

I'll comment that out for a sec and continue.

pauleveritt commented 11 months ago

@benji-york Also:

  .. code-block:: python

      def memoization_key(*args: str | Thunk) -> tuple[str...]:
          return tuple(arg for arg in args if isinstance(arg, str))

...leads to:

document = <manuel.Document object at 0x103454560>

    def find_code_blocks(document):
        for region in document.find_regions(CODEBLOCK_START, CODEBLOCK_END):
            start_end = CODEBLOCK_START.search(region.source).end()
            source = textwrap.dedent(region.source[start_end:])
            source_location = '%s:%d' % (document.location, region.lineno)
>           code = compile(source, source_location, 'exec', 0, True)
E             File "/Users/pauleveritt/projects/pauleveritt/tagstr/docs/pep.rst:553", line 3
E               def memoization_key(*args: str | Thunk) -> tuple[str...]:
E                                                                                                    ^
E           SyntaxError: expected ':'

The indicator is on the tuple[ part.

pauleveritt commented 11 months ago

Some others after that. If you have any luck with the PEP, can you let me know?

jimbaker commented 11 months ago

It would be nice to support PEP 695's type statements. In general this PEP work has been trying to follow the most up-to-date PEPs, in not yet final released Python, so this might be a challenge with any tooling.

pauleveritt commented 3 months ago

As mentioned in #40 we are stripping down this repo. The tutorial is being worked on as part of a "site" that is on a different timing than the PEP, so closing this.