PeterJCLaw commented 2 years ago

Thanks for this library, I've found it really useful for building source manipulation tools.

I'm currently building a flake8 plugin and it would be great if there was a way to build an ASTTokens instance from an existing list of (stdlib) tokens. This would complement the ability to pass in an existing AST tree and allow cases which already have both available to avoid re-parsing the source code. (Flake8 makes both an existing AST and tokens list available to plugins).

If this is something you'd be open to I'd be happy to put together a PR. (The change seems simple enough, though possibly I'm missing something)

alexmojaki commented 2 years ago

Sounds sensible. You should probably have a brief look at the first few changes to asttokens.py in #93 as they will be relevant to this. I imagine the ultimate solution will simply be converting the variable original_tokens in _generate_tokens to an optional parameter of _generate_tokens, init_tokens, and __init__.

PeterJCLaw commented 2 years ago

I imagine the ultimate solution will simply be converting the variable original_tokens in _generate_tokens to an optional parameter of _generate_tokens, init_tokens, and __init__.

Yup, that's pretty much what I had in mind.

You should probably have a brief look at the first few changes to asttokens.py in #93 as they will be relevant to this.

Ah, yeah, that's going to overlap a bit and I suspect that allowing all the various construction cases is going to be a bit complicated too.

For my understanding: what use is the ASTTokens class if it doesn't have an AST to work with (i.e: both tree=None and parse=False)? Are there consumers who use it just for the token utils? (I've found them useful, though wouldn't have thought to use this library just for them).

I realise this is perhaps a bit beyond the scope of this issue and I'm perhaps late to the conversation (and a breaking API change), but I wonder whether it would make sense to have ASTTokens always build up the aspects (tokens/ast) that it doesn't have rather than trying to accommodate the cases of partial data?

That would allow the ASTTokens class to be simpler -- a lot of the asserts that PR adds would go away and consumers wouldn't need to type-guard access to ASTTokens.tree.

If the token-only or ast-only functionality is useful on its own then perhaps extracting them to standalone functions or base classes would make sense? That could leave the current ASTTokens class pulling all the features together for those that want it, but allow subsets to be used on their own without complicating the code.

If you'd rather stick with the current design I'm happy to work with that, I just didn't want to end up creating a lot of complications for #93.

In the meanwhile I've put up a minimal PR for just the change to allow passing in a list of tokens: https://github.com/gristlabs/asttokens/pull/95. Happy to rebase that on #93 if you'd prefer that.

alexmojaki commented 2 years ago

For my understanding: what use is the ASTTokens class if it doesn't have an AST to work with

@dsagal I'm also curious about this.

I wonder whether it would make sense to have ASTTokens always build up the aspects (tokens/ast) that it doesn't have rather than trying to accommodate the cases of partial data?

93 is about intentionally not tokenizing at all for performance when possible. See #92 and the linked IPython issue.

If the token-only or ast-only functionality is useful on its own then perhaps extracting them to standalone functions or base classes would make sense? That could leave the current ASTTokens class pulling all the features together for those that want it, but allow subsets to be used on their own without complicating the code.

The AST-only functionality mostly works, but if you ask for the source text range of a node that it doesn't support, it automatically initialises tokens so that it can use them. So splitting up the class sounds like a challenge.

dsagal commented 2 years ago

For my understanding: what use is the ASTTokens class if it doesn't have an AST to work with

@dsagal I'm also curious about this.

I don't have any compelling use cases for when both tree=None and parse=False. All practical cases I know set one or the other.

alexmojaki commented 2 years ago

So there's no reason this test is written like this?

https://github.com/gristlabs/asttokens/blob/8c5efa16a2bf1a43eda4f284d6ee6709b847f6ad/tests/test_asttokens.py#L121

dsagal commented 2 years ago

Well, the reason is that ASTTokens supports a mode where it only deals with tokens, so tokenization-specific unittests use that mode. But I am not aware of practical uses of ASTTokens that rely on this mode.

gristlabs / asttokens

Support for constructing around a list of existing tokens #94

93 is about intentionally not tokenizing at all for performance when possible. See #92 and the linked IPython issue.