lexanth / python-ast

Python (3) Parser for JavaScript/TypeScript (based on antlr4ts)
MIT License

How do I get all the tokens for a statement? #4

Open hamirmahal opened 1 year ago

hamirmahal commented 1 year ago

I can't just do node.text.split(' '): per the docs for getText(), "Since tokens on hidden channels (e.g. whitespace or comments) are not added to the parse trees, they will not appear in the output of this method." So when traversing from datetime import datetime, date, for example, I get fromdatetimeimportdatetime,date.
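If what you need is the statement's original text with whitespace intact, one option is to slice the source string between the node's first and last tokens instead of reading node.text. In antlr4ts a ParserRuleContext has start and stop tokens, and each token carries startIndex and stopIndex offsets into the input. The sketch below assumes those offsets line up with JavaScript string offsets (true for ASCII source). The interfaces TokenLike and RuleContextLike and the helper originalText are hypothetical stand-ins so it runs without the parser; with python-ast you would pass the real context node.

```typescript
// Hypothetical stand-ins mirroring the antlr4ts Token / ParserRuleContext
// properties used here (startIndex, stopIndex, start, stop). Not imported
// from the library.
interface TokenLike {
  startIndex: number; // offset of the token's first character
  stopIndex: number; // offset of the token's last character (inclusive)
}

interface RuleContextLike {
  start: TokenLike;
  stop?: TokenLike; // antlr4ts types `stop` as possibly undefined
}

// Returns the exact source text a rule context spans, whitespace and
// comments included, by slicing the original input string.
function originalText(source: string, ctx: RuleContextLike): string {
  const stop = ctx.stop ?? ctx.start;
  // stopIndex is inclusive, so add 1 for String.prototype.slice.
  return source.slice(ctx.start.startIndex, stop.stopIndex + 1);
}

// Usage with hand-built tokens for `from datetime import datetime, date`:
const code = "from datetime import datetime, date\n";
const importStmt: RuleContextLike = {
  start: { startIndex: 0, stopIndex: 3 }, // "from"
  stop: { startIndex: 31, stopIndex: 34 }, // "date"
};
console.log(originalText(code, importStmt)); // prints: from datetime import datetime, date
```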

I'd like to do something like node.getTokens(), but getTokens takes a ttype parameter that I haven't found any documentation for.

I tried passing successive integers as ttype, but none of them returned any tokens:

node text fromdatetimeimportdatetime,date
for ttype -1 getTokens returns []
for ttype 0 getTokens returns []
for ttype 1 getTokens returns []
...
for ttype 28 getTokens returns []
...
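A likely explanation for the all-empty results: in antlr4ts, getTokens(ttype) only inspects the node's direct children and returns the terminal (token) nodes whose type equals ttype. If node is the import_stmt, its only direct child is a rule context (import_from), so no ttype can match; even on import_from, the NAME tokens sit one level down inside dotted_name and import_as_names. The sketch below models that behaviour on a hand-built tree. The interfaces, the helper shallowGetTokens, and the numeric token types are hypothetical stand-ins; the real constants live on Python3Parser.

```typescript
// Hypothetical stand-ins for antlr4ts nodes: terminal nodes carry a
// `symbol` token, rule contexts carry `children`.
interface TokenLike { type: number; text: string }
interface TerminalLike { symbol: TokenLike }
interface RuleLike { children?: TreeNode[] }
type TreeNode = TerminalLike | RuleLike;

const isTerminal = (node: TreeNode): node is TerminalLike =>
  (node as TerminalLike).symbol !== undefined;

// Models getTokens(ttype): only the direct terminal children are checked.
function shallowGetTokens(ctx: RuleLike, ttype: number): TerminalLike[] {
  return (ctx.children ?? []).filter(
    (child): child is TerminalLike => isTerminal(child) && child.symbol.type === ttype,
  );
}

// Made-up token types standing in for Python3Parser constants.
const FROM = 1, NAME = 2, IMPORT = 3;
const tok = (type: number, text: string): TerminalLike => ({ symbol: { type, text } });

// import_from for `from datetime import date`: NAME tokens are nested in rules.
const importFrom: RuleLike = {
  children: [
    tok(FROM, "from"),
    { children: [tok(NAME, "datetime")] }, // dotted_name
    tok(IMPORT, "import"),
    { children: [{ children: [tok(NAME, "date")] }] }, // import_as_names > import_as_name
  ],
};
// import_stmt only wraps import_from, so it has no terminal children at all.
const importStmt: RuleLike = { children: [importFrom] };

for (const ttype of [FROM, NAME, IMPORT]) {
  console.log(ttype, shallowGetTokens(importStmt, ttype).length); // always 0
}
console.log(shallowGetTokens(importFrom, FROM).length); // 1
console.log(shallowGetTokens(importFrom, NAME).length); // 0, the NAMEs are nested
```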
lexanth commented 1 year ago

I haven't really used the getTokens method, but it looks like it doesn't do deep searching (it only checks the node's direct children), so it may not be what you're after. The token type constants are on Python3Parser, though, e.g. Python3Parser.STRING.
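To get every token under a statement, the tree has to be walked recursively. Below is a minimal sketch of such a deep collector, assuming the node shape antlr4ts uses (terminal nodes expose symbol, rule contexts expose children). The stand-in interfaces, the helper collectTokens, and the numeric token types are hypothetical; with the real tree you would filter on constants such as Python3Parser.NAME. Whitespace still won't show up, since hidden-channel tokens never enter the parse tree.

```typescript
// Hypothetical stand-ins for antlr4ts nodes: a TerminalNode exposes `symbol`
// (its Token), a ParserRuleContext exposes `children`.
interface TokenLike { type: number; text: string }
interface TerminalLike { symbol: TokenLike }
interface RuleLike { children?: TreeNode[] }
type TreeNode = TerminalLike | RuleLike;

// Duck-typed check standing in for `instanceof TerminalNode`.
const isTerminal = (node: TreeNode): node is TerminalLike =>
  (node as TerminalLike).symbol !== undefined;

// Depth-first, left-to-right walk collecting every token under `node`;
// pass `ttype` to keep only tokens of that type.
function collectTokens(node: TreeNode, ttype?: number, out: TokenLike[] = []): TokenLike[] {
  if (isTerminal(node)) {
    if (ttype === undefined || node.symbol.type === ttype) out.push(node.symbol);
  } else {
    for (const child of node.children ?? []) collectTokens(child, ttype, out);
  }
  return out;
}

// Made-up token types standing in for Python3Parser constants.
const FROM = 1, NAME = 2, IMPORT = 3, COMMA = 4;
const tok = (type: number, text: string): TerminalLike => ({ symbol: { type, text } });

// Hand-built tree for `from datetime import datetime, date`.
const importStmt: RuleLike = {
  children: [{
    children: [
      tok(FROM, "from"),
      { children: [tok(NAME, "datetime")] },
      tok(IMPORT, "import"),
      {
        children: [
          { children: [tok(NAME, "datetime")] },
          tok(COMMA, ","),
          { children: [tok(NAME, "date")] },
        ],
      },
    ],
  }],
};

console.log(collectTokens(importStmt).map((t) => t.text).join(" "));
// prints: from datetime import datetime , date
console.log(collectTokens(importStmt, NAME).map((t) => t.text).join(" "));
// prints: datetime datetime date
```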

To actually inspect that sort of code, I'd use the walk or visitor APIs to pick out the type of element you're looking for first, e.g.:

import { createVisitor, parse, walk } from 'python-ast';

const code = `from datetime import datetime, date
`;
const ast = parse(code);

// Visitor API: runs for each import_from node (`from X import ...`).
const visitor = createVisitor({
  visitImport_from(importNode) {
    const importSourceNode = importNode.dotted_name();
    const importedItems = importNode.import_as_names()?.import_as_name();

    console.log(importSourceNode?.text);
    console.log(importedItems?.map((item) => item.text));
  },
});
visitor.visit(ast);

// Walk API: enterImport_from fires when the walker reaches an import_from node.
const imports: { source?: string; items: string[] }[] = [];
walk(
  {
    enterImport_from(importNode) {
      const importSourceNode = importNode.dotted_name();
      const importedItems = importNode.import_as_names()?.import_as_name();

      imports.push({
        source: importSourceNode?.text,
        // import_as_names() is undefined for `from X import *`, so guard the map.
        items: importedItems?.map((item) => item.text) ?? [],
      });
    },
  },
  ast
);
console.log(imports);

This snippet still doesn't handle the full complexity of Python imports (renamed imports, for example, would need something different), but it might set you on the right path, depending on what you're trying to do.
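For renamed items in the from ... import form, one possible approach, assuming python-ast's generated Import_as_nameContext follows the Python3 grammar rule import_as_name: NAME ('as' NAME)?, is to read NAME() as a list: the first terminal is the imported name and a second one, if present, is the alias. The interfaces and the toImportedItem helper below are hypothetical stand-ins so the sketch runs on its own; inside enterImport_from you would map importNode.import_as_names()?.import_as_name() through the same logic.

```typescript
// Hypothetical stand-in for the generated Import_as_nameContext, assuming
// `import_as_name: NAME ('as' NAME)?` gives a NAME() accessor returning
// every NAME terminal in order.
interface TerminalLike { text: string }
interface ImportAsNameLike { NAME(): TerminalLike[] }

interface ImportedItem { name: string; alias?: string }

// First NAME is the imported name; a second NAME, if present, is the alias.
function toImportedItem(item: ImportAsNameLike): ImportedItem {
  const [name, alias] = item.NAME();
  return alias ? { name: name.text, alias: alias.text } : { name: name.text };
}

// Usage with stand-ins for `from datetime import datetime as dt, date`:
const nameNode = (text: string): TerminalLike => ({ text });
const items: ImportAsNameLike[] = [
  { NAME: () => [nameNode("datetime"), nameNode("dt")] }, // datetime as dt
  { NAME: () => [nameNode("date")] }, // date
];
console.log(JSON.stringify(items.map(toImportedItem)));
// prints: [{"name":"datetime","alias":"dt"},{"name":"date"}]
```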

hamirmahal commented 1 year ago

That snippet definitely helps. Thanks for posting it.

Do you by any chance have any suggestions on how to go about extracting a renamed import from import datetime as dt, for example?

This is what I tried.

import { createVisitor, parse, walk } from 'python-ast';

interface Import {
  items: string[];
  source: string;
}

const analyze = (code: string) => {
  const ast = parse(code);
  const visitor = createVisitor({
    visitImport_name(importNode) {
      const importSourceNode = importNode.dotted_as_names();

      console.log('visitImport_name', importSourceNode?.text);
    },
    visitImport_stmt(importNode) {
      const importSourceNode = importNode.import_name();
      importSourceNode
        ?.dotted_as_names()
        .children?.forEach((child) => console.log('child', child.text));
      importSourceNode
        ?.dotted_as_names()
        .children?.forEach((child) => console.log('child', child.text));

      console.log('visitImport_stmt', importSourceNode?.text);
      console.log(
        'visitImport_stmt',
        importSourceNode
          ?.dotted_as_names()
          .dotted_as_name()
          .map((i) => i.text)
      );
    }
  });
  visitor.visit(ast);

  const imports: Import[] = [];
  walk(
    {
      enterImport_from(importNode) {
        const importSourceNode = importNode.dotted_name();
        const importedItems = importNode.import_as_names()?.import_as_name();

        imports.push({
          source: importSourceNode?.text || '',
          items: importedItems?.map((item) => item.text) || []
        });
      }
    },
    ast
  );
  return imports;
};

analyze('import datetime as dt\n');

This was the output.

child datetimeasdt
child datetimeasdt
visitImport_stmt importdatetimeasdt
visitImport_stmt [ 'datetimeasdt' ]
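The output shows the alias is there, just glued on: the text of a dotted_as_name concatenates its children datetime, as and dt. Assuming the generated Dotted_as_nameContext follows the grammar rule dotted_as_name: dotted_name ('as' NAME)?, the module and alias can be read separately through dotted_name() and NAME(). (Separately, 'visitImport_name' never printed, probably because visitImport_stmt handles the statement without visiting its children, so the visitor never reaches import_name.) The sketch below uses hypothetical stand-in interfaces and a hypothetical helper, importsFromImportName, so it runs on its own; in the visitor above you would call it with importSourceNode?.dotted_as_names().

```typescript
// Hypothetical stand-ins for the generated contexts behind
// `dotted_as_names: dotted_as_name (',' dotted_as_name)*` and
// `dotted_as_name: dotted_name ('as' NAME)?`.
interface TextNode { text: string }
interface DottedAsNameLike {
  dotted_name(): TextNode;
  NAME(): TextNode | undefined; // the alias, present only with `as`
}
interface DottedAsNamesLike { dotted_as_name(): DottedAsNameLike[] }

interface ImportedModule { source: string; alias?: string }

// One entry per module in `import a as b, c.d`, with the alias split out.
function importsFromImportName(names: DottedAsNamesLike): ImportedModule[] {
  return names.dotted_as_name().map((item) => {
    const source = item.dotted_name().text;
    const alias = item.NAME()?.text;
    return alias ? { source, alias } : { source };
  });
}

// Usage with stand-ins for `import datetime as dt, os.path`:
const node = (text: string): TextNode => ({ text });
const dottedAsNames: DottedAsNamesLike = {
  dotted_as_name: () => [
    { dotted_name: () => node("datetime"), NAME: () => node("dt") },
    { dotted_name: () => node("os.path"), NAME: () => undefined },
  ],
};
console.log(JSON.stringify(importsFromImportName(dottedAsNames)));
// prints: [{"source":"datetime","alias":"dt"},{"source":"os.path"}]
```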