Closed seantiz closed 1 week ago
Some ways we've brought the edge cases within scope while working on the TypeScript version:
We changed this to initialise a layerInfo object from the get-go. It represents a unit of "tasks", which it is analyseCodeTasks()'s job to shape and return.
We added isUtilityVariable to our code task analyser.
tree.rootNode.text.match(/runtime|utils|helpers|shims|parser/i)
tree.rootNode.text.includes('type Token') || // Is Token a parser Type definitively in Typescript?
tree.rootNode.descendantsOfType('export_statement')
.some(node => node.text.match(/function\s+(get|is|has|create|parse|tokenize)/)) || // Added parse/tokenize
tree.rootNode.descendantsOfType('type_alias_declaration')
This one still feels like very shaky ground to me! Needs review.
const isCoreModule = (
  tree.rootNode.text.includes('extends APIResource') ||
  tree.rootNode.text.includes('import { APIResource }') ||
  tree.rootNode.text.match(/Messages|Streams|Resources/i) ||
  tree.rootNode.descendantsOfType('export_statement')
    .some(node => node.text.match(/class\s+(Message|Stream|Resource|Client)/))
);
We filter for anything that is a utility module but not a core module and push new layerInfo params to those modules.
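The filtering step described above could look roughly like the sketch below. The `ModuleFlags` and `LayerInfo` shapes, the `assignUtilityLayer` name, and the `'utility'` layer value are all assumptions made for illustration, not the project's actual types.

```typescript
// Hypothetical shapes: the real layerInfo structure in the project may differ.
interface LayerInfo {
  layer: string;
  reason: string;
}

interface ModuleFlags {
  file: string;
  isUtilityModule: boolean;
  isCoreModule: boolean;
  layerInfo: LayerInfo[];
}

// Push a utility layer entry onto every module that matched the utility
// heuristics but did not match the core heuristics, and return those modules.
function assignUtilityLayer(modules: ModuleFlags[]): ModuleFlags[] {
  const utilityOnly = modules.filter(m => m.isUtilityModule && !m.isCoreModule);
  for (const m of utilityOnly) {
    m.layerInfo.push({
      layer: 'utility', // assumed label
      reason: 'utility heuristics matched, core heuristics did not',
    });
  }
  return utilityOnly;
}
```

Checking `!isCoreModule` first means a file that trips both sets of heuristics keeps whatever layer the core path gives it.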
There's a chance that some part of our parsing logic can't handle potentially malformed code from .ts source files.
When we were processing the Anthropic SDK's TS library, we saw the string 'tree-sitter' being passed as a value into a URL object. There is still more tracing to do to understand where and why this happens.
Chrome DevTools
Block
  href: <value unavailable>
  raiseException: true
Local
  this: URL
  arguments: Arguments ['tree-sitter', callee: (...), Symbol(Symbol.iterator): ƒ]
  base: undefined
  input: "tree-sitter"
  parseSymbol: undefined
In the end I couldn't find any evidence of this being true. Whenever I arbitrarily removed 120 LOC from core.ts, tree-sitter parsed everything without explicit errors.
It seems there's a hard character limit on how much you can pass into the tree-sitter parser, so we're now chunking larger files. As a compromise for now (though not ideal), their layer types are being set to "unknown".
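A minimal sketch of that kind of chunking, assuming the limit is a plain character count. The `chunkSource` name and the `maxChars` parameter are hypothetical, and the actual limit would need to be measured. It cuts on line boundaries where it can so that a chunk does not end mid-line.

```typescript
// Split source text into chunks of at most maxChars characters.
// Prefers to cut just after a newline; falls back to a hard cut when a
// single line is longer than maxChars.
function chunkSource(source: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError('maxChars must be positive');
  const chunks: string[] = [];
  let start = 0;
  while (start < source.length) {
    let end = Math.min(start + maxChars, source.length);
    if (end < source.length) {
      // Look for the last newline inside the current window.
      const lastNewline = source.lastIndexOf('\n', end - 1);
      if (lastNewline >= start) end = lastNewline + 1;
    }
    chunks.push(source.slice(start, end));
    start = end;
  }
  return chunks;
}
```

A caller could treat any file where `chunkSource(...)` returns more than one chunk as the "chunked" case and assign it the "unknown" layer, which matches the compromise described above. Cutting on line boundaries does not guarantee that each chunk is syntactically complete, so per-chunk parse trees may still contain error nodes.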
Reopened because we have a lot of modules with the layer value "unknown" after refactoring in v0.0.7.
I think we're getting unknown layer returns exclusively from the .dot file logic:
export function createDot(moduleMap: Map<string, DesignValues>) {
  let dot = 'digraph Dependencies {\n';
  dot += ' node [shape=box];\n';
  // Track files with known and unknown layers
  const unknownLayers = new Set<string>();
  const knownNodes = new Set<string>();
  // First pass - collect nodes with known layers
  for (const [file, data] of moduleMap) {
    const nodeName = path.basename(file);
    const className = nodeName.replace('.h', '');
    const layer = (() => {
      const relationships = data.moduleRelationships;
      if (!relationships || !relationships[className]) {
        unknownLayers.add(nodeName);
        return 'unknown';
      }
      return relationships[className].type || 'unknown';
    })();
    if (layer !== 'unknown') {
      knownNodes.add(nodeName);
      dot += ` "${nodeName}" [label="${nodeName}", layer="${layer}"];\n`;
    }
  }
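Which means the core problem is how we detect moduleRelationships.
Going by the check in createDot, modules are being marked as "unknown" when either data.moduleRelationships is missing or it has no entry for the class name (the file name with .h removed).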
The debug logs show we ARE capturing include relationships:
[DEBUG] lib/osx/poppler-0.66/include/poppler/Object.h includes Array.h
[DEBUG] lib/osx/poppler-0.66/include/poppler/Object.h includes Dict.h
[DEBUG] lib/osx/poppler-0.66/include/poppler/Object.h includes Stream.h
But these relationships aren't being translated into moduleRelationships in the DesignValues structure.
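To narrow down where the translation is lost, a diagnostic along these lines could mirror the lookup in createDot and report which check each file fails. This is only a sketch: `DesignValuesLike`, `Relationship`, `UnknownReason` and `explainUnknownLayers` are hypothetical names, and the basename is taken with a plain string split rather than `path.basename` to keep the example self-contained.

```typescript
// Hypothetical minimal shapes; the project's DesignValues has more fields.
interface Relationship {
  type?: string;
}
interface DesignValuesLike {
  moduleRelationships?: Record<string, Relationship>;
}

type UnknownReason = 'no-relationships' | 'no-entry-for-class' | 'empty-type';

// Mirror the layer lookup in createDot and record why each file would
// come out as "unknown". Files with a usable type are omitted.
function explainUnknownLayers(
  moduleMap: Map<string, DesignValuesLike>
): Map<string, UnknownReason> {
  const reasons = new Map<string, UnknownReason>();
  moduleMap.forEach((data, file) => {
    const nodeName = file.split('/').pop() ?? file;
    const className = nodeName.replace('.h', '');
    const relationships = data.moduleRelationships;
    if (!relationships) {
      reasons.set(nodeName, 'no-relationships');
    } else if (!relationships[className]) {
      reasons.set(nodeName, 'no-entry-for-class');
    } else if (!relationships[className].type) {
      reasons.set(nodeName, 'empty-type');
    }
  });
  return reasons;
}
```

If most files report `no-entry-for-class`, the includes are probably being stored under a different key than the one createDot derives from the file name. If they report `no-relationships`, the structure is never populated in the first place.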
This is way beyond an edge case. We have to debug it and then back this up with some bulk processing of C++ libraries.