Hi @klntsky!
I'm thinking of implementing this with a new process
config option. Does this kind of structure match what you had in mind?
repomix.config.json
{
  "output": {
    // ... output config
  },
  "process": {
    "maxLines": 100, // Default limit for all files
    "patterns": [
      {
        "pattern": "**/*.json", // Special limits for JSON files
        "maxLines": 20
      }
    ]
  }
}
The output would look like:
{
  "users": [
    {
      "id": 1,
      "name": "John"
    }
  ]
... (truncated)
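For reference, here's a rough sketch of the truncation step in TypeScript. It isn't based on repomix's actual internals: the type names are placeholders that mirror the proposed config, and matchesGlob stands in for whatever glob matcher the codebase already uses.

// Sketch only: mirrors the proposed "process" config shape above.
interface PatternLimit {
  pattern: string; // glob, e.g. "**/*.json"
  maxLines: number;
}

interface ProcessConfig {
  maxLines?: number; // global default
  patterns?: PatternLimit[];
}

// Picks the effective limit for a file: the first matching pattern wins,
// otherwise the global default applies.
function effectiveMaxLines(
  filePath: string,
  config: ProcessConfig,
  matchesGlob: (path: string, pattern: string) => boolean,
): number | undefined {
  const match = config.patterns?.find((p) => matchesGlob(filePath, p.pattern));
  return match?.maxLines ?? config.maxLines;
}

// Keeps the first maxLines lines and appends a marker when anything was cut.
function truncateByLines(content: string, maxLines: number): string {
  const lines = content.split("\n");
  if (lines.length <= maxLines) {
    return content;
  }
  return lines.slice(0, maxLines).join("\n") + "\n... (truncated)";
}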
Let me know if this is heading in the right direction!
In some cases it may be useful to limit characters or words rather than lines (e.g. for unformatted JSON). Maybe all three should be configurable?
@klntsky If I'm understanding your intention correctly, I think the underlying issue here is that including entire file contents can consume a large number of tokens, which is a common problem for projects using repomix with LLMs.
Given this context and considering how LLMs process text, I think focusing on token count would be the most appropriate approach initially. Something like:
{
  "process": {
    "maxTokens": 1000, // Global token limit
    "patterns": [
      {
        "pattern": "**/*.json",
        "maxTokens": 500 // Pattern-specific token limit
      }
    ]
  }
}
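As a rough sketch, the token-based truncation could look something like the following. I'm assuming a tokenizer such as js-tiktoken here purely for illustration; the real implementation would presumably reuse whatever encoder repomix already uses for token counting.

import { getEncoding } from "js-tiktoken";

// Sketch only: cuts content down to at most maxTokens tokens and appends a
// marker when anything was removed. cl100k_base is just an example encoding.
function truncateByTokens(content: string, maxTokens: number): string {
  const enc = getEncoding("cl100k_base");
  const tokens = enc.encode(content);
  if (tokens.length <= maxTokens) {
    return content;
  }
  return enc.decode(tokens.slice(0, maxTokens)) + "\n... (truncated)";
}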
I'd like to start with this simpler requirement to minimize potential bugs.
What do you think about this approach?
Yep, token limits seem to cover both cases, but I'd still like to have lines too: it's not immediately obvious how many tokens a given part of a file contains, whereas lines can be inspected visually.
That makes sense. We could support both maxLines and maxTokens, truncating when either limit is reached.
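Roughly, reusing the truncateByLines / truncateByTokens helpers sketched above (still just placeholder names):

// Sketch only: applies the line limit first, then the token limit to the
// already line-truncated text, so the result respects both limits.
function truncateContent(
  content: string,
  limits: { maxLines?: number; maxTokens?: number },
): string {
  let result = content;
  if (limits.maxLines !== undefined) {
    result = truncateByLines(result, limits.maxLines);
  }
  if (limits.maxTokens !== undefined) {
    result = truncateByTokens(result, limits.maxTokens);
  }
  return result;
}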
Let me think about this a bit more.
The use case is: I have multiple JSON data files. I want to include them in the LLM input, but only to show their structure, not the contents. I'd like to be able to specify that I just want to include the first N lines.