Closed muescha closed 9 months ago
It looks like the _lazy_splitlines function (https://github.com/kellyjonbrazil/jc/blob/dev/jc/utils.py#L397) is skipping blank lines. I'll need to find a way around that.
I added a small fix to _lazy_splitlines and it seems to work better. I'll put this in dev so you can test, or you can modify the two lines manually:
    def _lazy_splitlines(text: str) -> Iterable[str]:
        NEWLINES_PATTERN: str = r'(\r\n|\r|\n)'
        NEWLINES_RE = re.compile(NEWLINES_PATTERN)
        start = 0
        for m in NEWLINES_RE.finditer(text):
            begin, end = m.span()
            if begin != start:
                yield text[start:begin]
            else:  # add this line
                yield ''  # add this line
            start = end
        if text[start:]:
            yield text[start:]
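For anyone who wants to try the behavior outside of jc, here is a minimal standalone sketch. The name lazy_splitlines_sketch and the sample text are made up for illustration. It follows the snippet above but collapses the if/else into a single yield, on the observation that text[start:begin] is already an empty string when begin == start, so the result should be the same.

```python
import re
from typing import Iterable

# Same newline pattern as in the snippet above, compiled once at module level.
NEWLINES_RE = re.compile(r'(\r\n|\r|\n)')


def lazy_splitlines_sketch(text: str) -> Iterable[str]:
    """Yield the lines of `text` one at a time, keeping blank lines."""
    start = 0
    for m in NEWLINES_RE.finditer(text):
        begin, end = m.span()
        # When begin == start the slice is '', so blank lines are yielded too.
        yield text[start:begin]
        start = end
    # Trailing text that is not terminated by a newline.
    if text[start:]:
        yield text[start:]


sample = "a\n\nb\r\n\r\nc"
print(list(lazy_splitlines_sketch(sample)))  # ['a', '', 'b', '', 'c']
```

Because blank lines are now yielded, line indexes in a slice line up with what an editor or `cat -n` shows (minus one for the zero-based start).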
It works as expected.
I am new to Python Array Slicing™ and it is confusing for me: it is zero-based and excludes the end index.
I expected to write "5:10" or "4:9", but I need to write "4:10":
bat test.out | jc "4:10" --kv -p
{
  "a": "b",
  "c": "d",
  "e": "f",
  "g": "h",
  "i": "k"
}
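The off-by-one confusion above can be illustrated with plain Python list slicing. This is only a sketch: the helper one_based_to_slice and the numbered sample lines are hypothetical and not part of jc. It shows how an inclusive, 1-based range such as "lines 5 to 10" maps to the zero-based, end-exclusive spec "4:10".

```python
# Twenty numbered lines standing in for the contents of a text file.
lines = [f"line {n}" for n in range(1, 21)]


def one_based_to_slice(first: int, last: int) -> str:
    """Turn an inclusive, 1-based line range into a zero-based 'start:end' spec."""
    # Only the start shifts down by one; the exclusive end equals the last line.
    return f"{first - 1}:{last}"


spec = one_based_to_slice(5, 10)
start, end = (int(part) for part in spec.split(":"))
selected = lines[start:end]

print(spec)           # 4:10
print(selected[0])    # line 5
print(selected[-1])   # line 10
print(len(selected))  # 6
```

A handy side effect of the exclusive end is that end minus start gives the number of lines selected.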
But that is OK if it is normal in Python. Maybe this behaviour should be documented in the docs and in the command-line help for users not familiar with Python slicing?
Is this right in this case? Here I would expect 100:
Line Slicing:
- $ cat file.csv | jc :101 --csv # parse first 100 lines
+ $ cat file.csv | jc :100 --csv # parse first 100 lines
This help text is then also confusing:
data: (string or iterable) - input to slice by lines
- slice_start: (int) - starting line
+ slice_start: (int) - starting line (zero based)
- slice_end: (int) - ending line
+ slice_end: (int) - ending line (+1)
Or is it better to use a 1-based start and an inclusive ending line (as I originally expected) with "5:10"?
Starting Line: 5 until
Ending Line: 10
Here are some explanations as to why Python slicing works the way it does. There is an elegance factor that maybe only a programmer would see.
https://stackoverflow.com/questions/11364533/why-are-slice-and-range-upper-bound-exclusive
The slicing behavior is documented in the readme and man page but we can probably add more.
In the csv example in help I use :101 to account for the zero start and the header row.
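To make the :101 arithmetic concrete, here is a small sketch using Python's csv module on made-up data (the column names and the 150 rows are assumptions for illustration, and this is not jc's own CSV parser). Slicing with :101 keeps 101 lines, of which the first is the header, so 100 records come out.

```python
import csv

# Hypothetical CSV input: one header row followed by 150 data rows.
lines = ["id,value"] + [f"{n},{n * n}" for n in range(1, 151)]

# ":101" keeps indexes 0..100: the header (index 0) plus 100 data rows.
first_101 = lines[:101]
rows = list(csv.DictReader(first_101))

print(len(first_101))  # 101
print(len(rows))       # 100
print(rows[-1])        # {'id': '100', 'value': '10000'}
```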
The current example is a bit confusing. I recommend using a clearer illustration. Understanding the slicing index here requires knowing that the header row is not counted.
Consider this alternative:
$ cat output.txt | jc 4:15 --parser # Parse from line 4 to 14 with parser (zero-based)
Additionally, it might be helpful to include an explanation of the SLICE option in the jc --help command:
Slice:
[start]:[end]
start: [[-]index] - Zero-based start line, negative index for counting from the end
end: [[-]index] - Zero-based end line (excluding the index), negative index for counting from the end
Maybe this provides a clearer and more detailed explanation.
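The proposed help text can be checked against ordinary Python slice semantics. The sketch below uses a hypothetical helper, parse_slice_spec, that is not jc's actual argument parsing; it only assumes that a [start]:[end] spec behaves like a Python slice over a list of lines.

```python
from typing import Optional


def parse_slice_spec(spec: str) -> slice:
    """Parse '[start]:[end]' into a slice; either side may be empty or negative."""
    start_text, _, end_text = spec.partition(":")
    start: Optional[int] = int(start_text) if start_text else None
    end: Optional[int] = int(end_text) if end_text else None
    return slice(start, end)


lines = [f"line {n}" for n in range(1, 21)]

print(lines[parse_slice_spec("4:15")][0])   # line 5
print(lines[parse_slice_spec("4:15")][-1])  # line 15
print(lines[parse_slice_spec("-3:")])       # ['line 18', 'line 19', 'line 20']
print(len(lines[parse_slice_spec(":-1")]))  # 19
```

Note that "4:15" covers zero-based indexes 4 through 14, which are the fifth through fifteenth lines when counting from one.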
Agreed - definitely room for improvement. I can add these doc updates.
Added in v1.25.0
Where I can set the split function as a parameter in jc, like "3:10":
I expect to cut it with "5:10" and then have it processed by jc:
OK, maybe it is zero-based... then
But it looks like the split also does not count the empty lines: