Open madeindjs opened 10 months ago
Hi @madeindjs Thanks for the contribution. I'm wondering if it slows down a lot when using large files as input, have you had any cases like that?
So the thing is I tried with a large diff (25Mo) and the performances are still fine. I just tried to experimente some stuff to improve the performances: The execution time was identical. But it surely improve memory consumption. It's might also be cool if user wants to iterate on each changes without storing them in memory.
It's up to you to decide if this PR is good to have or not.
BTW, I tried to go further and use AsyncGenerator
when reading the file (here) like this
async function* getGenerator() {
const stream = createInterface({ input: createReadStream(FILE) });
for await (const line of stream) yield line;
}
for await (const change of parseGitDiffAsync(getGenerator())) console.log(change);
...but my conslusion was the execution time become worse (aroud 2 times slower, probably because it requires that I add async
everywhere). But it can still be intersting for large file which doesn't fit in memory.
But it surely improve memory consumption. It's might also be cool if user wants to iterate on each changes without storing them in memory.
@madeindjs Agree! Thanks for the explanation. What do you think about the supporting streams instead of using generators? After seeing the above example, I realized that it would be more usable if we support streams.
Yes I agree, I'll update the PR with the support of stream instead of generator.
My only concern is I duplicated parse-git-diff.ts
as parse-git-diff-async.ts
to be able to introduce AsyncGenerator
. I didn't find a way to factorize things because everything need to be async
.
@yeonjuan , I added the support of Interface
from Node.js readline
package. It allow a sexy syntax like:
import { createInterface } from 'node:readline';
import { createReadStream } from 'node:fs';
import parseGitDiffAsync from './parse-git-diff-async.js';
const FILE = '/home/alexandre/Downloads/v3_2_0...v3_3_0.diff';
const stream = createInterface({ input: createReadStream(FILE) });
for await (const change of parseGitDiffAsync(stream)) i++;
The cool thing is readline
might allow user to introduce CLI tools which allow pipe operator from the shell like git diff | script
Regarding performances , I noted thoses performances using my 25mb diff
method | time |
---|---|
parseGitDiff(diff: string) |
0.30s |
parseGitDiff(diff: Generator) |
0.25s |
parseGitDiffAsync(diff: AsyncGenerator) |
0.47s |
I let you take a look at the PR and comment. As I said, I'm not that happy of the code because I copy/pasted many stuff. There is surely a way to factorize it better.
Hi @madeindjs Thanks for the work! I agree that the code needs to be cleaned up, and I'm thinking about separating the public interfaces. In the end, I'd like to have something like this.
import parseGitDiff, { parseGitDiffs, parseGitDiffsGenerator, parseGitDiffsStream } from 'parse-git-diff';
parseGitDiff; // deprecated
parseGitDiffs; // synchronized version
parseGitDiffsGenerator; // for supporting generator
parseGitDiffsStream; // for supporting stream
And it would be nice to change the output format for consistency.
{
type:"GitDiff",
files: [...]
}
[
{ diff }, { diff }, { diff }
]
This might take a little longer. What do you think about what it would be like if I worked on this PR base?
Yes I agree that the ouput needs to be changed, but I didn't do it since it's a breaking change for the API.
I personally prefer the overides way because it looks better in term of developper experience. But it's your librairy, do as you like you prefer, don't worry :wink:
Of course, I'm fine that you work on this PR.
It might improve the performances to use a
Generator
when reading big files.I just added an overide function which allow this