nacholibre / node-readlines

Read large text files line by line (synchronous) for Node.js
MIT License
162 stars 28 forks source link

Refactored some of the code and added in extract lines performance #15

Open ekeric13 opened 6 years ago

ekeric13 commented 6 years ago

Looks like I am having some white space issue.

My text editor is saying spaces: 4.

What are you using?

edit: I think I resolved the whitespace issues.

Found this super useful btw! Not many good options for synchronously reading line by line while not all in memory.

Also if you don't want the dependency, we can check to see if Buffer.indexOf is supported, and if not use the old _extractLines function.

nacholibre commented 6 years ago

Hello,

thanks for your contribution, but I cannot merge this before I know for sure this is better than original version. How did you verified your code is faster? Can you post some test results?

ekeric13 commented 6 years ago

Makes total sense!

I just added a test that computes how long it currently takes to go through a ~27k line file.

It doesn't give you much context as it is not running against the previous version though.

I can upload my own benchmark file if you want, and for now will just post the results here.

I used the test helpers in this PR along with these functions to compare the old and new version of node-readlines.

const lineByLine = require('./lineByLine');
const oldLineByLine = require('n-readlines');

function percentDecrease(newNum, oldNum) {
  return Math.floor(((oldNum-newNum) / oldNum) * 100) + '%';
}

function logResultsFor(file, trials) {
  var newAvg = runThrough(file, lineByLine, trials);
  console.log(`new average ${file} x ${trials} in s:`, newAvg.toFixed(5));
  var oldAvg = runThrough(file, oldLineByLine, trials);
  console.log(`old average ${file} x ${trials} in s:`, oldAvg.toFixed(5));
  console.log(`time to parse through ${file} reduced by`, percentDecrease(newAvg, oldAvg));
}

logResultsFor('movies', 100);

logResultsFor('ratings', 3);

Where movies was 27k lines and ratings was 20m lines.

The results were

new average movies x 100 in s: 0.01400
old average movies x 100 in s: 0.03010
time to parse through movies reduced by 53%
new average ratings x 3 in s: 8.51500
old average ratings x 3 in s: 13.73300
time to parse through ratings reduced by 37%

Ran this multiple times, and the movies file was consistently ~50% better, and the ratings file was consistently ~35% better.