interesting :) - GitHub Issues

leeoniya commented 1 week ago

hey @kossnocorp!

i'm always on the lookout for new CSV parsers!

if you want to bench smolcsv against some others, i have quite a few here: https://github.com/leeoniya/uDSV/tree/main/bench

let me know if you'd like me to do a run :)

kossnocorp commented 2 days ago

Hey! This is amazing! I am definitely interested. Could you please point me to where I can read how I can get started so I can check if it passes the tests/runs?

leeoniya commented 1 day ago

so, in terms of correctness, i don't have anything that tests all libs. but you can copy/adapt parse.spec.mjs, which should give you a pretty solid foundation for RFC 4180 compliance. there is also a bunch of uDSV-specific stuff in there that might not be applicable to this project. uDSV doesn't support malformed CSVs, so there are no tests for things like missing quotes, mixed line endings, etc.

for the benchmark runner, there are a few steps.

first, i do not benchmark all libs against all datasets, in all scenarios (untyped, typed, streaming). that would probably take a while and melt my laptop :). i usually just run one dataset at a time whenever i'm curious. basically, just uncomment the csv file paths you want to end up in the dataPaths array. additionally, i comment/uncomment the parsers/scenarios i want to bench in the parserPaths array.
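the comment/uncomment workflow might look roughly like the sketch below. only the array names dataPaths and parserPaths and the PapaParse.cjs path come from this thread; the dataset file names and the smolcsv.cjs entry are made-up placeholders, and the real arrays in bench/runall.cjs may be laid out differently.

```javascript
// Hypothetical excerpt in the spirit of bench/runall.cjs.
// Dataset names below are placeholders, not the real files.
const dataPaths = [
  './bench/data/dataset-a.csv',
  // './bench/data/dataset-b.csv',   // commented out: skipped this run
  // './bench/data/dataset-c.csv',
];

const parserPaths = [
  './non-streaming/untyped/PapaParse.cjs',
  './non-streaming/untyped/smolcsv.cjs', // hypothetical new adapter
  // './non-streaming/typed/PapaParse.cjs', // other scenario, off for now
];

// Every active dataset is paired with every active parser.
const runs = dataPaths.flatMap((dataPath) =>
  parserPaths.map((parserPath) => ({ dataPath, parserPath })),
);

console.log(`${runs.length} runs queued`);
```

keeping one dataset active at a time keeps the number of runs equal to the number of active parsers, which is what makes a quick one-off comparison cheap.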

the datasets are not in the repo, but can be downloaded from the locations listed in the bench README.

to add a new lib:

1. add it to bench/package.json and install.
2. create a file in the appropriate scenario directory that implements the bench interface, like https://github.com/leeoniya/uDSV/blob/main/bench/non-streaming/untyped/PapaParse.cjs
3. add the file path to the appropriate parsers array in bench/runall.cjs
4. in the root of the project do bun run ./bench/runall.cjs or node ./bench/runall.cjs
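as a rough illustration of the adapter file from the steps above, here is a sketch of what such a module could look like. the shape used here (a name plus an async load that returns a parse function) is an assumption made for illustration; the actual bench interface is whatever the linked PapaParse.cjs exports, so copy that file rather than this one. the parser is a naive stand-in so the sketch runs without installing anything, and runOnce is a toy timing loop, not the real runner.

```javascript
// Hypothetical adapter, e.g. bench/non-streaming/untyped/smolcsv.cjs.
// Field names (name, load) are assumptions; mirror PapaParse.cjs in the repo.

// Stand-in for the library under test (naive split, no quote handling).
// A real adapter would require the installed package and call its own API.
const standInLib = {
  parse: (csvStr) => csvStr.trimEnd().split('\n').map((line) => line.split(',')),
};

const adapter = {
  name: 'smolcsv (stand-in)',
  // Returns the function a runner would call with the full CSV text.
  load: async () => (csvStr) => standInLib.parse(csvStr),
};

module.exports = adapter;

// Toy runner: time repeated parses of one in-memory dataset.
async function runOnce(adapterModule, csvStr, iterations = 100) {
  const parse = await adapterModule.load();
  const start = performance.now();
  let rows = 0;
  for (let i = 0; i < iterations; i++) {
    rows = parse(csvStr).length;
  }
  const elapsedMs = performance.now() - start;
  return { name: adapterModule.name, rows, msPerRun: elapsedMs / iterations };
}

runOnce(adapter, 'a,b,c\n1,2,3\n4,5,6\n').then((result) => console.log(result));
```

keeping the library call behind load means setup cost (requiring the package, building options) stays outside the timed loop, which is the usual reason bench adapters are split this way.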

if you did all of that just right :sweat_smile: , you'll see the bench runner work its way down the parser list and output a perf chart like those in the bench README.

easy peasy!

kossnocorp / smolcsv
