CSVLoader upgrade

The CSVLoader is in many ways the bread and butter of table loading and needs to be really solid.

The current CSVLoader is based on a fork of papaparse, which is hard to maintain because of its dated and convoluted code style.

Options:
Make a cleanup pass on the papaparse code (convert to TypeScript etc).
Fork another, cleaner CSV loader.
Find another CSV loader that supports the async iterator model and import it as a dependency rather than forking it, so that we benefit from open source fixes (see the streaming issue below).
Write a custom CSV loader, ideally using the state machine parser we use in JSONLoader etc. However, the state machine approach is likely complicated due to the "fluid" nature of CSV syntax.
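To give a feel for the last option, here is a minimal sketch of a character-level CSV state machine. It is not the JSONLoader parser and not a proposed implementation: `parseCSV`, the state names and the lenient handling of stray quotes are all assumptions made for illustration. It parses a complete string (no streaming) and only covers quoted fields, doubled quotes, embedded newlines and CRLF line endings.

```typescript
// Hypothetical sketch of a CSV state machine; names and behavior are illustrative only.
type State = 'FIELD_START' | 'UNQUOTED' | 'QUOTED' | 'QUOTE_IN_QUOTED';

function parseCSV(text: string, delimiter: string = ','): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = '';
  let state = 'FIELD_START' as State;

  // Close the current field and get ready for the next one.
  const endField = (): void => {
    row.push(field);
    field = '';
    state = 'FIELD_START';
  };
  // Close the current field and the current row.
  const endRow = (): void => {
    endField();
    rows.push(row);
    row = [];
  };

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    switch (state) {
      case 'FIELD_START':
      case 'UNQUOTED':
        if (ch === '"' && state === 'FIELD_START') state = 'QUOTED';
        else if (ch === delimiter) endField();
        else if (ch === '\n') endRow();
        else if (ch === '\r') {
          // Ignored: the '\n' of a CRLF pair ends the row.
        } else {
          field += ch;
          state = 'UNQUOTED';
        }
        break;
      case 'QUOTED':
        // Delimiters and newlines are plain data inside quotes.
        if (ch === '"') state = 'QUOTE_IN_QUOTED';
        else field += ch;
        break;
      case 'QUOTE_IN_QUOTED':
        if (ch === '"') {
          // A doubled quote is an escaped quote character.
          field += '"';
          state = 'QUOTED';
        } else if (ch === delimiter) endField();
        else if (ch === '\n') endRow();
        else if (ch === '\r') {
          // Ignored, as above.
        } else {
          // Lenient: text after a closing quote is appended to the field.
          field += ch;
          state = 'UNQUOTED';
        }
        break;
    }
  }

  // Flush the last row when the input does not end with a newline.
  if (state !== 'FIELD_START' || row.length > 0) endRow();
  return rows;
}

const parsed = parseCSV('name,note\r\n"Smith, J","said ""hi"""\n');
console.log(parsed);
```

Even this small version needs four states and a policy decision for malformed input, which hints at why a full state machine for real-world CSV (delimiter detection, comments, type inference) could get complicated.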

We do want support for streaming parsing.
When we initially surveyed the landscape in mid-2019, the existing open source CSV loaders that supported streaming usually did so via Node streams (push model).
However, the loaders.gl parseInBatches architecture is AsyncIterator based (pull model) - which is arguably more composable, modern and also aligns with Apache Arrow.
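As a rough illustration of the pull model, the sketch below consumes an async iterable of text chunks and yields batches of rows. It is not the loaders.gl implementation: `parseLinesInBatches` and `batchSize` are hypothetical names, and the row splitting is deliberately naive (no quoting support).

```typescript
// Hypothetical sketch of pull-based batching; not the loaders.gl parseInBatches code.
async function* parseLinesInBatches(
  chunks: AsyncIterable<string> | Iterable<string>,
  batchSize: number = 2
): AsyncGenerator<string[][]> {
  let remainder = ''; // partial line carried over between chunks
  let batch: string[][] = [];

  for await (const chunk of chunks) {
    const lines = (remainder + chunk).split('\n');
    // The last piece may be an incomplete line; keep it for the next chunk.
    remainder = lines.pop() ?? '';
    for (const line of lines) {
      // Naive split: quoted fields are NOT handled in this sketch.
      batch.push(line.replace(/\r$/, '').split(','));
      if (batch.length === batchSize) {
        yield batch;
        batch = [];
      }
    }
  }

  if (remainder.length > 0) batch.push(remainder.split(','));
  if (batch.length > 0) yield batch;
}

// Usage: the consumer decides when to pull the next batch.
async function* exampleChunks(): AsyncGenerator<string> {
  yield 'a,b\nc';
  yield ',d\ne,f\n';
  yield 'g,h';
}

async function collectBatches(): Promise<string[][][]> {
  const batches: string[][][] = [];
  for await (const batch of parseLinesInBatches(exampleChunks())) {
    batches.push(batch);
  }
  return batches;
}
```

Because the generator only advances when the consumer asks for the next batch, stages like this can be chained without explicit backpressure handling, which is the composability argument made above.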
Converting Node streams to an AsyncIterator is fairly complex and typically involves forking and modifying the code, so we would still end up with a fork unless we can upstream the AsyncIterator changes.
There is a branch that adds a generic stream-to-AsyncIterator adapter. This could be useful, but it ran into subtle issues and would likely require careful testing before landing.
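For reference, a push-to-pull adapter can be sketched roughly as follows. This is not the code from that branch: `PushSource` is a hypothetical minimal interface standing in for a stream, and `toAsyncIterator` is an illustrative name. The sketch buffers values without any limit and never pauses the source, so it does not address backpressure.

```typescript
// Hypothetical minimal push-style source, standing in for a stream.
interface PushSource<T> {
  onData(callback: (value: T) => void): void;
  onEnd(callback: () => void): void;
  onError(callback: (error: Error) => void): void;
}

interface Waiter<T> {
  resolve: (result: IteratorResult<T>) => void;
  reject: (error: Error) => void;
}

// Sketch of an adapter: queue pushed values, hand them out when pulled.
function toAsyncIterator<T>(source: PushSource<T>): AsyncIterableIterator<T> {
  const values: T[] = []; // pushed but not yet pulled (unbounded in this sketch)
  const waiting: Waiter<T>[] = []; // pulls that arrived before any data
  let done = false;
  let failure: Error | null = null;

  source.onData((value) => {
    const waiter = waiting.shift();
    if (waiter) waiter.resolve({value, done: false});
    else values.push(value);
  });
  source.onEnd(() => {
    done = true;
    for (const waiter of waiting.splice(0)) {
      waiter.resolve({value: undefined, done: true});
    }
  });
  source.onError((error) => {
    failure = error;
    for (const waiter of waiting.splice(0)) waiter.reject(error);
  });

  const iterator: AsyncIterableIterator<T> = {
    [Symbol.asyncIterator]() {
      return iterator;
    },
    next(): Promise<IteratorResult<T>> {
      // Drain buffered values first, then report error or completion.
      if (values.length > 0) {
        return Promise.resolve({value: values.shift() as T, done: false});
      }
      if (failure) return Promise.reject(failure);
      if (done) return Promise.resolve({value: undefined, done: true});
      // Nothing available yet: park the pull until the source pushes.
      return new Promise((resolve, reject) => waiting.push({resolve, reject}));
    }
  };
  return iterator;
}
```

Ordering between buffered values, errors and end-of-stream is one of the places where an adapter like this needs careful tests; the choice here (deliver buffered values before surfacing an error) is just one possible policy.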

We do want a performant parser. It is not clear which parser is fastest, but we have some indications that papaparse is not competitive.
We also ran into issues with large datasets. In Chrome, a substring operation can retain a reference to the original string, so small extracted values keep the entire source string alive, leading to excessive memory consumption.
