Open mattday opened 3 years ago
Sorry for taking so long to respond, but this is a great suggestion. CSVReader didn't always have a templated stream constructor. Previously, it really only supported reading from std::string (and from std::ifstream by copying the data into an internal std::string, which was very inefficient).
I essentially had to rewrite it so it could support parsing data from std::stringstream, memory-mapped files, and std::ifstream without too much duplication of code. Supporting reading CSVs directly from compressed files was also a motivation for this.
This was accomplished by creating the IBasicCSVParser interface, from which the generic StreamParser class derives.
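To make the shape of that design concrete, here is a minimal sketch of an abstract source interface with a templated, stream-owning implementation. The names `IChunkSource` and `OwningStreamSource` are hypothetical and are not the library's actual classes or signatures; the sketch only assumes, as described in this thread, that the stream is moved into the parser.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical interface, loosely playing the role described for
// IBasicCSVParser. It is NOT the library's real interface.
class IChunkSource {
public:
    virtual ~IChunkSource() = default;
    // Fills `chunk` with up to `max_bytes` bytes.
    // Returns false once the source is exhausted.
    virtual bool next(std::string& chunk, std::size_t max_bytes) = 0;
};

// Hypothetical generic implementation that takes ownership of the stream
// by moving it in, so Stream must be publicly move-constructible.
template <typename Stream>
class OwningStreamSource : public IChunkSource {
public:
    explicit OwningStreamSource(Stream&& stream) : stream_(std::move(stream)) {}

    bool next(std::string& chunk, std::size_t max_bytes) override {
        chunk.assign(max_bytes, '\0');
        stream_.read(&chunk[0], static_cast<std::streamsize>(max_bytes));
        // gcount() is the number of bytes the last read actually delivered.
        chunk.resize(static_cast<std::size_t>(stream_.gcount()));
        return !chunk.empty();
    }

private:
    Stream stream_;  // owned copy, obtained by move
};
```

With this layout, other back ends (a memory-mapped file, for instance) would be further classes deriving from the same interface, so the calling code does not need to know where the bytes come from.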
If you want to play around, you can see if StreamParser can be generalized to std::istream. If not, you can always create your own IBasicCSVParser implementation to work with a specific underlying data type. Currently, the library uses StreamParser for std::istream-derived types and MmapParser for memory-mapped IO.
Hi Vincent, I would like to echo Matt and thank you for your excellent library. I was curious how you were able to read from a gzipped file. Unfortunately, passing an istream to the CSVReader constructor does not work, since the move constructor of std::istream is protected. Any help would be greatly appreciated.
Hi Vincent, I have played around, trying to adapt StreamParser to work with std::istream. The move part is easy: initialize a member reference rather than moving the stream. That works when the actual stream is a file. But I'm afraid I now face a more fundamental problem.
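As a rough illustration of the member-reference idea, the sketch below stores a `std::istream&` instead of moving the stream. `LineSource` and `count_lines` are hypothetical names used only for this example; this is not the modified StreamParser.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical class for illustration only. It keeps a non-owning
// reference, so it accepts any std::istream, including ones that
// cannot be moved.
class LineSource {
public:
    explicit LineSource(std::istream& in) : in_(in) {}

    // Reads the next line; returns false at end of input or on error.
    bool next(std::string& line) {
        return static_cast<bool>(std::getline(in_, line));
    }

private:
    std::istream& in_;  // the caller must keep the stream alive
};

// Counts the lines available from any std::istream.
inline std::size_t count_lines(std::istream& in) {
    LineSource source(in);
    std::string line;
    std::size_t n = 0;
    while (source.next(line)) {
        ++n;
    }
    return n;
}
```

The trade-off is lifetime: with a reference, the caller has to guarantee that the stream outlives the object reading from it, which an owning, moved-in stream handles automatically.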
A generic stream input should also support pipes, so that you can consume CSV data from a decompression filter or a similar source. But for pipes, std::istream::tellg will not report anything useful, only pos_type(-1), so StreamParser::next cannot determine the base class's IBasicCSVParser::source_size upfront and ends up with zero. It immediately "declares" EOF and terminates, without even catching the tellg error.
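The tellg behaviour can be reproduced without a real pipe. The sketch below uses a hypothetical `PipeLikeBuf` stream buffer that leaves the seek operations at their `std::streambuf` defaults (which report failure), and a hypothetical `drain` helper that reads in chunks until EOF instead of relying on a size known upfront. Neither is part of the library.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical stand-in for a pipe or decompression filter: it serves
// bytes from a string but does not override seekoff/seekpos, so any
// attempt to query or change the position fails.
class PipeLikeBuf : public std::streambuf {
public:
    explicit PipeLikeBuf(std::string data) : data_(std::move(data)) {
        char* begin = &data_[0];
        setg(begin, begin, begin + data_.size());
    }

private:
    std::string data_;
};

// True when the stream can report its current position.
inline bool is_seekable(std::istream& in) {
    return in.tellg() != std::istream::pos_type(-1);
}

// Reads until EOF in fixed-size chunks and returns the number of bytes
// seen. No source size is needed in advance.
inline std::size_t drain(std::istream& in, std::size_t chunk_size = 4096) {
    if (chunk_size == 0) {
        return 0;  // a zero-sized chunk would never make progress
    }
    std::string chunk(chunk_size, '\0');
    std::size_t total = 0;
    // A short final read sets failbit, so also check gcount().
    while (in.read(&chunk[0], static_cast<std::streamsize>(chunk.size())) ||
           in.gcount() > 0) {
        total += static_cast<std::size_t>(in.gcount());
    }
    return total;
}
```

A parser loop built this way treats "no more bytes" as the end condition, so it does not matter whether the total size can be computed before reading starts.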
Is the logic of IBasicCSVParser intended to work with implementations where data needs to be processed before the final source size is known?
First, thanks for an excellent library. It seems to be the best in class.
CSVReader has a templated stream constructor where the template parameter can be derived from a std::istream. As the documentation suggests, this works well with std::stringstream and std::ifstream. However, it can't be used with std::istream itself, or certain other derived streams. I think this is because it tries to move the stream into the StreamParser, so the code doesn't compile if the stream doesn't support this.
It would be incredibly useful to be able to use the parser with more generic streams, giving users the ability to read from compressed files. For example, this might be via the gzip_decompressor or the bzip2_decompressor in boost::iostreams. Reading from a gzip compressed CSV file is trivial in languages like Python. It's also a common requirement given how inefficient it can be to store a lot of data in a CSV file.
It seems as though the parser can almost support this already, so it probably doesn't need significant changes.