mukunku / ParquetViewer

Simple Windows desktop application for viewing & querying Apache Parquet files
GNU General Public License v3.0
687 stars 82 forks source link

use CSVHelper to export the data as csv #90

Closed tonyqus closed 9 months ago

tonyqus commented 9 months ago

To simplify the code of the csv export, I use CSVHelper instead.

mukunku commented 9 months ago

Hi @tonyqus ! Thank you so much for your contribution. I looked through the changes and it looks good to me however I noticed the addition of CSVHelper increases the executable size. It's not a big deal but the file size increases by 200KB when we include CSVHelper: image

Is there any other benefit to your change besides simplifying the code, like better efficiency? If not, maybe saving 200KB is better. What do you think?

tonyqus commented 9 months ago

Hi @mukunku,

I'm not sure why do care about the file size. Nowadays, the Internet speed is really good. In my view, it's not a problem for just 200k increase.

And referening CSVHelper is not just to simplify the code, but it's kind of seperation of duty. In ParquetViewer code, you should not handle details about how to write to a csv. Instead, ParquetViewer should only handles data part. And this can also help increase the readability of the code.

For writting performance, I'm not really sure about the performance of CSVHelper. But it's the most popular library of CSV in nuget. It ranks 57 in top nuget download packages

This is just my first PR for ParquetViewer. My plan is to help this project enhance the capability of export. I'll also involve NPOI soon to allow this project export not only xls but also xlsx. But if you have concern on file size, it can be a obstacle then

And I'm a big fans of Parquet file format since I used to be a big data architect for a while. I think the Parquet file format design is much better than Excel file. And it's popular in big data world. It's a bit annoying that a few developers are willing to use Excel file to save huge data and keep complaining some libraries like NPOI or EPPlus throws OOM. If Parquet can get popular in .NET world, this kind of issue can be easily solved.

mukunku commented 9 months ago

Hey @tonyqus

Parquet is definitely an underutilized format in the .NET world. I don't think people need to even use CSV anymore. Parquet is so much better. That's why I created this tool, to make parquet files more accessible to everybody, especially non-technical people.

Regarding your comment, file size is indeed important to me. Previously we only had csv export and another user added xlsx support with PR #54 . And I also mentioned there that the additional package was increasing the executable size significantly and that file size is important to me. I even went as far as to ditch xlsx and add a raw binary xls writer instead to keep the app lightweight.

Regarding using a library vs doing it yourself: I personally don't see the down side of writing the CSV directly in the app. I like avoiding adding package dependencies if I can to keep the project lean and light weight. If we needed to use some of the more fancier features of that library, it would be more justified in my mind.

Regarding adding xlsx support. Thanks so much for trying to make the app better. But like I mentioned above, it's not worth the additional package dependency in my opinion since we can write xls files. And anyone who wants xlsx they can simply Save As... the xls file in Excel.

Also, to share some analytics, xls seems to be ~20% of the exports. So seems people are preferring CSV more: image

I really appreciate your input and hope you don't mind if I reject this PR and keep the CSV logic the same.

tonyqus commented 9 months ago

Thank you for the explanation. You are the boss of this project. You definitely has the right to follow some principles defined by yourself. I can understand this.