mukunku / ParquetViewer

Simple Windows desktop application for viewing & querying Apache Parquet files
GNU General Public License v3.0

[FEAT] Search text in multiple Parquet files in one folder #72

Closed AFgh24 closed 1 year ago

AFgh24 commented 1 year ago

Hello. Would it be possible for a new version to include the ability to search for the same text across multiple Parquet files? Currently, every file has to be opened one by one to search for a text: I open a file, wait for it to load, run the search, and then move on to the next file, which takes a lot of time. Being able to search for a text across several Parquet files inside a folder would be really useful. If you add this feature, it will be really great. With respect.

mukunku commented 1 year ago

Could you try the following to see if it will work for you?

  1. Using v2.6.0, go to File -> Open Folder
  2. Select the folder containing your parquet files
  3. Once loaded, increase Record Count to however many records you need (search only runs on records that are loaded into memory)
  4. Perform your search in the query box

mukunku commented 1 year ago

Hey @AFgh24, any luck with the latest beta release?

AFgh24 commented 1 year ago

Thanks.

I put two sample files in one folder.

The first sample file opens in the folder (no problem):
https://github.com/Teradata/kylo/tree/master/samples/sample-data/parquet

The second sample files close without being fully loaded (problematic):
https://huggingface.co/docs/datasets-server/parquet
https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/SelfRC/duorc-validation.parquet
https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-validation.parquet

When both sample files are placed in the same folder, the following error is shown:

https://gcdnb.pbrd.co/images/I2sVBNG44bza.png

mukunku commented 1 year ago

Hey @AFgh24 ,

Thanks for the sample files. This error means the folder you are trying to open contains parquet files with different schemas. The utility cannot open multiple parquet files with different schemas. It only works if all the parquet files in the folder have the same schema, meaning they are truly partitioned.
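
For anyone who wants to check a folder before opening it, here is a minimal sketch that groups files by schema. It is not part of ParquetViewer. It assumes Python with the pyarrow package installed, and the helper names `read_schema_with_pyarrow` and `group_files_by_schema` are made up for illustration.

```python
from collections import defaultdict
from pathlib import Path


def read_schema_with_pyarrow(path):
    """Return (column name, type) pairs for one file. Assumes pyarrow is installed."""
    # Imported lazily so the grouping logic below has no hard dependency on pyarrow.
    import pyarrow.parquet as pq

    schema = pq.read_schema(str(path))
    return tuple((name, str(typ)) for name, typ in zip(schema.names, schema.types))


def group_files_by_schema(folder, read_schema=read_schema_with_pyarrow):
    """Map each distinct schema to the sorted list of .parquet file names using it."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).glob("*.parquet")):
        groups[read_schema(path)].append(path.name)
    return dict(groups)
```

If the returned dictionary has more than one key, the folder mixes schemas, which is the situation described above. Comparing only column names and types is a simplification; it ignores schema metadata.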

If your initial request was something like text search across a folder full of different parquet files, like grep search, that's not something I plan on adding any time soon.
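
Until something like that exists in the app, a grep-like search can be approximated outside ParquetViewer. The following is only a rough sketch, assuming Python with the pyarrow package installed. The function names are hypothetical, and the case-insensitive substring match on top-level string columns is my own choice, not anything the utility does.

```python
from pathlib import Path


def iter_rows_with_pyarrow(path):
    """Yield each row of one parquet file as a dict. Assumes pyarrow is installed."""
    # Imported lazily so the search logic below has no hard dependency on pyarrow.
    import pyarrow.parquet as pq

    # Read batch by batch so a large file is never fully loaded at once.
    for batch in pq.ParquetFile(str(path)).iter_batches():
        yield from batch.to_pylist()


def search_folder(folder, needle, iter_rows=iter_rows_with_pyarrow):
    """Yield (file name, row index, column, value) for each string cell containing needle."""
    needle = needle.lower()
    for path in sorted(Path(folder).glob("*.parquet")):
        # Each file is read on its own, so the schemas do not need to match.
        for row_index, row in enumerate(iter_rows(path)):
            for column, value in row.items():
                if isinstance(value, str) and needle in value.lower():
                    yield path.name, row_index, column, value
```

Usage would look like `for hit in search_folder("my_folder", "some text"): print(hit)`, where `my_folder` is a placeholder path. Nested values such as lists or structs are skipped in this sketch.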

AFgh24 commented 1 year ago

> If your initial request was something like text search across a folder full of different parquet files, like grep search, that's not something I plan on adding any time soon.

Yes, that was my request. I hope you will add this feature soon, when the time is right.

mukunku commented 1 year ago

Going to close this issue out with the won't fix tag for now.