When listing the files that have the requested file_format, this code is not optimal: fsspec explores every file in the parent folder, and the pattern can even match unwanted files.
Example: to find all the files ending with .npy in /tmp/tmpeejv3hoh, the code calls
fs.glob("/tmp/tmpeejv3hoh**/*.npy")
which explores all the files in /tmp and can even match wrong files such as /tmp/tmpeejv3hoh_2/toto.npy
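One possible fix, sketched here as a suggestion and not as the project's actual patch, is to put a "/" between the folder path and "**" so the search stays inside the requested folder. The helper name build_glob_pattern is hypothetical, and the demo uses the standard library glob module as a stand-in for fs.glob, so fsspec may differ in details such as how deep the unanchored pattern recurses.

```python
import glob
import os
import tempfile


def build_glob_pattern(path, file_format):
    """Hypothetical helper: anchor the recursive wildcard inside `path`."""
    # The "/" before "**" keeps the search inside `path` itself,
    # instead of matching every sibling whose name starts with `path`.
    return path.rstrip("/") + f"/**/*.{file_format}"


def demo():
    """Compare the current pattern with the anchored one on a throwaway tree."""
    with tempfile.TemporaryDirectory() as root:
        wanted = os.path.join(root, "tmpeejv3hoh")
        sibling = os.path.join(root, "tmpeejv3hoh_2")
        os.makedirs(wanted)
        os.makedirs(sibling)
        for folder, name in ((wanted, "emb.npy"), (sibling, "toto.npy")):
            open(os.path.join(folder, name), "w").close()

        # Pattern as currently built: no separator before "**".
        current = wanted.rstrip("/") + "**/*.npy"
        # Pattern with the separator added.
        fixed = build_glob_pattern(wanted, "npy")

        # Standard library glob is an assumption standing in for fs.glob.
        current_matches = sorted(
            os.path.relpath(p, root) for p in glob.glob(current, recursive=True)
        )
        fixed_matches = sorted(
            os.path.relpath(p, root) for p in glob.glob(fixed, recursive=True)
        )
        return current_matches, fixed_matches
```

With the standard library, the unanchored pattern also returns the file from the sibling folder tmpeejv3hoh_2, while the anchored pattern only returns files under tmpeejv3hoh.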
glob_pattern = path.rstrip("/") + f"**/*.{file_format}"
https://github.com/rom1504/embedding-reader/blob/11d237d2b0ac95423b0477dac438e17d3e05b689/embedding_reader/get_file_list.py#L45