mathause / filefinder

find and parse file and folder names
MIT License

show duplicates on non-unique query #59

Closed mathause closed 8 months ago

mathause commented 1 year ago

If non-unique metadata is encountered, filefinder raises an error:

import filefinder

ff = filefinder.FileFinder("", "{num}", test_paths=["1", "1"])
ff.find_files()
File ~/code/filefinder/filefinder/_filefinder.py:397, in FileFinder.find_files(self, keys, _allow_empty, **keys_kwargs)
    356 def find_files(self, keys=None, _allow_empty=False, **keys_kwargs):
    357     """find files in the file system using the file pattern
    358
    359     Parameters
   (...)
    395
    396     """
--> 397     return self.full.find(keys, _allow_empty=_allow_empty, **keys_kwargs)

File ~/code/filefinder/filefinder/_filefinder.py:120, in _Finder.find(self, keys, _allow_empty, **keys_kwargs)
    118 msg = "This query leads to non-unique metadata. Please adjust your query."
    119 if len_all != len_unique:
--> 120     raise ValueError(msg)
    122 return fc

ValueError: This query leads to non-unique metadata. Please adjust your query.

It would be helpful if the error message included the entries with non-unique metadata (difficulty: what if there are a lot of them?).

https://github.com/mathause/filefinder/blob/32e1210064bdb30e70b6a3eb66b27bf52a7e8115/filefinder/_filefinder.py#L120

mathause commented 1 year ago

It's a pandas DataFrame, so we can, e.g., show .head()

mathause commented 8 months ago

Something along these lines:

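import pandas as pd

# toy metadata: rows 0 and 2 are identical
d = {'col1': [1, 2, 1], 'col2': [3, 4, 3]}
df = pd.DataFrame(data=d)

# duplicated() flags repeats of an earlier row (keep="first" by default),
# so this selects row 2 only
duplicated = df[df.duplicated()]

# append at most the first five duplicated rows to the message
msg = f"This query leads to non-unique metadata. Please adjust your query.\n{duplicated.head()}"

raise ValueError(msg)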
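
One possible way to address the "what if there are a lot of them?" concern is sketched below. It is only an illustration: the helper name `nonunique_message` and its `max_rows` parameter are made up here and are not part of filefinder. It uses `duplicated(keep=False)` so that every member of a duplicate group is listed (not only the repeats), and it reports the total count while printing a limited number of rows.

```python
import pandas as pd


def nonunique_message(df, max_rows=5):
    """Return an error message listing duplicated rows, or None if all rows are unique.

    Hypothetical helper for illustration only; not part of filefinder.
    """
    # keep=False marks every row that belongs to a duplicate group
    duplicated = df[df.duplicated(keep=False)]

    if duplicated.empty:
        return None

    # show only the first max_rows rows, but report how many there are in total
    shown = duplicated.head(max_rows)
    return (
        "This query leads to non-unique metadata. Please adjust your query.\n"
        f"First {len(shown)} of {len(duplicated)} non-unique entries:\n"
        f"{shown.to_string()}"
    )


df = pd.DataFrame({"col1": [1, 2, 1], "col2": [3, 4, 3]})

msg = nonunique_message(df)
if msg is not None:
    print(msg)
```

Whether to list all members of a duplicate group or only the repeats is a design choice; listing all of them makes it easier to see which entries collide, at the cost of a longer message.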