retrieve filelist incl. categories

chappi commented 4 months ago

Looks up the categories and matches them with the file list. The result is a pandas DataFrame with the columns 'id', 'catId', 'filename', 'path', 'created' and 'lastUpdated'.
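For readers following along, the matching step could look roughly like the sketch below. It is not the code in this merge request: the helper name `match_categories` and the sample rows are made up for illustration, and only the column names are taken from the description above.

```python
import pandas as pd

# Hypothetical sample data; real values would come from the CashCtrl API.
files = pd.DataFrame({
    'id': [1, 2, 3],
    'catId': [10, 11, 10],
    'filename': ['a.pdf', 'b.jpg', 'c.txt'],
    'created': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'lastUpdated': ['2024-01-05', '2024-01-06', '2024-01-07'],
})
categories = pd.DataFrame({
    'id': [10, 11],
    'path': ['/invoices', '/invoices/scans'],
})

def match_categories(files: pd.DataFrame, categories: pd.DataFrame) -> pd.DataFrame:
    """Attach each file's category path via a left join on the category id."""
    # Rename the category 'id' so it lines up with the files' 'catId' column.
    cats = categories.rename(columns={'id': 'catId'})[['catId', 'path']]
    # A left join keeps every file, even if its category were missing.
    merged = files.merge(cats, on='catId', how='left')
    return merged[['id', 'catId', 'filename', 'path', 'created', 'lastUpdated']]

result = match_categories(files, categories)
```

A left join is used here so that files without a matching category would still appear, with an empty 'path'.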

Fixes #5

chappi commented 4 months ago

Note: the function has been named 'filepath_list' because the function to be extended, 'file_list', is used in a test. I wasn't sure how to handle that; the name can of course be changed.

lasuk commented 4 months ago

@chappi: FYI, I've rebased branch hps/extend-filelist onto the latest main branch (after merging #6) and force-pushed it.

chappi commented 4 months ago

[FYI] I manually deleted the two autotest_img.jpg files on the server. Now the test succeeds again. Don't know why the autotest files remained (shouldn't happen, we will see... 😃).

lasuk commented 4 months ago

Thanks for the merge request. The recursive functionality to flatten the category tree is very neat.

lasuk commented 4 months ago

I propose to make the function CashCtrlAPIClient._cat_list() more generic so that it can retrieve the category tree for files or any other CashCtrl object with a category tree.

Here's a sketch for a generic function, based on your implementation:

    # Assumes `import pandas as pd` at module level.
    def list_categories(self, object: str, system: bool=False) -> pd.DataFrame:
        """
        Params:
        - object (str): a CashCtrl object with a category tree, e.g. 'file'.
        - system (bool): if True, return system nodes. Otherwise silently drop system nodes.
        """
        def flatten_data(nodes, parent_path=''):
            if not isinstance(nodes, list):
                raise ValueError(f"Expecting `nodes` to be a list, not {type(nodes)}.")
            rows = []
            for node in nodes:
                path = f"{parent_path}/{node['text']}"
                # Recurse into child nodes, if any, before adding the node itself.
                if node.get('data') is not None:
                    rows.extend(flatten_data(node['data'], path))
                # Drop the nested 'data' key without mutating the API response.
                row = {k: v for k, v in node.items() if k != 'data'}
                rows.append({'path': path} | row)
            return rows

        data = self.get(f"{object}/category/tree.json")['data']
        df = pd.DataFrame(flatten_data(data))
        if not system:
            df = df.loc[~df['isSystem'], :]
        return df.sort_values('path')
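To try the flattening logic without a CashCtrl connection, here is a standalone version of the same idea. The function name `flatten_tree` and the sample tree are hypothetical; in particular, the shape of the nested response (keys 'text', 'isSystem' and 'data') is assumed from the sketch above, not confirmed against the API.

```python
import pandas as pd

def flatten_tree(nodes, parent_path=''):
    """Flatten a nested category tree into a list of rows with a 'path' key."""
    rows = []
    for node in nodes:
        path = f"{parent_path}/{node['text']}"
        children = node.get('data')
        if children:
            # Child rows inherit the current node's path as their prefix.
            rows.extend(flatten_tree(children, path))
        # Copy all fields except the nested children into the flat row.
        rows.append({'path': path, **{k: v for k, v in node.items() if k != 'data'}})
    return rows

# Hypothetical tree; the real structure comes from '<object>/category/tree.json'.
sample_tree = [
    {'id': 1, 'text': 'All files', 'isSystem': True, 'data': [
        {'id': 2, 'text': 'Invoices', 'isSystem': False, 'data': None},
        {'id': 3, 'text': 'Receipts', 'isSystem': False, 'data': [
            {'id': 4, 'text': '2024', 'isSystem': False},
        ]},
    ]},
]

df = pd.DataFrame(flatten_tree(sample_tree)).sort_values('path')
# Equivalent of system=False: keep only the non-system nodes.
user_df = df.loc[~df['isSystem']]
```

Each row's 'path' is built from its ancestors' 'text' values, so the nested tree can be sorted and filtered like any flat table.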

chappi commented 4 months ago

Comments incorporated. I'm not sure about the test, could you have a look at 'test_filepath_list.py' please? Thanks.

lasuk commented 4 months ago

Thanks a lot. Looks good to me. The new list_categories() method is much more generic. Please implement the above change requests regarding the docstrings.

Tests are only temporary and will be replaced once our 'mirror' function is implemented. I see no reason to further look into them at this time.

macxred / cashctrl_api

retrieve filelist incl. categories #7