joke2k / faker

Faker is a Python package that generates fake data for you.
https://faker.readthedocs.io
MIT License

JSON from misc provider not working properly #2002

Closed angeldeejay closed 1 month ago

angeldeejay commented 4 months ago

The json method of the misc provider does not seem to work with the list-of-tuples format of data_columns when the schema contains nested elements.

Steps to reproduce

Run Python code like this:

from faker import Factory, Generator

faker: Generator = Factory.create()

schema_to_test = [('root_key_a', 'word'),
 ('root_key_b', 'pyint'),
 ('root_key_c', 'pyfloat'),
 ('root_key_d', 'pybool'),
 ('nest_object_key',
  (('attr_a', 'pyint'), ('attr_b', 'pybool'), ('attr_c', 'word'))),
 ('nest_simple_array_key', [((None, 'word'),), ((None, 'word'),)]),
 ('nest_object_array_key',
  [(('attr_a', 'pyint'), ('attr_b', 'pybool'), ('attr_c', 'word')),
   (('attr_a', 'pyint'), ('attr_b', 'pybool'), ('attr_c', 'word')),
   (('attr_a', 'pyint'), ('attr_b', 'pybool'), ('attr_c', 'word'))])]
faker.json(data_columns=schema_to_test, num_rows=1)

Expected behavior

Function should return something like this:

{
  "root_key_a": "center",
  "root_key_b": 410,
  "root_key_c": 81332856730.1496,
  "root_key_d": false,
  "nest_object_key": {
    "attr_a": 3188,
    "attr_b": false,
    "attr_c": "others"
  },
  "nest_simple_array_key": [
    "most",
    "training"
  ],
  "nest_object_array_key": [
    {
      "attr_a": 6525,
      "attr_b": false,
      "attr_c": "performance"
    },
    {
      "attr_a": 2330,
      "attr_b": true,
      "attr_c": "we"
    },
    {
      "attr_a": 1540,
      "attr_b": false,
      "attr_c": "range"
    }
  ]
}

Actual behavior

It fails abruptly with a stack trace like this:

Traceback (most recent call last):
  File "my_test_file.py", line 15, in <module>
    faker.json(data_columns=schema_to_test, num_rows=1)
  File "<route to my python libs>/lib/python3.11/site-packages/faker/providers/misc/__init__.py", line 614, in json
    return json.dumps(create_json_structure(data_columns), indent=indent, cls=cls)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<route to my python libs>/lib/python3.11/site-packages/faker/providers/misc/__init__.py", line 611, in create_json_structure
    raise TypeError("Invalid data_columns type. Must be a dictionary or list")
TypeError: Invalid data_columns type. Must be a dictionary or list

I got it working by creating my own provider, copying the library's code, and changing this nested function:

        def process_list_structure(data: Sequence[Any]) -> Any:
            entry: Dict[str, Any] = {}

            for name, definition, *arguments in data:
                kwargs = arguments[0] if arguments else {}

                if not isinstance(kwargs, dict):
                    raise TypeError("Invalid arguments type. Must be a dictionary")

                if name is None:
                    return self._value_format_selection(definition, **kwargs)
                if isinstance(definition, tuple):
                    entry[name] = process_list_structure(definition)
                elif isinstance(definition, (list, set)):
                    entry[name] = [
                        process_list_structure(item) for item in definition
                    ]
                else:
                    entry[name] = self._value_format_selection(
                        definition, **kwargs
                    )
            return entry

And changing my schema to:

schema_to_test = (('root_key_a', 'word'),
 ('root_key_b', 'pyint'),
 ('root_key_c', 'pyfloat'),
 ('root_key_d', 'pybool'),
 ('nest_object_key',
  (('attr_a', 'pyint'), ('attr_b', 'pybool'), ('attr_c', 'word'))),
 ('nest_simple_array_key', [((None, 'word'),), ((None, 'word'),)]),
 ('nest_object_array_key',
  [(('attr_a', 'pyint'), ('attr_b', 'pybool'), ('attr_c', 'word')),
   (('attr_a', 'pyint'), ('attr_b', 'pybool'), ('attr_c', 'word')),
   (('attr_a', 'pyint'), ('attr_b', 'pybool'), ('attr_c', 'word'))]))
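
For readers who want to try the patched traversal without building a custom provider, here is a minimal standalone sketch. It mirrors the modified process_list_structure shown above, but fake_value is a hypothetical stand-in for Faker's self._value_format_selection and returns fixed placeholder values rather than random data. The names build and PLACEHOLDERS are likewise invented for illustration, and the schema is a shortened version of the one in the post.

```python
import json
from typing import Any, Dict, Sequence

# Hypothetical stand-in for Faker's value generation: fixed placeholders
# instead of random data, so the output is reproducible.
PLACEHOLDERS: Dict[str, Any] = {
    "word": "center",
    "pyint": 410,
    "pyfloat": 1.5,
    "pybool": False,
}


def fake_value(definition: str, **kwargs: Any) -> Any:
    """Return a fixed value for a provider name (assumption, not Faker's API)."""
    return PLACEHOLDERS[definition]


def build(data: Sequence[Any]) -> Any:
    """Same traversal as the patched process_list_structure in the post."""
    entry: Dict[str, Any] = {}
    for name, definition, *arguments in data:
        kwargs = arguments[0] if arguments else {}
        if not isinstance(kwargs, dict):
            raise TypeError("Invalid arguments type. Must be a dictionary")
        if name is None:
            # A (None, provider) pair yields a bare value, not an object.
            return fake_value(definition, **kwargs)
        if isinstance(definition, tuple):
            # A tuple of pairs is a nested object.
            entry[name] = build(definition)
        elif isinstance(definition, (list, set)):
            # A list holds array items; each item is itself a tuple of pairs.
            entry[name] = [build(item) for item in definition]
        else:
            entry[name] = fake_value(definition, **kwargs)
    return entry


schema = (
    ("root_key_a", "word"),
    ("nest_object_key", (("attr_a", "pyint"), ("attr_b", "pybool"))),
    ("nest_simple_array_key", [((None, "word"),), ((None, "word"),)]),
    ("nest_object_array_key", [(("attr_a", "pyint"),), (("attr_a", "pyint"),)]),
)

result = build(schema)
print(json.dumps(result, indent=2))
```

The sketch only exercises the recursion. It says nothing about how the real json method validates the top-level data_columns argument.
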

fcurella commented 4 months ago

Thank you for the report! Feel free to submit a Pull Request!

angeldeejay commented 3 months ago

Once I'm able to make a proper implementation, I will help.

LeonardoFurtado commented 1 month ago

> When I'll be able to make a proper implementation, I will help.

I get TypeError: Invalid data_columns type. Must be a dictionary or list only with the last schema you posted, the one that is a tuple.

The first schema, from your first example, gives me ValueError: not enough values to unpack (expected at least 2, got 1) instead.
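
That ValueError message is what Python raises when a one-element tuple is unpacked into name, definition, *arguments. The sketch below reproduces only that unpacking step, outside Faker. It assumes (this is not confirmed in the thread) that an array item such as ((None, 'word'),) reaches the loop wrapped one level deeper than the loop expects. The helper unpack_entries is hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def unpack_entries(data: Sequence[Any]) -> List[Tuple[Any, Any]]:
    """Unpack each entry the same way the loop quoted in the thread does."""
    pairs = []
    for name, definition, *arguments in data:
        pairs.append((name, definition))
    return pairs


# One array item from the schema: a 1-tuple holding a (name, provider) pair.
item = ((None, "word"),)

# Iterating the item directly yields the (None, 'word') pair, which unpacks fine.
ok = unpack_entries(item)

# Wrapping the item in a list yields the 1-tuple itself, which is too short
# to fill both `name` and `definition`.
try:
    unpack_entries([item])
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

If this is the cause, it would explain why the list schema and the tuple schema fail in different places: the tuple is rejected by the top-level type check, while the list passes that check and fails later during unpacking.
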

angeldeejay commented 1 month ago

@LeonardoFurtado Keep in mind that my original post was made two months ago. We may not be using exactly the same version of faker.

However, I will consider both problems when I'm able to develop a proper solution. Please feel free to open another issue and reference this one, or post your full traceback here so I can help you.

Thank you!

angeldeejay commented 1 month ago

I'm closing this because I'm unable to make a merge request for it.