temporalio / sdk-python

Temporal Python SDK
MIT License

[Bug] _convert_payloads fails with One or more mappers failed to initialize #646

Open rakesh-163 opened 1 week ago

rakesh-163 commented 1 week ago

What are you really trying to do?

I have an activity that calls a function that performs database operations. The first operation in that activity is a read. The SQLModel that I am trying to read is called a "Story".

Describe the bug

The activity runs fine at first, but eventually it starts to fail and causes the calling workflow to fail as well. The error says the workflow is accessing os.environ.get (see the stack trace below), but the function the activity calls contains no such call. So either the failure should not occur, or the error reports the wrong reason for it. In either case, it is a bug.

I have also searched the SQLAlchemy source for any use of os.environ.get and did not find one.

I have also marked all external library imports as pass-through at this point.

Any help would be appreciated!

Minimal Reproduction

It is hard to reproduce because most of the time the activity simply succeeds. This particular activity usually starts to fail after an hour or so.

Environment/Versions

Additional context

Here's the full stack trace that I see when the failure occurs:

{"message":"Failed decoding arguments","stackTrace":" File \"/usr/local/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py\", line 326, in activate\n self._apply(job)\n\n File \"/usr/local/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py\", line 422, in _apply\n self._apply_resolve_activity(job.resolve_activity)\n\n File \"/usr/local/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py\", line 654, in _apply_resolve_activity\n ret_vals = self._convert_payloads(\n ^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/usr/local/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py\", line 1563, in _convert_payloads\n raise RuntimeError(\"Failed decoding arguments\") from err\n","cause":{"message":"One or more mappers failed to initialize - can't proceed with initialization of other mappers. Triggering mapper: 'Mapper[Story(story)]'. Original exception was: Cannot access os.environ.get from inside a workflow. If this is code from a module not used in a workflow or known to only be used deterministically from a workflow, mark the import as pass through.","stackTrace":" File \"/usr/local/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py\", line 1555, in _convert_payloads\n return self._payload_converter.from_payloads(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/usr/local/lib/python3.12/site-packages/temporalio/converter.py\", line 307, in from_payloads\n values.append(converter.from_payload(payload, type_hint))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/usr/local/lib/python3.12/site-packages/temporalio/converter.py\", line 583, in from_payload\n obj = value_to_type(type_hint, obj, self._custom_type_converters)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/usr/local/lib/python3.12/site-packages/temporalio/converter.py\", line 1533, in value_to_type\n return getattr(hint, \"parse_obj\")(value)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/usr/local/lib/python3.12/site-packages/typing_extensions.py\", line 2853, in wrapper\n return arg(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^\n\n File \"/usr/local/lib/python3.12/site-packages/sqlmodel/main.py\", line 951, in parse_obj\n return cls.model_validate(obj, update=update)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/usr/local/lib/python3.12/site-packages/sqlmodel/main.py\", line 848, in model_validate\n return sqlmodel_validate(\n ^^^^^^^^^^^^^^^^^^\n\n File \"/usr/local/lib/python3.12/site-packages/sqlmodel/_compat.py\", line 311, in sqlmodel_validate\n new_obj = cls()\n ^^^^^\n\n File \"<string>\", line 4, in __init__\n\n File \"/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/state.py\", line 566, in _initialize_instance\n manager.dispatch.init(self, args, kwargs)\n\n File \"/usr/local/lib/python3.12/site-packages/sqlalchemy/event/attr.py\", line 497, in __call__\n fn(*args, **kw)\n\n File \"/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/mapper.py\", line 4396, in _event_on_init\n instrumenting_mapper._check_configure()\n\n File \"/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/mapper.py\", line 2388, in _check_configure\n _configure_registries({self.registry}, cascade=True)\n\n File \"/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/mapper.py\", line 4204, in _configure_registries\n _do_configure_registries(registries, cascade)\n\n File \"/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/mapper.py\", line 4241, in _do_configure_registries\n raise e\n","applicationFailureInfo":{"type":"InvalidRequestError"}},"applicationFailureInfo":{"type":"RuntimeError"}}


Also, I am using this sandboxed runner on the workers to help deal with the datetime issue that the Pydantic models pose. I am not sure if this interacts with the converter bits in the stack trace above.

import dataclasses

from temporalio.worker.workflow_sandbox import (
    SandboxedWorkflowRunner,
    SandboxRestrictions,
)


def new_sandbox_runner() -> SandboxedWorkflowRunner:
    # TODO(cretz): Use with_child_unrestricted when https://github.com/temporalio/sdk-python/issues/254
    # is fixed and released
    invalid_module_member_children = dict(
        SandboxRestrictions.invalid_module_members_default.children
    )
    # Drop the sandbox restriction on datetime members so Pydantic models can use them.
    del invalid_module_member_children["datetime"]
    return SandboxedWorkflowRunner(
        restrictions=dataclasses.replace(
            SandboxRestrictions.default,
            invalid_module_members=dataclasses.replace(
                SandboxRestrictions.invalid_module_members_default,
                children=invalid_module_member_children,
            ),
        )
    )

cretz commented 3 days ago

Something in sqlalchemy is calling os.environ, but sqlalchemy swallows that stack trace and wraps the error in "One or more mappers failed to initialize", so you can't see where the call comes from. I would recommend not using sqlalchemy ORM objects inside a workflow; instead, use simpler dataclass objects and translate them to their sqlalchemy equivalents in activities as needed.
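A minimal sketch of that suggestion, assuming a hypothetical Story table with id, title, and body columns (the real fields are not shown in this issue). StoryData, to_story_data, and apply_story_data are illustrative names, not part of Temporal, SQLModel, or SQLAlchemy. The ORM object is duck-typed here, with a SimpleNamespace standing in for a loaded row, so the snippet runs without a database.

```python
import dataclasses
from types import SimpleNamespace
from typing import Any, Optional


@dataclasses.dataclass(frozen=True)
class StoryData:
    """Plain data that crosses the workflow/activity boundary.

    The fields are hypothetical; mirror whatever columns the workflow needs.
    """

    id: Optional[int]
    title: str
    body: str


def to_story_data(orm_story: Any) -> StoryData:
    """Call inside an activity: copy column values off the ORM object."""
    return StoryData(id=orm_story.id, title=orm_story.title, body=orm_story.body)


def apply_story_data(data: StoryData, orm_story: Any) -> Any:
    """Call inside an activity: copy values back onto an ORM object before saving."""
    orm_story.title = data.title
    orm_story.body = data.body
    return orm_story


# Stand-in for a row loaded through the ORM inside an activity (assumption).
loaded_row = SimpleNamespace(id=1, title="Draft", body="Once upon a time")

# The activity returns StoryData, so the workflow never instantiates a mapped class.
story = to_story_data(loaded_row)
print(dataclasses.asdict(story))
```

With this split, decoding the activity result in the workflow only constructs a plain dataclass, so the mapper configuration step seen in the stack trace should not run inside the sandbox.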

rakesh-163 commented 2 days ago

Hey Chad, Thanks for the reply. Appreciate that you looked inside the sqlalchemy codebase. Could you point me to the line of code that is doing the os.environ call? I could not find it when I grepped for it.

cretz commented 1 day ago

Could you point me to the line of code that is doing the os.environ call? I could not find it when I grepped for it.

It may be nested in something else and not directly called. To debug, first you'd need to patch sqlalchemy to not swallow the true stack trace of why a mapper fails to initialize. The relevant code is probably around https://github.com/sqlalchemy/sqlalchemy/blob/rel_2_0_35/lib/sqlalchemy/orm/mapper.py#L4232-L4252. You need the true stack trace of that exception. Regardless, I would not recommend using sqlalchemy models in workflows because they are likely non-deterministic.
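As a possible alternative to patching the library, here is a rough sketch of a helper that prints every exception linked to a caught error, each with its own traceback. It follows the standard __cause__ and __context__ links. It also checks a _configure_failed attribute, which is an assumption about sqlalchemy internals: the code near the linked lines appears to keep the original exception under that private name, but verify this against your installed version. iter_exception_chain and format_exception_chain are hypothetical helper names.

```python
import traceback
from typing import Iterator, List, Tuple


def iter_exception_chain(
    exc: BaseException,
    # Assumption: sqlalchemy may stash the original error under this private attribute.
    extra_attrs: Tuple[str, ...] = ("_configure_failed",),
) -> Iterator[BaseException]:
    """Yield exc and every exception reachable from it, each one only once."""
    seen = set()
    stack: List[object] = [exc]
    while stack:
        current = stack.pop()
        if not isinstance(current, BaseException) or id(current) in seen:
            continue
        seen.add(id(current))
        yield current
        linked = [current.__cause__, current.__context__]
        linked += [getattr(current, name, None) for name in extra_attrs]
        # Reverse so that __cause__ is visited first when popping from the stack.
        stack.extend(reversed(linked))


def format_exception_chain(exc: BaseException) -> str:
    """Format each linked exception separately, including its own traceback."""
    parts = []
    for link in iter_exception_chain(exc):
        lines = traceback.format_exception(
            type(link), link, link.__traceback__, chain=False
        )
        parts.append("".join(lines))
    return "\n".join(parts)
```

One way to use it is to catch the InvalidRequestError in a small debugging script that instantiates the model, then print format_exception_chain(err). If the attribute is not present, only the wrapper is printed and patching remains the way to get the original trace.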