pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Refactor convert_observed_data #7299

Closed lhelleckes closed 1 month ago

lhelleckes commented 1 month ago

Description

To improve the type hints in the convert_observed_data function, and ultimately to resolve issue #7277, the generator branch of the code was split out into its own statement that returns immediately. This will make it easier to apply dtypes to the other data structures in the next step.
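The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `wrap_generator` and `convert_observed_sketch` are hypothetical names, and the float64 dtype is an assumed stand-in for whatever the library is configured to use.

```python
from inspect import isgenerator
from typing import Any

import numpy as np


def wrap_generator(gen) -> np.ndarray:
    """Hypothetical stand-in for generator handling: peek at the first item only."""
    return np.asarray(next(gen), dtype="float64")  # assumed dtype


def convert_observed_sketch(data: Any) -> np.ndarray:
    # The generator case is handled first and returns immediately ...
    if isgenerator(data):
        return wrap_generator(data)
    # ... so everything below only ever sees array-like inputs, and any
    # dtype handling for them can live in one place.
    return np.asarray(data)


from_generator = convert_observed_sketch(np.ones(2) for _ in range(3))
from_list = convert_observed_sketch([1, 2, 3])
```

With the generator case out of the way, the return type of the remaining branch is easier to annotate, which seems to be the motivation stated in the description.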

📚 Documentation preview 📚: https://pymc--7299.org.readthedocs.build/en/7299/

welcome[bot] commented 1 month ago

:sparkling_heart: Thanks for opening this pull request! :sparkling_heart: The PyMC community really appreciates your time and effort to contribute to the project. Please make sure you have read our Contributing Guidelines and filled in our pull request template to the best of your ability.

michaelosthege commented 1 month ago

Good news is that all tests except test_skewstudentt_logp worked.

Flaky tests are annoying because we can't XFAIL them (I forgot).

michaelosthege commented 1 month ago

@lhelleckes you can rebase now :)

aseyboldt commented 1 month ago

Couldn't the observed value be supposed to be an integer?

michaelosthege commented 1 month ago

> Couldn't the observed value be supposed to be an integer?

Yes, in that case it will result in a NumPy array too:

>>> import pymc as pm
>>> type(pm.floatX(5))
<class 'numpy.ndarray'>

aseyboldt commented 1 month ago

I meant that it might be an array (pytensor or numpy) with integer dtype. Unless I'm missing some context we can't just convert that to a float type.
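One way to see why this could matter, assuming the integer data is later used as an index (an assumption for illustration; the thread does not say how the data is used): NumPy rejects float arrays as indices, so a silent int-to-float conversion changes what the data can be used for. The array names below are made up for the example.

```python
import numpy as np

group_means = np.array([10.0, 20.0, 30.0])
idx = np.array([0, 2, 1, 0])        # integer dtype, as the user specified it

picked = group_means[idx]           # integer indices work

idx_float = idx.astype("float64")   # what a blanket float conversion would do
try:
    group_means[idx_float]
    float_index_ok = True
except IndexError:
    # NumPy only accepts integer or boolean arrays as indices.
    float_index_ok = False
```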

michaelosthege commented 1 month ago

> I meant that it might be an array (pytensor or numpy) with integer dtype. Unless I'm missing some context we can't just convert that to a float type.

If you follow the branching, you'll find that we made that conversion all the time already.

In my opinion we should merge this and continue adding a dtype kwarg to fix #7277.


The whole "preparing generator data for VI" logic should be refactored. I would probably even give it its own pm.GeneratorData container. That is outside the scope of this PR, though.

If y'all agree I can take the first step towards putting generator data into a pm.GeneratorData container. (to_graphviz style, deprecation warning, ...)
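A rough sketch of what such a first step might look like. pm.GeneratorData does not exist at the time of this thread, so `GeneratorDataSketch` and `convert_observed_sketch` are purely hypothetical names and the warning text is made up.

```python
import warnings
from inspect import isgenerator


class GeneratorDataSketch:
    """Hypothetical stand-in for the proposed pm.GeneratorData container."""

    def __init__(self, gen):
        if not isgenerator(gen):
            raise TypeError("expected a generator")
        self.gen = gen

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.gen)


def convert_observed_sketch(data):
    # Raw generators still work, but the caller is nudged toward the container.
    if isgenerator(data):
        warnings.warn(
            "Passing a raw generator is deprecated; wrap it in GeneratorDataSketch.",
            DeprecationWarning,
            stacklevel=2,
        )
        return GeneratorDataSketch(data)
    return data


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    wrapped = convert_observed_sketch(i for i in range(3))
```

Keeping the old call path alive behind a DeprecationWarning would let the generator-specific logic move into the container without breaking existing VI code right away.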

aseyboldt commented 1 month ago

> If you follow the branching, you'll find that we made that conversion all the time already.

Sorry, I don't know what you mean. Can you point me to an example? I don't think we are automatically converting data that a user specified as an int type to a float type, are we?

michaelosthege commented 1 month ago

> > If you follow the branching, you'll find that we made that conversion all the time already.
>
> Sorry, I don't know what you mean. Can you point me to an example? I don't think we are automatically converting data that a user specified as an int type to a float type, are we?

main branch: https://github.com/pymc-devs/pymc/blob/fd11cf012895a9981351097df420b7fbbfb693a4/pymc/pytensorf.py#L119-L133

When data is a generator, the isgenerator(data) check is the first branch that evaluates to True. After that, hasattr(data, "dtype") is False, so the function returns floatX(ret).
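The branching described above can be mimicked with a small stand-alone sketch. It is not the code at the linked lines: `convert_sketch` is a hypothetical name, the two dtype constants are assumed stand-ins for the library's float and integer types, the generator is simply peeked at rather than wrapped, and the integer branch is an assumption added only for contrast.

```python
from inspect import isgenerator

import numpy as np

FLOAT_DTYPE = "float64"  # assumption: stand-in for the configured floatX
INT_DTYPE = "int32"      # assumption: stand-in for the matching integer type


def convert_sketch(data):
    """Illustrative only; mirrors the described branching, nothing more."""
    if isgenerator(data):
        # Stand-in for wrapping the generator: just look at its first item.
        ret = np.asarray(next(data))
    else:
        ret = np.asarray(data)

    # The dtype check looks at the *input*, and a generator has no dtype.
    if hasattr(data, "dtype"):
        if "int" in str(data.dtype):
            return ret.astype(INT_DTYPE)
        return ret.astype(FLOAT_DTYPE)
    return ret.astype(FLOAT_DTYPE)


int_array = convert_sketch(np.array([1, 2, 3]))
from_generator = convert_sketch(np.array([1, 2, 3]) for _ in range(2))
```

In this sketch the integer array keeps an integer dtype, while the same integers yielded from a generator come back as floats, because the generator object itself carries no dtype attribute.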

Whether we should do that is a separate, VI-specific question, which is IMO best dealt with by splitting the generator case off.

welcome[bot] commented 1 month ago

Congrats on merging your first pull request! :tada: We here at PyMC are proud of you! :sparkling_heart: Thank you so much for your contribution :gift: