static-frame / frame-fixtures

Use compact expressions to create diverse, deterministic DataFrame fixtures with StaticFrame

Object dtype support #2

Open ForeverWintr opened 2 years ago

ForeverWintr commented 2 years ago

I'm seeing some inconsistencies when trying to create frames with object dtype:

ff.parse('v(object)|i(IH,(object,object,object))|c(IH,(object,object,object))|s(4,4)')
<Frame>
<IndexHierarchy>                None     None     None     None     <object>
                                zRKC     zRKC     zaji     zaji     <<U4>
                                -314.34  zDdR     zuVU     zKka     <object>
<IndexHierarchy>
None             zRKC  -314.34  96520    3776.36  True     194224
None             zRKC  zDdR     -88017   -1378.5  False    -2981.64
None             zaji  zuVU     92867    ztJh     105269   3565.34
None             zaji  zKka     3884.98  zQkB     119909   3770.2
<object>         <<U4> <object> <object> <object> <object> <object>

Note that although most arrays end up with an object dtype, the second level of each IndexHierarchy is <U4.

flexatone commented 2 years ago

Thanks for posting this issue.

The cause is that the dtype is evaluated in the narrow context of the values present; at this size, that inner level happens to be representable as U4. If we increase the size, and with it the diversity of values, the expected object dtype is found.

>>> ff.parse('v(object)|i(IH,(object,object,object))|c(IH,(object,object,object))|s(5,5)')
<Frame>
<IndexHierarchy>                   None     None     None     None     True     <object>
                                   zRKC     zRKC     zaji     zaji     172133   <object>
                                   -314.34  zDdR     zuVU     zKka     84967    <object>
<IndexHierarchy>
None             zRKC     -314.34  96520    3776.36  True     194224   -314.34
None             zRKC     zDdR     -88017   -1378.5  False    -2981.64 zDdR
None             zaji     zuVU     92867    ztJh     105269   3565.34  zuVU
None             zaji     zKka     3884.98  zQkB     119909   3770.2   zKka
True             172133   84967    -646.86  zvCj     194224   zMmd     84967
<object>         <object> <object> <object> <object> <object> <object> <object>

I see that this is undesirable, however, and will look into forcing the dtype to always match the requested type.
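
For reference, the difference between inferring a dtype from the values and forcing the requested one can be reproduced with plain NumPy. This is only a minimal sketch of the general behavior, not the code path frame-fixtures or StaticFrame actually uses; the `labels` list simply reuses two of the generated strings from the output above.

```python
import numpy as np

# Two of the labels generated for the inner level in the 4x4 example.
labels = ['zRKC', 'zaji']

# Inferring the dtype from the values alone gives a fixed-width unicode dtype,
# because every value present happens to be a four-character string.
inferred = np.array(labels)

# Passing the requested dtype explicitly keeps object regardless of the values.
forced = np.array(labels, dtype=object)

print(inferred.dtype)  # <U4
print(forced.dtype)    # object
```

With only homogeneous strings present, inference has no reason to choose object, which is consistent with the U4 level seen at size 4.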