innobi / pantab

Read/Write pandas DataFrames with Tableau Hyper Extracts
BSD 3-Clause "New" or "Revised" License

frames_from_hyper not pulling when there's no data in the table. #163

Closed VDFaller closed 1 year ago

VDFaller commented 2 years ago

Describe the bug

I'm simply trying to pull all the frames from a hyper file (which I unfortunately can't give you). I think the problem is that one of the tables doesn't have any rows, which then errors out here with

ValueError: Length mismatch: Expected axis has 0 elements, new values have 60 elements

df in the line above is an empty dataframe.

This is a problem because it's screwing up my other reads.

Expected behavior

It gives me a blank dataframe with the proper columns but no data.

Could skip it too, but that seems worse.


VDFaller commented 2 years ago

I can fix it by adding

if df.empty:
    df = pd.DataFrame(columns=dtypes.keys())

right before it.
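To illustrate the failure and the guard, here is a minimal pandas-only sketch. The `dtypes` mapping is a hypothetical stand-in for whatever schema the reader derives from the table, and the final `astype` call is an extra assumption, not part of the fix proposed above.

```python
import pandas as pd

# Hypothetical column-to-dtype mapping standing in for the table's schema.
dtypes = {"A": "int64", "B": "float64", "C": "object"}

# A zero-row read can come back with no columns at all.
df = pd.DataFrame()

# Assigning the expected column names to a column-less frame raises ValueError.
try:
    df.columns = list(dtypes.keys())
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)

# The proposed guard: rebuild the frame from the expected column names.
if df.empty:
    df = pd.DataFrame(columns=list(dtypes.keys()))

# Optional extra step (an assumption): restore the expected dtypes.
df = df.astype(dtypes)
```

With the guard in place the result has the expected columns and zero rows, which matches the behavior asked for in the report.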

WillAyd commented 2 years ago

Thanks for the note. If you’d like to create a pull request with a test it would be very welcome

VDFaller commented 2 years ago

Already making the MR. I don't know how to use Tableau, so I don't know how to make a hyper file, but I'll see if someone can help me over here.

WillAyd commented 2 years ago

I think you can still use pantab to write an empty dataframe? Or is that not working either?

VDFaller commented 2 years ago

Good point. I'll try that.

VDFaller commented 2 years ago

Seems that frame_to_hyper also doesn't work.

import pathlib

import pandas as pd
import pantab
from tableauhyperapi import TableName

datapath = pathlib.Path(__file__).parent / "data"
db_path = datapath / "zero_row.hyper"
df_expected = pd.DataFrame(columns = ['A'])

pantab.frame_to_hyper(df_expected, db_path, table = TableName('not_the_public_schema', 'zero_row'))

This seems to be in libpantab, and I'm not comfortable enough to touch the C.

fails with

```
MemoryError                               Traceback (most recent call last)
f:\Work\Lumentum\Lumentum\pantab\pantab\tests\test_reader.py in ()
    129 db_path = datapath / "zero_row.hyper"
    130 df_expected = pd.DataFrame(columns = ['A'])
--> 132 pantab.frame_to_hyper(df_expected, db_path, table = TableName('not_the_public_schema', 'zero_row'))

File c:\tools\Anaconda3\envs\test\lib\site-packages\pantab\_writer.py:175, in frame_to_hyper(df, database, table, table_mode, hyper_process)
    166 def frame_to_hyper(
    167     df: pd.DataFrame,
    168     database: Union[str, pathlib.Path],
   (...)
    172     hyper_process: Optional[tab_api.HyperProcess] = None,
    173 ) -> None:
    174     """See api.rst for documentation"""
--> 175     frames_to_hyper({table: df}, database, table_mode, hyper_process=hyper_process)

File c:\tools\Anaconda3\envs\test\lib\site-packages\pantab\_writer.py:198, in frames_to_hyper(dict_of_frames, database, table_mode, hyper_process)
    194 with tab_api.Connection(
    195     hpe.endpoint, tmp_db, tab_api.CreateMode.CREATE_IF_NOT_EXISTS
    196 ) as connection:
    197     for table, df in dict_of_frames.items():
--> 198         _insert_frame(
    199             df, connection=connection, table=table, table_mode=table_mode
    200         )
    202 # In Python 3.9+ we can just pass the path object, but due to bpo 32689
    203 # and subsequent typeshed changes it is easier to just pass as str for now
    204 shutil.move(str(tmp_db), database)

File c:\tools\Anaconda3\envs\test\lib\site-packages\pantab\_writer.py:154, in _insert_frame(df, connection, table, table_mode)
    152 with tab_api.Inserter(connection, table_def) as inserter:
    153     if compat.PANDAS_130:
--> 154         libpantab.write_to_hyper(df, null_mask, inserter._buffer, dtypes)
    155     else:
    156         libpantab.write_to_hyper_legacy(
    157             df.itertuples(index=False, name=None),
    158             null_mask,
   (...)
    161             dtypes,
    162         )

MemoryError:
```

WillAyd commented 2 years ago

That is unfortunate. Well, if you are feeling ambitious, here's a guide I wrote in pandas for how to debug their C extensions. The same rules should apply here in pantab.

https://pandas.pydata.org/docs/development/debugging_extensions.html

WillAyd commented 2 years ago

Alternatively, we could just return early if the data frame is empty during write, without even invoking the Tableau inserter.

Definitely an untested case here with reading/writing empty frames. Would make for a good scenario to put into test_roundtrip.py
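A rough sketch of what that early return could look like, using plain callbacks in place of the Tableau API. `insert_frame`, `create_table`, and `insert_rows` are hypothetical names for illustration only and are not pantab's actual internals. The sketch assumes the table definition should still be created so that a later read can recover the columns.

```python
import pandas as pd


def insert_frame(df, create_table, insert_rows):
    """Hypothetical stand-in for a writer step; not pantab's real _insert_frame."""
    # Create the table first so the column definitions survive a roundtrip.
    create_table(list(df.columns))
    # Return early for zero rows instead of handing an empty frame to the inserter.
    if df.empty:
        return 0
    insert_rows(df)
    return len(df)


# Record which callbacks fire for a frame with one column and no rows.
calls = []
written = insert_frame(
    pd.DataFrame(columns=["A"]),
    create_table=lambda cols: calls.append(("create", cols)),
    insert_rows=lambda frame: calls.append(("insert", len(frame))),
)
```

Here only the table-creation callback runs, so the code path that raised `MemoryError` in the traceback above would never be reached for an empty frame.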

VDFaller commented 2 years ago

Don't know if I have that kind of ambition at the moment. But would it be helpful if I made a test_roundtrip.py test for it? Maybe with a @pytest.mark.skip(reason="currently failing #163")?
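Something along these lines, as a sketch only. The test name is made up, the table name is borrowed from the snippet earlier in the thread, and checking just the column names and row count (rather than dtypes) is an assumption about what the roundtrip should guarantee.

```python
import pandas as pd
import pytest


@pytest.mark.skip(reason="currently failing #163")
def test_empty_frame_roundtrip(tmp_path):
    # Imported lazily so that collecting the test does not require pantab.
    import pantab
    from tableauhyperapi import TableName

    df_expected = pd.DataFrame(columns=["A"])
    db_path = tmp_path / "zero_row.hyper"
    table = TableName("not_the_public_schema", "zero_row")

    # Write the empty frame, then read it back from the same table.
    pantab.frame_to_hyper(df_expected, db_path, table=table)
    result = pantab.frame_from_hyper(db_path, table=table)

    # Expect the proper columns but no data.
    assert list(result.columns) == list(df_expected.columns)
    assert len(result) == 0
```

Once both the read and the write handle empty frames, the skip marker can be dropped.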

WillAyd commented 2 years ago

I think the test there should work for reading/writing. I know the OP was just about reading, but it looks like neither works with an empty frame. It would make sense to fix them together.