selik / xport

Python reader and writer for SAS XPORT data transport files.
MIT License

Cannot export more than 60 rows to xpt #70

Closed by bolrDK 2 years ago

bolrDK commented 2 years ago

Python version: 3.8.6 xport version: 3.2.1

When trying to export more than 60 rows to an XPORT file, I get the error message 'NotImplementedError: Can't copy SAS variable metadata to dataframe'. It's independent of how many variables I include, and of whether I use the xport.v56.dump or the xport.from_columns function.

I have since installed xport v2.0.2, where I can use the xport.from_columns function with any number of input rows without problems.

I have created a small test script to illustrate the problem - see the console output from executions with 60 and 61 input rows respectively below:

import pandas as pd
import xport
import xport.v56

df1 = pd.DataFrame({'COL': ['01','02','03','04','05','06','07','08','09','10',
                            '11','12','13','14','15','16','17','18','19','20',
                            '21','22','23','24','25','26','27','28','29','30',
                            '31','32','33','34','35','36','37','38','39','40',
                            '41','42','43','44','45','46','47','48','49','50',
                            '51','52','53','54','55','56','57','58','59','60']})
ds1 = xport.Dataset(df1, name='test1')
with open('c:/temp/test1.xpt', 'wb') as f:
    xport.v56.dump(xport.Library({'test1': ds1}), f)

c:\users\bolr\programs\python38\lib\site-packages\xport\v56.py:630: UserWarning: Converting column dtypes {'COL': 'string'}
  warnings.warn(f'Converting column dtypes {conversions}')
Converting column 'COL' from object to string

df2 = pd.DataFrame({'COL': ['01','02','03','04','05','06','07','08','09','10',
                            '11','12','13','14','15','16','17','18','19','20',
                            '21','22','23','24','25','26','27','28','29','30',
                            '31','32','33','34','35','36','37','38','39','40',
                            '41','42','43','44','45','46','47','48','49','50',
                            '51','52','53','54','55','56','57','58','59','60',
                            '61']})
ds2 = xport.Dataset(df2, name='test2')
with open('c:/temp/test2.xpt', 'wb') as f:
    xport.v56.dump(xport.Library({'test2': ds2}), f)

Traceback (most recent call last):

  File "", line 10, in <module>
    xport.v56.dump(xport.Library({'test2': ds2}),f)

  File "c:\users\bolr\programs\python38\lib\site-packages\xport\v56.py", line 932, in dump
    fp.write(dumps(library))

  File "c:\users\bolr\programs\python38\lib\site-packages\xport\v56.py", line 951, in dumps
    return bytes(Library(library))

  File "c:\users\bolr\programs\python38\lib\site-packages\xport\v56.py", line 727, in __bytes__
    b'members': b''.join(bytes(Member(member)) for member in self.values()),

  File "c:\users\bolr\programs\python38\lib\site-packages\xport\v56.py", line 727, in <genexpr>
    b'members': b''.join(bytes(Member(member)) for member in self.values()),

  File "c:\users\bolr\programs\python38\lib\site-packages\xport\__init__.py", line 470, in __init__
    self.copy_metadata(data)

  File "c:\users\bolr\programs\python38\lib\site-packages\xport\__init__.py", line 412, in copy_metadata
    for k, v in self.items():

  File "c:\users\bolr\programs\python38\lib\site-packages\pandas\core\frame.py", line 957, in items
    yield k, self._get_item_cache(k)

  File "c:\users\bolr\programs\python38\lib\site-packages\pandas\core\generic.py", line 3539, in _get_item_cache
    res = self._box_col_values(values, loc)

  File "c:\users\bolr\programs\python38\lib\site-packages\pandas\core\frame.py", line 3187, in _box_col_values
    return klass(values, index=self.index, name=name, fastpath=True)

  File "c:\users\bolr\programs\python38\lib\site-packages\xport\__init__.py", line 310, in __init__
    LOG.debug(f'Initialized {self}')

  File "c:\users\bolr\programs\python38\lib\site-packages\xport\__init__.py", line 276, in __repr__
    return f'{type(self).__name__}\n{super().__repr__()}\n{", ".join(metadata)}'

  File "c:\users\bolr\programs\python38\lib\site-packages\pandas\core\series.py", line 1315, in __repr__
    self.to_string(

  File "c:\users\bolr\programs\python38\lib\site-packages\pandas\core\series.py", line 1374, in to_string
    formatter = fmt.SeriesFormatter(

  File "c:\users\bolr\programs\python38\lib\site-packages\pandas\io\formats\format.py", line 261, in __init__
    self._chk_truncate()

  File "c:\users\bolr\programs\python38\lib\site-packages\pandas\io\formats\format.py", line 285, in _chk_truncate
    series = concat((series.iloc[:row_num], series.iloc[-row_num:]))

  File "c:\users\bolr\programs\python38\lib\site-packages\pandas\core\reshape\concat.py", line 274, in concat
    op = _Concatenator(

  File "c:\users\bolr\programs\python38\lib\site-packages\pandas\core\reshape\concat.py", line 395, in __init__
    axis = sample._constructor_expanddim._get_axis_number(axis)

  File "c:\users\bolr\programs\python38\lib\site-packages\xport\__init__.py", line 340, in _constructor_expanddim
    raise NotImplementedError("Can't copy SAS variable metadata to dataframe")

NotImplementedError: Can't copy SAS variable metadata to dataframe
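
A note on where the 60-row limit may come from: the traceback above passes through pandas' own `_chk_truncate`, which runs while a Series is being formatted for printing, and pandas' default `display.max_rows` setting is 60. The sketch below uses plain pandas only (no xport) to show that a 61-row Series is printed in truncated form while a 60-row Series is printed in full. The helper name `repr_line_count` is made up for illustration, and the display options are pinned to the pandas defaults so the result does not depend on local configuration.

```python
import pandas as pd


def repr_line_count(n_rows: int) -> int:
    """Return how many lines pandas uses to print a Series with n_rows strings."""
    series = pd.Series([f"{i:02d}" for i in range(1, n_rows + 1)])
    # Pin the display options to pandas' documented defaults (assumption:
    # the reporter had not changed them).
    with pd.option_context("display.max_rows", 60, "display.min_rows", 10):
        return len(repr(series).splitlines())


lines_60 = repr_line_count(60)  # printed in full
lines_61 = repr_line_count(61)  # printed in truncated form
print(lines_60, lines_61)
```

The truncated form is built by concatenating the head and tail of the Series, which matches the `concat` call in the traceback. That would explain why the error appears only from row 61 onward.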

selik commented 2 years ago

I think this might be fixed by #64, which is merged into master. Try using the latest version from GitHub.

bolrDK commented 2 years ago

Thanks - that helped. Now I only wonder why the xport.v56.dump function gives all the conversion warnings like:

warnings.warn(f'Converting column dtypes {conversions}')
Converting column 'STUDYID' from string to string

even though I have changed the dtype to string for all relevant columns in the dataframe.

selik commented 2 years ago

Not a clue. If you figure it out, let me know!
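
The cause of the conversion warnings is left open in this thread. For readers who only want to silence them, here is a minimal sketch using the standard library `warnings` module. It relies on the console output above, which shows the message being issued as a `UserWarning` whose text starts with "Converting column dtypes". The function `dump_stand_in` is a hypothetical stand-in for the real `xport.v56.dump` call, so the example runs without xport installed.

```python
import warnings


def dump_stand_in() -> str:
    """Hypothetical stand-in for xport.v56.dump; emits the same warning text."""
    conversions = {"STUDYID": "string"}
    warnings.warn(f"Converting column dtypes {conversions}")
    return "written"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only UserWarnings whose message starts with this text.
    warnings.filterwarnings(
        "ignore",
        message="Converting column dtypes",
        category=UserWarning,
    )
    result = dump_stand_in()

print(result, len(caught))
```

This only affects messages issued through `warnings.warn`. Whether the second line ("Converting column 'STUDYID' from string to string") goes through the same channel is not established in this thread.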