wiseio / paratext

A library for reading text files over multiple cores.
Apache License 2.0

Issue with paratext.load_csv_to_pandas() #77

Open ericxyun opened 5 years ago

ericxyun commented 5 years ago

Thank you in advance for your time.

I'm trying to load a csv file with:

data = paratext.load_csv_to_pandas('data.csv')

I'm getting a:

AttributeError: module 'ntpath' has no attribute 'splitunc'

I am able to load the csv file with the traditional method using pd.read_csv().

Full Error Output:

C:\ProgramData\Anaconda3\lib\site-packages\paratext\core.py:403: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
  return pandas.DataFrame.from_items(expanded)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-42-de2c6a8a93be> in <module>()
----> 1 data = paratext.load_csv_to_pandas('2016.csv')

C:\ProgramData\Anaconda3\lib\site-packages\paratext\core.py in load_csv_to_pandas(filename, *args, **kwargs)
    401               return pandas.DataFrame()
    402     else:
--> 403          return pandas.DataFrame.from_items(expanded)
    404 
    405 @_docstring_parameter(_csv_load_params_doc)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in from_items(cls, items, columns, orient)
   1458                       FutureWarning, stacklevel=2)
   1459 
-> 1460         keys, values = lzip(*items)
   1461 
   1462         if orient == 'columns':

C:\ProgramData\Anaconda3\lib\site-packages\paratext\core.py in load_csv_to_expanded_columns(filename, *args, **kwargs)
    353         return pandas.DataFrame.from_items(filename, *args, **kwargs)
    354     """
--> 355     for name, col, semantics, levels in load_raw_csv(filename, *args, **kwargs):
    356         if levels is not None and len(levels) > 0:
    357             yield name, levels[col]

C:\ProgramData\Anaconda3\lib\site-packages\paratext\core.py in load_raw_csv(filename, *args, **kwargs)
    296 
    297     """
--> 298     loader = internal_create_csv_loader(filename, *args, **kwargs)
    299     return internal_csv_loader_transfer(loader, forget=True)
    300 

C:\ProgramData\Anaconda3\lib\site-packages\paratext\core.py in internal_create_csv_loader(filename, num_threads, allow_quoted_newlines, block_size, number_only, no_header, max_level_name_length, max_levels, cat_names, text_names, num_names, in_encoding, out_encoding, convert_null_to_space)
    186     if out_encoding == "utf-8":
    187         loader.set_out_encoding(pti.UNICODE_UTF8)
--> 188     loader.load(_make_posix_filename(filename), params)
    189     return loader
    190 

C:\ProgramData\Anaconda3\lib\site-packages\paratext\core.py in _make_posix_filename(fn_or_uri)
    118 
    119 def _make_posix_filename(fn_or_uri):
--> 120      if ntpath.splitdrive(fn_or_uri)[0] or ntpath.splitunc(fn_or_uri)[0]:
    121          result = fn_or_uri
    122      else:

AttributeError: module 'ntpath' has no attribute 'splitunc'

Thank you again for your time.

Redevil10 commented 5 years ago

I am having the same error: AttributeError: module 'ntpath' has no attribute 'splitunc'

I am running this on Ubuntu.

arestifo commented 5 years ago

I'm also getting the same error. I'm using macOS 10.14.3.

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/restifo/PycharmProjects/bulk-rnaseq/util.py", line 5, in <module>
    df = paratext.load_csv_to_pandas("./data/TcgaTargetGtex_RSEM_Hugo_norm_count", num_threads=7)
  File "/Users/restifo/anaconda3/envs/bulk-rnaseq/lib/python3.7/site-packages/paratext/core.py", line 403, in load_csv_to_pandas
    return pandas.DataFrame.from_items(expanded)
  File "/Users/restifo/anaconda3/envs/bulk-rnaseq/lib/python3.7/site-packages/pandas/core/frame.py", line 1782, in from_items
    keys, values = lzip(*items)
  File "/Users/restifo/anaconda3/envs/bulk-rnaseq/lib/python3.7/site-packages/paratext/core.py", line 355, in load_csv_to_expanded_columns
    for name, col, semantics, levels in load_raw_csv(filename, *args, **kwargs):
  File "/Users/restifo/anaconda3/envs/bulk-rnaseq/lib/python3.7/site-packages/paratext/core.py", line 298, in load_raw_csv
    loader = internal_create_csv_loader(filename, *args, **kwargs)
  File "/Users/restifo/anaconda3/envs/bulk-rnaseq/lib/python3.7/site-packages/paratext/core.py", line 188, in internal_create_csv_loader
    loader.load(_make_posix_filename(filename), params)
  File "/Users/restifo/anaconda3/envs/bulk-rnaseq/lib/python3.7/site-packages/paratext/core.py", line 120, in _make_posix_filename
    if ntpath.splitdrive(fn_or_uri)[0] or ntpath.splitunc(fn_or_uri)[0]:
AttributeError: module 'ntpath' has no attribute 'splitunc'

Python version:

> print(sys.version)
3.7.3 | packaged by conda-forge | (default, Mar 27 2019, 15:43:19) 
[Clang 4.0.1 (tags/RELEASE_401/final)]

JamesFinlayson-zz commented 5 years ago

I'm having the same error with the same setup as @a-re. I also get the error when using load_csv_to_dict.

cmoscardi commented 4 years ago

Notes to self after fixing this. If anyone else wants to make a PR based on this, feel free!

(Is this library still maintained by anyone?)

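  1. splitunc was deprecated and has since been removed from ntpath (as of Python 3.7), which is why the AttributeError appears. splitdrive needs to be used instead. Source: https://github.com/aaronryank/-/blob/master/workspace/MSYS2_64/usr/lib/python3.4/ntpath.py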

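  2. pd.DataFrame.from_items also needs to be changed to pd.DataFrame.from_dict, e.g. pd.DataFrame.from_dict(OrderedDict(items)) to preserve column order, as the FutureWarning in the original report suggests.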
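
For anyone patching this locally, here is a minimal sketch of what the two changes might look like. The helper names has_windows_drive_or_unc and frame_from_items are hypothetical and are not part of paratext; they only illustrate the replacement calls. The sketch assumes Python 3, where ntpath.splitdrive also recognizes UNC prefixes, so the drive check on line 120 of core.py should no longer need splitunc. The rest of _make_posix_filename is not shown in the tracebacks, so it is left out here.

```python
import ntpath
from collections import OrderedDict


def has_windows_drive_or_unc(path):
    """Return True if `path` starts with a drive letter or a UNC share.

    Hypothetical helper, not part of paratext. In Python 3,
    ntpath.splitdrive() also recognizes UNC prefixes such as
    \\\\host\\share, so a separate ntpath.splitunc() call is not needed.
    """
    drive, _rest = ntpath.splitdrive(path)
    return bool(drive)


def frame_from_items(items):
    """Build a DataFrame from (name, column) pairs, keeping column order.

    Hypothetical stand-in for pandas.DataFrame.from_items(items), using
    the replacement that the pandas FutureWarning recommends.
    """
    import pandas  # imported lazily so the path helper works without pandas

    return pandas.DataFrame.from_dict(OrderedDict(items))


if __name__ == "__main__":
    # Illustrative paths only; none of these files need to exist.
    for p in (r"C:\data\2016.csv", r"\\host\share\2016.csv", "./data/2016.csv"):
        print(p, has_windows_drive_or_unc(p))
```

In core.py the corresponding edits would presumably be to drop the `or ntpath.splitunc(fn_or_uri)[0]` half of the condition and to swap the from_items call for from_dict. I have not checked this against every code path in the library.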