wiseio / paratext

A library for reading text files over multiple cores.
Apache License 2.0

Reading from stdin should use single thread or report an error #62

Open ehiggs opened 7 years ago

ehiggs commented 7 years ago

When using multiple threads to read from stdin, paratext fails. It works fine with actual files or with a single thread reading stdin.

The code I use is as follows:

#!/usr/bin/env python
import paratext
print sum(map(lambda x: len(x[1]), paratext.load_raw_csv("/dev/stdin",
    no_header=True, allow_quoted_newlines=True)))

Running it with a file redirected to stdin:

$ python2/csvreader-paratext.py < /tmp/hello.csv
Traceback (most recent call last):
  File "python2/csvreader-paratext.py", line 4, in <module>
    allow_quoted_newlines=True)])
  File "/home/ehiggs/.virtualenvs/paratext/lib/python2.7/site-packages/paratext/core.py", line 271, in load_raw_csv
    loader = internal_create_csv_loader(filename, *args, **kwargs)
  File "/home/ehiggs/.virtualenvs/paratext/lib/python2.7/site-packages/paratext/core.py", line 161, in internal_create_csv_loader
    loader.load(_make_posix_filename(filename), params)
  File "/home/ehiggs/.virtualenvs/paratext/lib/python2.7/site-packages/paratext_internal.py", line 414, in load
    return _paratext_internal.ColBasedLoader_load(self, filename, params)
RuntimeError: The file ends with an open quote (4506147)

This could also be an issue if a program expects a file name from the user and the user passes in /dev/stdin.

ehiggs commented 5 years ago

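This is also an issue with process substitution, e.g. myscript <(grep interesting-line file.csv), where the script is handed a path such as /dev/fd/63 that refers to a pipe rather than a regular file.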
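
Until the library handles this itself, a caller-side guard along the lines of the issue title might look like the sketch below. It is only an illustration, not paratext code: the helper names looks_like_stream and choose_num_threads are hypothetical, the list of path prefixes is a guess at what counts as "stdin-like" on Linux, and the root cause of the failure is not established in this thread. The name check is there because /dev/stdin redirected from a regular file (as in the report above) still stats as a regular file.

```python
import os
import stat

# Hypothetical list of path prefixes that name a file descriptor, not a file.
FD_PREFIXES = ("/dev/fd/", "/proc/self/fd/")


def looks_like_stream(path):
    """Guess whether `path` is stdin, a pipe, or another non-regular file."""
    # Name-based check: /dev/stdin may stat as a regular file when the
    # shell redirects a file into the process, so stat alone is not enough.
    if path == "/dev/stdin" or path.startswith(FD_PREFIXES):
        return True
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing or unreadable path: let the loader report its own error.
        return False
    # Pipes, FIFOs, terminals, and sockets are not regular files.
    return not stat.S_ISREG(mode)


def choose_num_threads(path, requested):
    """Fall back to one thread for stream-like inputs."""
    return 1 if looks_like_stream(path) else requested


# Possible use with paratext, assuming its loaders accept a `num_threads`
# keyword (an assumption; check the installed version):
#
#   n = choose_num_threads(filename, 8)
#   paratext.load_raw_csv(filename, num_threads=n, no_header=True,
#                         allow_quoted_newlines=True)
```

Raising an error instead of returning 1 would match the other option in the title; which is preferable depends on whether silently slower reads are acceptable.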