For now, all the functions expect a file opened in binary mode (if you're passing a file-like object) and an encoding parameter if it's different from UTF-8. We may adapt the functions to also work with text file objects or (for now) raise an exception if a text-like file object is provided (could you please create a PR for the CSV plugin?). Also, open_compressed is not meant (for now) to be used as a context manager (sorry, I still need to test/implement some things).
Making small changes to your code will do the job:
```python
import rows
from rows.utils import open_compressed

filename = 'data/balneabilidade-bahia/balneabilidade.csv.xz'
encoding = 'utf-8'
fobj = open_compressed(filename, mode='rb', encoding=encoding)
table = rows.import_from_csv(fobj)
for row in table:
    print(row)
```
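If the file uses an encoding other than UTF-8, the same pattern applies; a minimal sketch, assuming a Latin-1 encoded file (the filename is hypothetical) and passing the encoding to import_from_csv as well:

```python
import rows
from rows.utils import open_compressed

# Hypothetical Latin-1 encoded file, used only for illustration
filename = 'data/balneabilidade-bahia/balneabilidade-latin1.csv.xz'
encoding = 'iso-8859-1'

fobj = open_compressed(filename, mode='rb', encoding=encoding)
# Pass the encoding explicitly so the CSV plugin decodes the bytes correctly
table = rows.import_from_csv(fobj, encoding=encoding)
for row in table:
    print(row)
```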
Note that the code above is greedy, so it will load everything into memory to create the table (I'm still working on lazy evaluation of tables). If your CSV is big, you can use this approach instead (it will work for any CSV dialect):
```python
import csv

import rows
from rows.utils import open_compressed

filename = 'data/balneabilidade-bahia/balneabilidade.csv.xz'
encoding = 'utf-8'

# First, open the file (in binary mode) to detect its dialect using a 1 MiB sample
fobj = open_compressed(filename, mode='rb')
dialect = rows.plugins.csv.discover_dialect(fobj.read(1024 ** 2), encoding=encoding)

# Now open it again (in text mode) to read it lazily
fobj = open_compressed(filename, encoding=encoding)
reader = csv.DictReader(fobj, dialect=dialect)
for row in reader:
    print(row)
```
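As a usage sketch of the lazy reader (the output filename and the 'categoria' column are assumptions made up for illustration, not part of the dataset's documented schema), you can filter and re-export a big file while keeping only one row in memory at a time:

```python
import csv

import rows
from rows.utils import open_compressed

filename = 'data/balneabilidade-bahia/balneabilidade.csv.xz'
encoding = 'utf-8'

# Detect the dialect from a 1 MiB sample, as above
fobj = open_compressed(filename, mode='rb')
dialect = rows.plugins.csv.discover_dialect(fobj.read(1024 ** 2), encoding=encoding)

# Stream the rows and write a filtered copy, one row at a time
fobj = open_compressed(filename, encoding=encoding)
reader = csv.DictReader(fobj, dialect=dialect)
with open('filtered.csv', 'w', encoding=encoding, newline='') as output:
    writer = csv.DictWriter(output, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # 'categoria' is a hypothetical column name used only for illustration
        if row.get('categoria') == 'Propria':
            writer.writerow(row)
```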
If you'd like to use the same rows interface and have all the values converted, you can take a sample, import a table from that sample (so the library can detect the column types) and then import the file lazily (using a simple monkey patch) with the detected types. That's how the csv2sqlite function does it (I forgot to detect the dialect there - will create an issue).
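A minimal sketch of that idea (assuming the field classes returned in table.fields expose a deserialize method and that 1000 lines are enough for type detection; the variable names here are mine, not from csv2sqlite):

```python
import csv
import io

import rows
from rows.utils import open_compressed

filename = 'data/balneabilidade-bahia/balneabilidade.csv.xz'
encoding = 'utf-8'

# Take a sample (the first 1000 lines, including the header) and let rows detect the column types
fobj = open_compressed(filename, encoding=encoding)
sample_lines = [line for line, _ in zip(fobj, range(1000))]
sample_table = rows.import_from_csv(
    io.BytesIO(''.join(sample_lines).encode(encoding)),
    encoding=encoding,
)
field_types = sample_table.fields  # ordered mapping: field name -> detected field type

# Now read the whole file lazily, converting each value with the detected types
# (for non-default dialects, detect the dialect as above and pass dialect= to csv.reader)
fobj = open_compressed(filename, encoding=encoding)
reader = csv.reader(fobj)
next(reader)  # skip the header row; field names come from the sample table
for row in reader:
    converted = {
        field_name: field_type.deserialize(value)
        for (field_name, field_type), value in zip(field_types.items(), row)
    }
    print(converted)
```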
In the future we're going to create a "universal rows import function" that will automatically handle compressed files (similar to what rows.utils.import_from_uri does when discovering the plugin, but also handling compressed files).
I have a sample.csv.gz file and I'm getting the following error when trying to open it with rows.utils.open_compressed: