Closed bklaas closed 9 years ago
I've gotten the errors on python 2.6.6 and 2.7.6
Can you provide a sample of your data? I need it to reproduce your problem and maybe provide a solution or patch.
I was using the data from the tutorial, ne_1033_data.xlsx.
I can reproduce the problem against these three lines of csv:
[bklaas@bklaas csvkit_testing]$ cat test.csv
RecordType,Var,Col,Wid,Frm,Value,VarLabel,ValueLabel,VarLabelOrig,ValueLabelOrig,Freq,Sel,Notes,Svar,ValueSvar,VarLabelSvar,ValueLabelSvar,UnivSvar,NoRec,NonTab,Hide,Decim,String,CommP,CodeTy,DDoc1,DTag1,JDoc1,JTag1,DDoc2,DTag2,JDoc2,JTag2,AnchorForm,AnchorInst
C,RT,1,1,,,Record Type,,RECTYPE,1(1),,,,US2009A_0010,,,,All households and group quarters.,1,,,,,,,x,x,x,x,x,x,x,x,,
C,SERIALNO,2,7,,,Housing unit/GQ person serial number,,SERIAL,2(7),skip: 1382515,,,US2009A_0011,,,,All households and group quarters.,1,,,,,,,x,x,x,x,x,x,x,x,,
[bklaas@bklaas csvkit_testing]$ csvlook test.csv
must be str, not bytes
[bklaas@bklaas csvkit_testing]$
...and to be clear, I can reproduce the issue with any csv file.
[bklaas@bklaas csvkit_testing]$ cat test2.csv
Test,Test2,Test3
Foo,Bar,Foobar
[bklaas@bklaas csvkit_testing]$ csvlook test2.csv
must be str, not bytes
[bklaas@bklaas csvkit_testing]$
Same issue here after updating from 0.8.0 to 0.9.0 today. Appears on streaming data from an API and any csv file.
proj/tee git:(feature/social) ➜ cat xcy.csv | csvsort -c 7 | csvlook
must be str, not bytes
proj/tee git:(feature/social) ➜ cat xcy.csv | csvlook
must be str, not bytes
proj/tee git:(feature/social) ➜ cat xcy.csv | csvsort -r -c 7 | head
URL,Pinterest,LinkedIn,Facebook like_count,StumbleUpon,Facebook share_count,Facebook total_count,GooglePlusOne,Delicious,Twitter,Facebook commentsbox_count,Facebook click_count,Diggs,Buzz,Facebook comment_count,Reddit
http://teespring.com/vettechsuperpower,545,2,38156,4,7822,54120,5,0,45,0,0,0,0,8142,0
http://teespring.com/veteranforfreedom2,0,0,12627,0,2784,15882,0,0,13,0,0,0,0,471,0
http://teespring.com/usmc-limitededition,4,0,10112,0,2162,13065,0,0,14,0,0,0,0,791,0
http://teespring.com/vettechmutts,309,0,8331,0,2338,12092,0,0,1,0,0,0,0,1423,0
http://teespring.com/vetsforfreedom,0,0,7045,0,1226,8500,0,0,2,0,0,0,0,229,0
http://teespring.com/valdez,3,0,4936,0,1031,7951,0,0,1,0,0,0,0,1984,0
http://teespring.com/upallnightcolts,1,0,4714,0,1211,6988,0,0,0,0,0,0,0,1063,0
http://teespring.com/veterand2,0,0,5549,0,1104,6930,0,0,1,0,0,0,0,277,0
http://teespring.com/valeriethingmeme,5,0,3581,0,1183,5641,0,0,2,0,0,0,0,877,0
proj/tee git:(feature/social) ➜ file xcy.csv
xcy.csv: ASCII text, with CRLF line terminators
proj/tee git:(feature/social) ➜ stat xcy.csv
File: ‘xcy.csv’
Size: 133892 Blocks: 264 IO Block: 4096 regular file
Device: fc09h/74513d Inode: 279301 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/ duck) Gid: ( 1000/ duck)
Access: 2014-10-19 07:13:11.437361226 +0700
Modify: 2014-10-19 06:48:43.829318810 +0700
Change: 2014-10-19 06:48:43.829318810 +0700
Birth: -
Rolling back to 0.8.0 for now.
I can confirm this with any csv data (Arch Linux, Python 3.4.2 and csvkit 0.9).
Confirmed too with Python 3.4 on Ubuntu.
The same error on Windows 8 and Python 2.7.8.
I'm seeing the same thing (Python 3.4.2 / csvkit 0.9.0)
% csvlook thing.csv -v
Traceback (most recent call last):
  File "/home/bhoule/python/bin/csvlook", line 9, in <module>
    load_entry_point('csvkit==0.9.0', 'console_scripts', 'csvlook')()
  File "/u/bhoule/python/lib/python3.4/site-packages/csvkit/utilities/csvlook.py", line 78, in launch_new_instance
    utility.main()
  File "/u/bhoule/python/lib/python3.4/site-packages/csvkit/utilities/csvlook.py", line 61, in main
    write('%s\n' % divider)
  File "/u/bhoule/python/lib/python3.4/site-packages/csvkit/utilities/csvlook.py", line 59, in <lambda>
    write = lambda t: self.output_file.write(t.encode('utf-8'))
TypeError: must be str, not bytes
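For anyone following along, here is a minimal sketch of what that last frame seems to be doing on Python 3. It is not csvkit code: `raw`, `text_stream` and `divider` are stand-in names, and the `TextIOWrapper` is only assumed to behave like `sys.stdout` on Python 3, which is a text stream that accepts `str` and rejects `bytes`.

```python
import io

# Stand-in for sys.stdout on Python 3: a text stream layered over a byte buffer.
raw = io.BytesIO()
text_stream = io.TextIOWrapper(raw, encoding='utf-8', newline='')

divider = '|---|'  # placeholder for the table divider csvlook prints

# Same pattern as the lambda in the traceback: encode first, then write.
try:
    text_stream.write(('%s\n' % divider).encode('utf-8'))
    error = None
except TypeError as exc:
    # The text stream refuses bytes; the exact message varies by Python version.
    error = exc

# Writing the unencoded str works; the stream does the encoding itself.
text_stream.write('%s\n' % divider)
text_stream.flush()
```

On Python 2 the encoded value is a plain `str`, which would explain why the same line does not fail there in every setup.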
Unfortunately I have tested this and it's not a complete fix.
With released csvkit:
[bklaas@bklaas csvkit]$ csvcut -c variable,label,rec usa_variables.csv | csvstat
must be str, not bytes
With git clone checkout that includes the fix:
(csvkit)[bklaas@bklaas csvkit]$ csvcut -c variable,label,rec usa_variables.csv | csvstat
'str' does not support the buffer interface
I can run csvcut in the released csvkit without the pipe to csvstat. In the github checkout, I can't run any csvkit commands at all without the "does not support the buffer interface" error.
I am on python 3.4.2.
I set up a virtualenv for Python 2.7 and I don't see the issue using it, so the remaining problem appears to be Python 3.x-specific.
Same command as last comment:
[bklaas@bklaas csvkit]$ csvcut -c variable,label,rec usa_variables.csv | csvstat
Row count: 1310
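The two messages in this thread look like mirror images of each other: "must be str, not bytes" comes from writing bytes to a text stream, and "'str' does not support the buffer interface" is what Python 3.4 reports when a str is written to a binary stream. Below is a rough sketch, for Python 3 only, of a writer that checks the stream type before deciding whether to encode. `write_compat` is a hypothetical helper made up for illustration, not part of csvkit and not the fix that was committed.

```python
import io


def write_compat(stream, text, encoding='utf-8'):
    """Hypothetical helper: write str to a text or a binary stream."""
    if isinstance(stream, io.TextIOBase):
        # Text streams (like sys.stdout on Python 3) take str directly.
        stream.write(text)
    else:
        # Binary streams (like sys.stdout.buffer) need encoded bytes.
        stream.write(text.encode(encoding))


text_out = io.StringIO()
binary_out = io.BytesIO()

write_compat(text_out, 'Row count: 1310\n')
write_compat(binary_out, 'Row count: 1310\n')

# The inverse mistake: a str written straight to a binary stream.
try:
    io.BytesIO().write('Row count: 1310\n')
    inverse_error = None
except TypeError as exc:
    inverse_error = exc
```

The `isinstance` check is just one way to pick a branch; choosing the stream once at startup based on the Python version would be another.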
Annnnnnd I broke the tests.
I'm having this problem on Windows 8. I can't use csvlook or csvstat on any file without getting the "must be str, not bytes" error. I'm using Python 3.4.3.
I was able to use csvsql command which surprised me, but pleasantly so.
Possible pointers for a fix:
http://stackoverflow.com/questions/5512811/builtins-typeerror-must-be-str-not-bytes
http://stackoverflow.com/questions/4980292/programming-python-for-absolute-beginners-chapter-7-storing-complex-data