openeventdata / phoenix_pipeline

Turning news into events since 2014.
MIT License

unknown bug causing long lines in CSV output #84

Open rigid opened 9 years ago

rigid commented 9 years ago

There seems to be a bug that manifests in two recent CSV files, producing extremely long lines by repeating fields over and over. All affected lines appear to have the field "wn_africa" repeated multiple times. The two affected files I found are:

events.full.20150630.txt:

length line number
740892 284
740891 170

events.full.20150701.txt:

length line number
740892 258
740891 1189
740886 1108
704287 2121
704287 1400
740886 1143
704287 2181
704287 1450

Both files were downloaded from: https://s3.amazonaws.com/openeventdata/current/events.full.20150630.txt.zip and https://s3.amazonaws.com/openeventdata/current/events.full.20150701.txt.zip
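For anyone who wants to reproduce the check, here is a minimal sketch of how the length and line-number listings above could be generated from an unzipped file. It is not part of the pipeline: the function names `find_long_lines` and `count_field` and the 10000-character threshold are assumptions chosen for illustration, and the field count is a plain substring count so that it does not depend on the file's delimiter.

```python
import sys


def find_long_lines(lines, threshold=10000):
    """Return (length, line_number) pairs for lines longer than threshold.

    The threshold is an arbitrary assumption; line numbers are 1-based and
    the result is sorted longest first.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\n"))
        if length > threshold:
            hits.append((length, number))
    return sorted(hits, reverse=True)


def count_field(line, field="wn_africa"):
    """Count how often a field value occurs in a line (substring count)."""
    return line.count(field)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_lines.py events.full.20150630.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        all_lines = handle.readlines()
    print("length line number repeats")
    for length, number in find_long_lines(all_lines):
        print(length, number, count_field(all_lines[number - 1]))
```

Run against an unzipped events file, this prints one row per overlong line, plus how many times "wn_africa" occurs in it.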