harelba / q

q - Run SQL directly on delimited files and multi-file sqlite databases
http://harelba.github.io/q/
GNU General Public License v3.0
10.21k stars 423 forks source link

Erroneous output on a TSV file #192

Closed Berkmann18 closed 5 years ago

Berkmann18 commented 5 years ago

I'm trying to get some statistical data out of a TSV file but no matter if I use c1 and the like or the name of the columns it seems that it can't find the column when specifying the names and it doesn't return the correct row.

Example 0 q "SELECT c1 FROM WCA_export_Results.tsv":

Expected

competitionId
LyonOpen2007
LyonOpen2007
...

Actual

competitionId   eventId roundTypeId     pos     best    average personName      personId        personCountryId formatId        value1  value2  value3  value4  value5  regionalSingleRecord
    regionalAverageRecord
LyonOpen2007    333     1       15      1968    2128    Etienne
LyonOpen2007    333     1       16      1731    2140    Thomas
LyonOpen2007    333     1       17      2305    2637    Antoine
...

Example 1 q "SELECT competitionId FROM WCA_export_Results.tsv":

Expected

competitionId
LyonOpen2007
LyonOpen2007
...

Actual

query error: no such column: personId

Example 2 SELECT * FROM WCA_export_Results.tsv WHERE c1 = 'LyonOpen2007':

Expected

competitionId   eventId roundTypeId     pos     best    average personName      personId        personCountryId formatId        value1  value2  value3  value4  value5  regionalSingleRecord
    regionalAverageRecord   
LyonOpen2007    333     1       15      1968    2128    Etienne Amany   2007AMAN01      Cote d_Ivoire   a       1968    2203    2138    2139    2108    AfR     AfR 
LyonOpen2007    333     1       16      1731    2140    Thomas Rouault  2004ROUA01      France  a       2222    2153    1731    2334    2046              
LyonOpen2007    333     1       17      2305    2637    Antoine Simon-Chautemps 2005SIMO01      France  a       3430    2581    2540    2789    2305              
LyonOpen2007    333     1       18      2452    2637    Irène Mallordy  2007MALL01      France  a       2715    2452    2868    2632    2564        
...

Actual

FYI, I'm firstly trying to use a more complex query such as SELECT COUNT(personId) FROM WCA_export_Results.tsv WHERE competitionId = 'Lyon2007'" (which obviously fails). Then I'll see how I could go about getting the average number of competitors per year per country.

Environment

harelba commented 5 years ago

This seems like a delimiter issue. For historical reasons, q's default delimiter is not a tab, but a space.

if your file is tab delimited, then just use -t or -d $'\t' as a parameter, in order to tell q that it's a tab delimited file.

One important way to analyze the column/type inference of q is to run it with -A, which shows the structure that it has detected on the data files.

Berkmann18 commented 5 years ago

Ah awesome, thank you.