queryverse / TextParse.jl

A bunch of fast text parsing tools

4GB CSV file turns into 8GB JuliaDB file #137

Closed MaximilianJHuber closed 5 years ago

MaximilianJHuber commented 5 years ago

This post documents a problem with a very basic use case of JuliaDB.

using JuliaDB
acs = loadtable("psam_pusa.csv", type_detect_rows=200)
save(acs, "test")

yields an 8GB file, although psam_pusa.csv is only 4GB. The inferred types are three Strings, many Int64s and many Union{Missing,Int64}s.

acs = loadtable("C:\\Users\\Max\\Desktop\\psam_pusa.csv", 
    colparsers = vcat(String, repeat([Union{Missing,Int64}], 95), String, repeat([Union{Missing,Int64}], 30), String, repeat([Union{Missing,Int64}], 158)))

yields a 2.5GB file.

Does the column type inference work properly, or is this a storage problem in JuliaDB? I am on Julia 1.1.0, TextParse 0.9.1+, and JuliaDB 0.12.
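
For what it's worth, the colparsers vector can be checked on its own before it is passed to loadtable. A minimal sketch using only Base Julia (the names `parsers`, `n_columns` and `string_positions` are just for illustration), which counts the entries and reports where the String entries sit:

```julia
# Build the same vector of column types as in the loadtable call above.
parsers = vcat(String, repeat([Union{Missing,Int64}], 95),
               String, repeat([Union{Missing,Int64}], 30),
               String, repeat([Union{Missing,Int64}], 158))

# Number of entries; this should match the number of columns in the file.
n_columns = length(parsers)

# Positions of the String entries.
string_positions = findall(T -> T === String, parsers)

println(n_columns, " entries, String at positions ", string_positions)
```

If the count or the positions do not line up with the CSV header, the explicit parsers would be applied to the wrong columns.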

davidanthoff commented 5 years ago

Could you try to look at the actual types used to store each column? So not just their element type, but what array type you end up with for each column for these two cases, and whether they are different?

MaximilianJHuber commented 5 years ago

@davidanthoff they are. And I would have expected the more concretely typed table to need less storage, but that is not true.

using JuliaDB
acs = loadtable("psam_pusa.csv", type_detect_rows=200)
save(acs, "test")

or

using JuliaDB
acs_better = loadtable("psam_pusa.csv", 
    colparsers = vcat(String, repeat([Int64], 95), String, repeat([Int64], 30), String, repeat([Int64], 158)))
save(acs_better, "test_better")

Both yield the following table and an 8GB file:

Table with 4691835 rows, 286 columns:
Columns:
#    colname    type
─────────────────────────────────────
1    RT         String
2    SERIALNO   Int64
3    DIVISION   Int64
4    SPORDER    Int64
5    PUMA       Int64
6    REGION     Int64
7    ST         Int64
8    ADJINC     Int64
9    PWGTP      Int64
10   AGEP       Int64
11   CIT        Int64
12   CITWP      Union{Missing, Int64}
13   COW        Union{Missing, Int64}
14   DDRS       Union{Missing, Int64}
15   DEAR       Int64
16   DEYE       Int64
17   DOUT       Union{Missing, Int64}
18   DPHY       Union{Missing, Int64}
19   DRAT       Union{Missing, Int64}
20   DRATX      Union{Missing, Int64}
21   DREM       Union{Missing, Int64}
22   ENG        Union{Missing, Int64}
23   FER        Union{Missing, Int64}
24   GCL        Union{Missing, Int64}
25   GCM        Union{Missing, Int64}
26   GCR        Union{Missing, Int64}
27   HINS1      Int64
28   HINS2      Int64
29   HINS3      Int64
30   HINS4      Int64
31   HINS5      Int64
32   HINS6      Int64
33   HINS7      Int64
34   INTP       Union{Missing, Int64}
35   JWMNP      Union{Missing, Int64}
36   JWRIP      Union{Missing, Int64}
37   JWTR       Union{Missing, Int64}
38   LANX       Union{Missing, Int64}
39   MAR        Int64
40   MARHD      Union{Missing, Int64}
41   MARHM      Union{Missing, Int64}
42   MARHT      Union{Missing, Int64}
43   MARHW      Union{Missing, Int64}
44   MARHYP     Union{Missing, Int64}
45   MIG        Union{Missing, Int64}
46   MIL        Union{Missing, Int64}
47   MLPA       Union{Missing, Int64}
48   MLPB       Union{Missing, Int64}
49   MLPCD      Union{Missing, Int64}
50   MLPE       Union{Missing, Int64}
51   MLPFG      Union{Missing, Int64}
52   MLPH       Union{Missing, Int64}
53   MLPI       Union{Missing, Int64}
54   MLPJ       Union{Missing, Int64}
55   MLPK       Union{Missing, Int64}
56   NWAB       Union{Missing, Int64}
57   NWAV       Union{Missing, Int64}
58   NWLA       Union{Missing, Int64}
59   NWLK       Union{Missing, Int64}
60   NWRE       Union{Missing, Int64}
61   OIP        Union{Missing, Int64}
62   PAP        Union{Missing, Int64}
63   RELP       Int64
64   RETP       Union{Missing, Int64}
65   SCH        Union{Missing, Int64}
66   SCHG       Union{Missing, Int64}
67   SCHL       Union{Missing, Int64}
68   SEMP       Union{Missing, Int64}
69   SEX        Int64
70   SSIP       Union{Missing, Int64}
71   SSP        Union{Missing, Int64}
72   WAGP       Union{Missing, Int64}
73   WKHP       Union{Missing, Int64}
74   WKL        Union{Missing, Int64}
75   WKW        Union{Missing, Int64}
76   WRK        Union{Missing, Int64}
77   YOEP       Union{Missing, Int64}
78   ANC        Int64
79   ANC1P      Int64
80   ANC2P      Int64
81   DECADE     Union{Missing, Int64}
82   DIS        Int64
83   DRIVESP    Union{Missing, Int64}
84   ESP        Union{Missing, Int64}
85   ESR        Union{Missing, Int64}
86   FOD1P      Union{Missing, Int64}
87   FOD2P      Union{Missing, Int64}
88   HICOV      Int64
89   HISP       Int64
90   INDP       Union{Missing, Int64}
91   JWAP       Union{Missing, Int64}
92   JWDP       Union{Missing, Int64}
93   LANP       Union{Missing, Int64}
94   MIGPUMA    Union{Missing, Int64}
95   MIGSP      Union{Missing, Int64}
96   MSP        Union{Missing, Int64}
97   NAICSP     String
98   NATIVITY   Int64
99   NOP        Union{Missing, Int64}
100  OC         Union{Missing, Int64}
101  OCCP       Union{Missing, Int64}
102  PAOC       Union{Missing, Int64}
103  PERNP      Union{Missing, Int64}
104  PINCP      Union{Missing, Int64}
105  POBP       Int64
106  POVPIP     Union{Missing, Int64}
107  POWPUMA    Union{Missing, Int64}
108  POWSP      Union{Missing, Int64}
109  PRIVCOV    Int64
110  PUBCOV     Int64
111  QTRBIR     Int64
112  RAC1P      Int64
113  RAC2P      Int64
114  RAC3P      Int64
115  RACAIAN    Int64
116  RACASN     Int64
117  RACBLK     Int64
118  RACNH      Int64
119  RACNUM     Int64
120  RACPI      Int64
121  RACSOR     Int64
122  RACWHT     Int64
123  RC         Union{Missing, Int64}
124  SCIENGP    Union{Missing, Int64}
125  SCIENGRLP  Union{Missing, Int64}
126  SFN        Union{Missing, Int64}
127  SFR        Union{Missing, Int64}
128  SOCP       String
129  VPS        Union{Missing, Int64}
130  WAOB       Int64
131  FAGEP      Int64
132  FANCP      Int64
133  FCITP      Int64
134  FCITWP     Int64
135  FCOWP      Int64
136  FDDRSP     Int64
137  FDEARP     Int64
138  FDEYEP     Int64
139  FDISP      Int64
140  FDOUTP     Int64
141  FDPHYP     Int64
142  FDRATP     Int64
143  FDRATXP    Int64
144  FDREMP     Int64
145  FENGP      Int64
146  FESRP      Int64
147  FFERP      Int64
148  FFODP      Int64
149  FGCLP      Int64
150  FGCMP      Int64
151  FGCRP      Int64
152  FHICOVP    Int64
153  FHINS1P    Int64
154  FHINS2P    Int64
155  FHINS3C    Union{Missing, Int64}
156  FHINS3P    Int64
157  FHINS4C    Union{Missing, Int64}
158  FHINS4P    Int64
159  FHINS5C    Union{Missing, Int64}
160  FHINS5P    Int64
161  FHINS6P    Int64
162  FHINS7P    Int64
163  FHISP      Int64
164  FINDP      Int64
165  FINTP      Int64
166  FJWDP      Int64
167  FJWMNP     Int64
168  FJWRIP     Int64
169  FJWTRP     Int64
170  FLANP      Int64
171  FLANXP     Int64
172  FMARP      Int64
173  FMARHDP    Int64
174  FMARHMP    Int64
175  FMARHTP    Int64
176  FMARHWP    Int64
177  FMARHYP    Int64
178  FMIGP      Int64
179  FMIGSP     Int64
180  FMILPP     Int64
181  FMILSP     Int64
182  FOCCP      Int64
183  FOIP       Int64
184  FPAP       Int64
185  FPERNP     Int64
186  FPINCP     Int64
187  FPOBP      Int64
188  FPOWSP     Int64
189  FPRIVCOVP  Int64
190  FPUBCOVP   Int64
191  FRACP      Int64
192  FRELP      Int64
193  FRETP      Int64
194  FSCHGP     Int64
195  FSCHLP     Int64
196  FSCHP      Int64
197  FSEMP      Int64
198  FSEXP      Int64
199  FSSIP      Int64
200  FSSP       Int64
201  FWAGP      Int64
202  FWKHP      Int64
203  FWKLP      Int64
204  FWKWP      Int64
205  FWRKP      Int64
206  FYOEP      Int64
207  PWGTP1     Int64
208  PWGTP2     Int64
209  PWGTP3     Int64
210  PWGTP4     Int64
211  PWGTP5     Int64
212  PWGTP6     Int64
213  PWGTP7     Int64
214  PWGTP8     Int64
215  PWGTP9     Int64
216  PWGTP10    Int64
217  PWGTP11    Int64
218  PWGTP12    Int64
219  PWGTP13    Int64
220  PWGTP14    Int64
221  PWGTP15    Int64
222  PWGTP16    Int64
223  PWGTP17    Int64
224  PWGTP18    Int64
225  PWGTP19    Int64
226  PWGTP20    Int64
227  PWGTP21    Int64
228  PWGTP22    Int64
229  PWGTP23    Int64
230  PWGTP24    Int64
231  PWGTP25    Int64
232  PWGTP26    Int64
233  PWGTP27    Int64
234  PWGTP28    Int64
235  PWGTP29    Int64
236  PWGTP30    Int64
237  PWGTP31    Int64
238  PWGTP32    Int64
239  PWGTP33    Int64
240  PWGTP34    Int64
241  PWGTP35    Int64
242  PWGTP36    Int64
243  PWGTP37    Int64
244  PWGTP38    Int64
245  PWGTP39    Int64
246  PWGTP40    Int64
247  PWGTP41    Int64
248  PWGTP42    Int64
249  PWGTP43    Int64
250  PWGTP44    Int64
251  PWGTP45    Int64
252  PWGTP46    Int64
253  PWGTP47    Int64
254  PWGTP48    Int64
255  PWGTP49    Int64
256  PWGTP50    Int64
257  PWGTP51    Int64
258  PWGTP52    Int64
259  PWGTP53    Int64
260  PWGTP54    Int64
261  PWGTP55    Int64
262  PWGTP56    Int64
263  PWGTP57    Int64
264  PWGTP58    Int64
265  PWGTP59    Int64
266  PWGTP60    Int64
267  PWGTP61    Int64
268  PWGTP62    Int64
269  PWGTP63    Int64
270  PWGTP64    Int64
271  PWGTP65    Int64
272  PWGTP66    Int64
273  PWGTP67    Int64
274  PWGTP68    Int64
275  PWGTP69    Int64
276  PWGTP70    Int64
277  PWGTP71    Int64
278  PWGTP72    Int64
279  PWGTP73    Int64
280  PWGTP74    Int64
281  PWGTP75    Int64
282  PWGTP76    Int64
283  PWGTP77    Int64
284  PWGTP78    Int64
285  PWGTP79    Int64
286  PWGTP80    Int64

And

using JuliaDB
acs_better = loadtable("psam_pusa.csv", colparsers = vcat(String, repeat([Union{Missing,Int64}], 95), String, 
        repeat([Union{Missing,Int64}], 30), String, repeat([Union{Missing,Int64}], 158)))
save(acs_better, "test_better")

yields the following and a 2.5GB file:

Table with 4691835 rows, 286 columns:
Columns:
#    colname    type
─────────────────────────────────────
1    RT         String
2    SERIALNO   Union{Missing, Int64}
3    DIVISION   Union{Missing, Int64}
4    SPORDER    Union{Missing, Int64}
5    PUMA       Union{Missing, Int64}
6    REGION     Union{Missing, Int64}
7    ST         Union{Missing, Int64}
8    ADJINC     Union{Missing, Int64}
9    PWGTP      Union{Missing, Int64}
10   AGEP       Union{Missing, Int64}
11   CIT        Union{Missing, Int64}
12   CITWP      Union{Missing, Int64}
13   COW        Union{Missing, Int64}
14   DDRS       Union{Missing, Int64}
15   DEAR       Union{Missing, Int64}
16   DEYE       Union{Missing, Int64}
17   DOUT       Union{Missing, Int64}
18   DPHY       Union{Missing, Int64}
19   DRAT       Union{Missing, Int64}
20   DRATX      Union{Missing, Int64}
21   DREM       Union{Missing, Int64}
22   ENG        Union{Missing, Int64}
23   FER        Union{Missing, Int64}
24   GCL        Union{Missing, Int64}
25   GCM        Union{Missing, Int64}
26   GCR        Union{Missing, Int64}
27   HINS1      Union{Missing, Int64}
28   HINS2      Union{Missing, Int64}
29   HINS3      Union{Missing, Int64}
30   HINS4      Union{Missing, Int64}
31   HINS5      Union{Missing, Int64}
32   HINS6      Union{Missing, Int64}
33   HINS7      Union{Missing, Int64}
34   INTP       Union{Missing, Int64}
35   JWMNP      Union{Missing, Int64}
36   JWRIP      Union{Missing, Int64}
37   JWTR       Union{Missing, Int64}
38   LANX       Union{Missing, Int64}
39   MAR        Union{Missing, Int64}
40   MARHD      Union{Missing, Int64}
41   MARHM      Union{Missing, Int64}
42   MARHT      Union{Missing, Int64}
43   MARHW      Union{Missing, Int64}
44   MARHYP     Union{Missing, Int64}
45   MIG        Union{Missing, Int64}
46   MIL        Union{Missing, Int64}
47   MLPA       Union{Missing, Int64}
48   MLPB       Union{Missing, Int64}
49   MLPCD      Union{Missing, Int64}
50   MLPE       Union{Missing, Int64}
51   MLPFG      Union{Missing, Int64}
52   MLPH       Union{Missing, Int64}
53   MLPI       Union{Missing, Int64}
54   MLPJ       Union{Missing, Int64}
55   MLPK       Union{Missing, Int64}
56   NWAB       Union{Missing, Int64}
57   NWAV       Union{Missing, Int64}
58   NWLA       Union{Missing, Int64}
59   NWLK       Union{Missing, Int64}
60   NWRE       Union{Missing, Int64}
61   OIP        Union{Missing, Int64}
62   PAP        Union{Missing, Int64}
63   RELP       Union{Missing, Int64}
64   RETP       Union{Missing, Int64}
65   SCH        Union{Missing, Int64}
66   SCHG       Union{Missing, Int64}
67   SCHL       Union{Missing, Int64}
68   SEMP       Union{Missing, Int64}
69   SEX        Union{Missing, Int64}
70   SSIP       Union{Missing, Int64}
71   SSP        Union{Missing, Int64}
72   WAGP       Union{Missing, Int64}
73   WKHP       Union{Missing, Int64}
74   WKL        Union{Missing, Int64}
75   WKW        Union{Missing, Int64}
76   WRK        Union{Missing, Int64}
77   YOEP       Union{Missing, Int64}
78   ANC        Union{Missing, Int64}
79   ANC1P      Union{Missing, Int64}
80   ANC2P      Union{Missing, Int64}
81   DECADE     Union{Missing, Int64}
82   DIS        Union{Missing, Int64}
83   DRIVESP    Union{Missing, Int64}
84   ESP        Union{Missing, Int64}
85   ESR        Union{Missing, Int64}
86   FOD1P      Union{Missing, Int64}
87   FOD2P      Union{Missing, Int64}
88   HICOV      Union{Missing, Int64}
89   HISP       Union{Missing, Int64}
90   INDP       Union{Missing, Int64}
91   JWAP       Union{Missing, Int64}
92   JWDP       Union{Missing, Int64}
93   LANP       Union{Missing, Int64}
94   MIGPUMA    Union{Missing, Int64}
95   MIGSP      Union{Missing, Int64}
96   MSP        Union{Missing, Int64}
97   NAICSP     String
98   NATIVITY   Union{Missing, Int64}
99   NOP        Union{Missing, Int64}
100  OC         Union{Missing, Int64}
101  OCCP       Union{Missing, Int64}
102  PAOC       Union{Missing, Int64}
103  PERNP      Union{Missing, Int64}
104  PINCP      Union{Missing, Int64}
105  POBP       Union{Missing, Int64}
106  POVPIP     Union{Missing, Int64}
107  POWPUMA    Union{Missing, Int64}
108  POWSP      Union{Missing, Int64}
109  PRIVCOV    Union{Missing, Int64}
110  PUBCOV     Union{Missing, Int64}
111  QTRBIR     Union{Missing, Int64}
112  RAC1P      Union{Missing, Int64}
113  RAC2P      Union{Missing, Int64}
114  RAC3P      Union{Missing, Int64}
115  RACAIAN    Union{Missing, Int64}
116  RACASN     Union{Missing, Int64}
117  RACBLK     Union{Missing, Int64}
118  RACNH      Union{Missing, Int64}
119  RACNUM     Union{Missing, Int64}
120  RACPI      Union{Missing, Int64}
121  RACSOR     Union{Missing, Int64}
122  RACWHT     Union{Missing, Int64}
123  RC         Union{Missing, Int64}
124  SCIENGP    Union{Missing, Int64}
125  SCIENGRLP  Union{Missing, Int64}
126  SFN        Union{Missing, Int64}
127  SFR        Union{Missing, Int64}
128  SOCP       String
129  VPS        Union{Missing, Int64}
130  WAOB       Union{Missing, Int64}
131  FAGEP      Union{Missing, Int64}
132  FANCP      Union{Missing, Int64}
133  FCITP      Union{Missing, Int64}
134  FCITWP     Union{Missing, Int64}
135  FCOWP      Union{Missing, Int64}
136  FDDRSP     Union{Missing, Int64}
137  FDEARP     Union{Missing, Int64}
138  FDEYEP     Union{Missing, Int64}
139  FDISP      Union{Missing, Int64}
140  FDOUTP     Union{Missing, Int64}
141  FDPHYP     Union{Missing, Int64}
142  FDRATP     Union{Missing, Int64}
143  FDRATXP    Union{Missing, Int64}
144  FDREMP     Union{Missing, Int64}
145  FENGP      Union{Missing, Int64}
146  FESRP      Union{Missing, Int64}
147  FFERP      Union{Missing, Int64}
148  FFODP      Union{Missing, Int64}
149  FGCLP      Union{Missing, Int64}
150  FGCMP      Union{Missing, Int64}
151  FGCRP      Union{Missing, Int64}
152  FHICOVP    Union{Missing, Int64}
153  FHINS1P    Union{Missing, Int64}
154  FHINS2P    Union{Missing, Int64}
155  FHINS3C    Union{Missing, Int64}
156  FHINS3P    Union{Missing, Int64}
157  FHINS4C    Union{Missing, Int64}
158  FHINS4P    Union{Missing, Int64}
159  FHINS5C    Union{Missing, Int64}
160  FHINS5P    Union{Missing, Int64}
161  FHINS6P    Union{Missing, Int64}
162  FHINS7P    Union{Missing, Int64}
163  FHISP      Union{Missing, Int64}
164  FINDP      Union{Missing, Int64}
165  FINTP      Union{Missing, Int64}
166  FJWDP      Union{Missing, Int64}
167  FJWMNP     Union{Missing, Int64}
168  FJWRIP     Union{Missing, Int64}
169  FJWTRP     Union{Missing, Int64}
170  FLANP      Union{Missing, Int64}
171  FLANXP     Union{Missing, Int64}
172  FMARP      Union{Missing, Int64}
173  FMARHDP    Union{Missing, Int64}
174  FMARHMP    Union{Missing, Int64}
175  FMARHTP    Union{Missing, Int64}
176  FMARHWP    Union{Missing, Int64}
177  FMARHYP    Union{Missing, Int64}
178  FMIGP      Union{Missing, Int64}
179  FMIGSP     Union{Missing, Int64}
180  FMILPP     Union{Missing, Int64}
181  FMILSP     Union{Missing, Int64}
182  FOCCP      Union{Missing, Int64}
183  FOIP       Union{Missing, Int64}
184  FPAP       Union{Missing, Int64}
185  FPERNP     Union{Missing, Int64}
186  FPINCP     Union{Missing, Int64}
187  FPOBP      Union{Missing, Int64}
188  FPOWSP     Union{Missing, Int64}
189  FPRIVCOVP  Union{Missing, Int64}
190  FPUBCOVP   Union{Missing, Int64}
191  FRACP      Union{Missing, Int64}
192  FRELP      Union{Missing, Int64}
193  FRETP      Union{Missing, Int64}
194  FSCHGP     Union{Missing, Int64}
195  FSCHLP     Union{Missing, Int64}
196  FSCHP      Union{Missing, Int64}
197  FSEMP      Union{Missing, Int64}
198  FSEXP      Union{Missing, Int64}
199  FSSIP      Union{Missing, Int64}
200  FSSP       Union{Missing, Int64}
201  FWAGP      Union{Missing, Int64}
202  FWKHP      Union{Missing, Int64}
203  FWKLP      Union{Missing, Int64}
204  FWKWP      Union{Missing, Int64}
205  FWRKP      Union{Missing, Int64}
206  FYOEP      Union{Missing, Int64}
207  PWGTP1     Union{Missing, Int64}
208  PWGTP2     Union{Missing, Int64}
209  PWGTP3     Union{Missing, Int64}
210  PWGTP4     Union{Missing, Int64}
211  PWGTP5     Union{Missing, Int64}
212  PWGTP6     Union{Missing, Int64}
213  PWGTP7     Union{Missing, Int64}
214  PWGTP8     Union{Missing, Int64}
215  PWGTP9     Union{Missing, Int64}
216  PWGTP10    Union{Missing, Int64}
217  PWGTP11    Union{Missing, Int64}
218  PWGTP12    Union{Missing, Int64}
219  PWGTP13    Union{Missing, Int64}
220  PWGTP14    Union{Missing, Int64}
221  PWGTP15    Union{Missing, Int64}
222  PWGTP16    Union{Missing, Int64}
223  PWGTP17    Union{Missing, Int64}
224  PWGTP18    Union{Missing, Int64}
225  PWGTP19    Union{Missing, Int64}
226  PWGTP20    Union{Missing, Int64}
227  PWGTP21    Union{Missing, Int64}
228  PWGTP22    Union{Missing, Int64}
229  PWGTP23    Union{Missing, Int64}
230  PWGTP24    Union{Missing, Int64}
231  PWGTP25    Union{Missing, Int64}
232  PWGTP26    Union{Missing, Int64}
233  PWGTP27    Union{Missing, Int64}
234  PWGTP28    Union{Missing, Int64}
235  PWGTP29    Union{Missing, Int64}
236  PWGTP30    Union{Missing, Int64}
237  PWGTP31    Union{Missing, Int64}
238  PWGTP32    Union{Missing, Int64}
239  PWGTP33    Union{Missing, Int64}
240  PWGTP34    Union{Missing, Int64}
241  PWGTP35    Union{Missing, Int64}
242  PWGTP36    Union{Missing, Int64}
243  PWGTP37    Union{Missing, Int64}
244  PWGTP38    Union{Missing, Int64}
245  PWGTP39    Union{Missing, Int64}
246  PWGTP40    Union{Missing, Int64}
247  PWGTP41    Union{Missing, Int64}
248  PWGTP42    Union{Missing, Int64}
249  PWGTP43    Union{Missing, Int64}
250  PWGTP44    Union{Missing, Int64}
251  PWGTP45    Union{Missing, Int64}
252  PWGTP46    Union{Missing, Int64}
253  PWGTP47    Union{Missing, Int64}
254  PWGTP48    Union{Missing, Int64}
255  PWGTP49    Union{Missing, Int64}
256  PWGTP50    Union{Missing, Int64}
257  PWGTP51    Union{Missing, Int64}
258  PWGTP52    Union{Missing, Int64}
259  PWGTP53    Union{Missing, Int64}
260  PWGTP54    Union{Missing, Int64}
261  PWGTP55    Union{Missing, Int64}
262  PWGTP56    Union{Missing, Int64}
263  PWGTP57    Union{Missing, Int64}
264  PWGTP58    Union{Missing, Int64}
265  PWGTP59    Union{Missing, Int64}
266  PWGTP60    Union{Missing, Int64}
267  PWGTP61    Union{Missing, Int64}
268  PWGTP62    Union{Missing, Int64}
269  PWGTP63    Union{Missing, Int64}
270  PWGTP64    Union{Missing, Int64}
271  PWGTP65    Union{Missing, Int64}
272  PWGTP66    Union{Missing, Int64}
273  PWGTP67    Union{Missing, Int64}
274  PWGTP68    Union{Missing, Int64}
275  PWGTP69    Union{Missing, Int64}
276  PWGTP70    Union{Missing, Int64}
277  PWGTP71    Union{Missing, Int64}
278  PWGTP72    Union{Missing, Int64}
279  PWGTP73    Union{Missing, Int64}
280  PWGTP74    Union{Missing, Int64}
281  PWGTP75    Union{Missing, Int64}
282  PWGTP76    Union{Missing, Int64}
283  PWGTP77    Union{Missing, Int64}
284  PWGTP78    Union{Missing, Int64}
285  PWGTP79    Union{Missing, Int64}
286  PWGTP80    Union{Missing, Int64}
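
As a rough point of comparison, the in-memory size of the integer columns can be estimated from the row and column counts in the table summaries above. This is only a back-of-the-envelope sketch: it ignores the three String columns and array headers, it assumes 8 bytes per Int64 element plus one extra type-tag byte per element for Union{Missing,Int64} arrays, and it says nothing about the format that save writes to disk.

```julia
# Figures taken from the table summaries above.
n_rows        = 4_691_835
n_columns     = 286
n_string_cols = 3                       # RT, NAICSP, SOCP
n_int_cols    = n_columns - n_string_cols

# Assumption: a Vector{Int64} needs 8 bytes per element, and a
# Vector{Union{Missing,Int64}} needs 8 bytes plus 1 type-tag byte per element.
bytes_int64 = n_rows * n_int_cols * sizeof(Int64)
bytes_union = n_rows * n_int_cols * (sizeof(Int64) + 1)

println("all Int64:                ", round(bytes_int64 / 1e9, digits=2), " GB")
println("all Union{Missing,Int64}: ", round(bytes_union / 1e9, digits=2), " GB")
```

Under these assumptions the Union-typed columns are not smaller in memory than the Int64 ones, so element width alone would not account for the 8GB versus 2.5GB difference on disk.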

davidanthoff commented 5 years ago

Could you run this on both, replacing t with acs and acs_better respectively?

map(i->typeof(i), values(getfield(t.columns, :fieldarrays)))

I'm interested in the array types of each column.
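
With 286 columns the resulting tuple is long, so a tally per array type may be easier to compare between the two tables. A minimal sketch in Base Julia; `tally_types` is a hypothetical helper, and the columns below are small stand-ins rather than the real data:

```julia
# Count how many columns share each array type.
function tally_types(cols)
    counts = Dict{Any,Int}()
    for c in cols
        T = typeof(c)
        counts[T] = get(counts, T, 0) + 1
    end
    return counts
end

# Stand-in columns for illustration only; not the real table.
cols = (["a", "b"], [1, 2], Union{Missing,Int64}[1, missing], [3, 4])

for (T, n) in tally_types(cols)
    println(n, " x ", T)
end
```

For the real tables, `cols` would be the tuple of column arrays, i.e. values(getfield(t.columns, :fieldarrays)) from the expression above.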

MaximilianJHuber commented 5 years ago

With acs:

(WeakRefStrings.StringArray{String,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, 
Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, WeakRefStrings.StringArray{String,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, WeakRefStrings.StringArray{String,1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, 
Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, 
Array{Int64,1})

And with acs_better:

(WeakRefStrings.StringArray{String,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, 
Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, WeakRefStrings.StringArray{String,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, 
Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, WeakRefStrings.StringArray{String,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, 
Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, 
Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1})
davidanthoff commented 5 years ago

So, I'm out of ideas about what is happening here :) The case with auto-detected column types has fewer Union{Int,Missing} columns, but I assume that those columns actually don't have missing data, so I don't understand how the case with more Union{Int,Missing} columns could end up with a smaller disk footprint. I think @shashi needs to look into this :)
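
A rough way to probe this on a single column, outside JuliaDB, might be to compare the in-memory and serialized sizes of the same values stored as `Vector{Int64}` and as `Vector{Union{Missing, Int64}}`. This is only a sketch: the data is a made-up stand-in for one column, `serialized_bytes` is a hypothetical helper, and it uses the `Serialization` standard library, which is an assumption and may not be the path that JuliaDB's `save` actually takes.

```julia
using Serialization

# Hypothetical stand-in for one column of the table (not the real data).
n = 1_000_000
plain = collect(1:n)                                  # Vector{Int64}
withmissing = Vector{Union{Missing, Int64}}(plain)    # same values, wider element type

# Hypothetical helper: number of bytes the stdlib serializer writes for `x`.
function serialized_bytes(x)
    io = IOBuffer()
    serialize(io, x)
    return position(io)   # bytes written so far
end

println("in memory:  ", Base.summarysize(plain), " vs ", Base.summarysize(withmissing))
println("serialized: ", serialized_bytes(plain), " vs ", serialized_bytes(withmissing))
```

If the two serialized sizes differ noticeably even though no value is missing, that would point at how the element type is written to disk rather than at the parsing step.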

davidanthoff commented 5 years ago

One more thing: the problem here most likely has nothing to do with TextParse; at least right now I don't see how it could. To me this looks more like an issue with the JuliaDB save functionality.

MaximilianJHuber commented 5 years ago

Thanks. I opened up an issue with JuliaDB.