MaximilianJHuber closed this issue 5 years ago
Could you try to look at the actual types used to store each column? So not just their element type, but what array type you end up with for each column for these two cases, and whether they are different?
@davidanthoff they are. And I would have expected the more concretely typed table to need less storage, but that is not true.
using JuliaDB
acs = loadtable("psam_pusa.csv", type_detect_rows=200)
save(acs, "test")
or
using JuliaDB
acs_better = loadtable("psam_pusa.csv",
colparsers = vcat(String, repeat([Int64], 95), String, repeat([Int64], 30), String, repeat([Int64], 158)))
save(acs_better, "test_better")
Both yield the following and an 8GB file:
Table with 4691835 rows, 286 columns:
Columns:
# colname type
─────────────────────────────────────
1 RT String
2 SERIALNO Int64
3 DIVISION Int64
4 SPORDER Int64
5 PUMA Int64
6 REGION Int64
7 ST Int64
8 ADJINC Int64
9 PWGTP Int64
10 AGEP Int64
11 CIT Int64
12 CITWP Union{Missing, Int64}
13 COW Union{Missing, Int64}
14 DDRS Union{Missing, Int64}
15 DEAR Int64
16 DEYE Int64
17 DOUT Union{Missing, Int64}
18 DPHY Union{Missing, Int64}
19 DRAT Union{Missing, Int64}
20 DRATX Union{Missing, Int64}
21 DREM Union{Missing, Int64}
22 ENG Union{Missing, Int64}
23 FER Union{Missing, Int64}
24 GCL Union{Missing, Int64}
25 GCM Union{Missing, Int64}
26 GCR Union{Missing, Int64}
27 HINS1 Int64
28 HINS2 Int64
29 HINS3 Int64
30 HINS4 Int64
31 HINS5 Int64
32 HINS6 Int64
33 HINS7 Int64
34 INTP Union{Missing, Int64}
35 JWMNP Union{Missing, Int64}
36 JWRIP Union{Missing, Int64}
37 JWTR Union{Missing, Int64}
38 LANX Union{Missing, Int64}
39 MAR Int64
40 MARHD Union{Missing, Int64}
41 MARHM Union{Missing, Int64}
42 MARHT Union{Missing, Int64}
43 MARHW Union{Missing, Int64}
44 MARHYP Union{Missing, Int64}
45 MIG Union{Missing, Int64}
46 MIL Union{Missing, Int64}
47 MLPA Union{Missing, Int64}
48 MLPB Union{Missing, Int64}
49 MLPCD Union{Missing, Int64}
50 MLPE Union{Missing, Int64}
51 MLPFG Union{Missing, Int64}
52 MLPH Union{Missing, Int64}
53 MLPI Union{Missing, Int64}
54 MLPJ Union{Missing, Int64}
55 MLPK Union{Missing, Int64}
56 NWAB Union{Missing, Int64}
57 NWAV Union{Missing, Int64}
58 NWLA Union{Missing, Int64}
59 NWLK Union{Missing, Int64}
60 NWRE Union{Missing, Int64}
61 OIP Union{Missing, Int64}
62 PAP Union{Missing, Int64}
63 RELP Int64
64 RETP Union{Missing, Int64}
65 SCH Union{Missing, Int64}
66 SCHG Union{Missing, Int64}
67 SCHL Union{Missing, Int64}
68 SEMP Union{Missing, Int64}
69 SEX Int64
70 SSIP Union{Missing, Int64}
71 SSP Union{Missing, Int64}
72 WAGP Union{Missing, Int64}
73 WKHP Union{Missing, Int64}
74 WKL Union{Missing, Int64}
75 WKW Union{Missing, Int64}
76 WRK Union{Missing, Int64}
77 YOEP Union{Missing, Int64}
78 ANC Int64
79 ANC1P Int64
80 ANC2P Int64
81 DECADE Union{Missing, Int64}
82 DIS Int64
83 DRIVESP Union{Missing, Int64}
84 ESP Union{Missing, Int64}
85 ESR Union{Missing, Int64}
86 FOD1P Union{Missing, Int64}
87 FOD2P Union{Missing, Int64}
88 HICOV Int64
89 HISP Int64
90 INDP Union{Missing, Int64}
91 JWAP Union{Missing, Int64}
92 JWDP Union{Missing, Int64}
93 LANP Union{Missing, Int64}
94 MIGPUMA Union{Missing, Int64}
95 MIGSP Union{Missing, Int64}
96 MSP Union{Missing, Int64}
97 NAICSP String
98 NATIVITY Int64
99 NOP Union{Missing, Int64}
100 OC Union{Missing, Int64}
101 OCCP Union{Missing, Int64}
102 PAOC Union{Missing, Int64}
103 PERNP Union{Missing, Int64}
104 PINCP Union{Missing, Int64}
105 POBP Int64
106 POVPIP Union{Missing, Int64}
107 POWPUMA Union{Missing, Int64}
108 POWSP Union{Missing, Int64}
109 PRIVCOV Int64
110 PUBCOV Int64
111 QTRBIR Int64
112 RAC1P Int64
113 RAC2P Int64
114 RAC3P Int64
115 RACAIAN Int64
116 RACASN Int64
117 RACBLK Int64
118 RACNH Int64
119 RACNUM Int64
120 RACPI Int64
121 RACSOR Int64
122 RACWHT Int64
123 RC Union{Missing, Int64}
124 SCIENGP Union{Missing, Int64}
125 SCIENGRLP Union{Missing, Int64}
126 SFN Union{Missing, Int64}
127 SFR Union{Missing, Int64}
128 SOCP String
129 VPS Union{Missing, Int64}
130 WAOB Int64
131 FAGEP Int64
132 FANCP Int64
133 FCITP Int64
134 FCITWP Int64
135 FCOWP Int64
136 FDDRSP Int64
137 FDEARP Int64
138 FDEYEP Int64
139 FDISP Int64
140 FDOUTP Int64
141 FDPHYP Int64
142 FDRATP Int64
143 FDRATXP Int64
144 FDREMP Int64
145 FENGP Int64
146 FESRP Int64
147 FFERP Int64
148 FFODP Int64
149 FGCLP Int64
150 FGCMP Int64
151 FGCRP Int64
152 FHICOVP Int64
153 FHINS1P Int64
154 FHINS2P Int64
155 FHINS3C Union{Missing, Int64}
156 FHINS3P Int64
157 FHINS4C Union{Missing, Int64}
158 FHINS4P Int64
159 FHINS5C Union{Missing, Int64}
160 FHINS5P Int64
161 FHINS6P Int64
162 FHINS7P Int64
163 FHISP Int64
164 FINDP Int64
165 FINTP Int64
166 FJWDP Int64
167 FJWMNP Int64
168 FJWRIP Int64
169 FJWTRP Int64
170 FLANP Int64
171 FLANXP Int64
172 FMARP Int64
173 FMARHDP Int64
174 FMARHMP Int64
175 FMARHTP Int64
176 FMARHWP Int64
177 FMARHYP Int64
178 FMIGP Int64
179 FMIGSP Int64
180 FMILPP Int64
181 FMILSP Int64
182 FOCCP Int64
183 FOIP Int64
184 FPAP Int64
185 FPERNP Int64
186 FPINCP Int64
187 FPOBP Int64
188 FPOWSP Int64
189 FPRIVCOVP Int64
190 FPUBCOVP Int64
191 FRACP Int64
192 FRELP Int64
193 FRETP Int64
194 FSCHGP Int64
195 FSCHLP Int64
196 FSCHP Int64
197 FSEMP Int64
198 FSEXP Int64
199 FSSIP Int64
200 FSSP Int64
201 FWAGP Int64
202 FWKHP Int64
203 FWKLP Int64
204 FWKWP Int64
205 FWRKP Int64
206 FYOEP Int64
207 PWGTP1 Int64
208 PWGTP2 Int64
209 PWGTP3 Int64
210 PWGTP4 Int64
211 PWGTP5 Int64
212 PWGTP6 Int64
213 PWGTP7 Int64
214 PWGTP8 Int64
215 PWGTP9 Int64
216 PWGTP10 Int64
217 PWGTP11 Int64
218 PWGTP12 Int64
219 PWGTP13 Int64
220 PWGTP14 Int64
221 PWGTP15 Int64
222 PWGTP16 Int64
223 PWGTP17 Int64
224 PWGTP18 Int64
225 PWGTP19 Int64
226 PWGTP20 Int64
227 PWGTP21 Int64
228 PWGTP22 Int64
229 PWGTP23 Int64
230 PWGTP24 Int64
231 PWGTP25 Int64
232 PWGTP26 Int64
233 PWGTP27 Int64
234 PWGTP28 Int64
235 PWGTP29 Int64
236 PWGTP30 Int64
237 PWGTP31 Int64
238 PWGTP32 Int64
239 PWGTP33 Int64
240 PWGTP34 Int64
241 PWGTP35 Int64
242 PWGTP36 Int64
243 PWGTP37 Int64
244 PWGTP38 Int64
245 PWGTP39 Int64
246 PWGTP40 Int64
247 PWGTP41 Int64
248 PWGTP42 Int64
249 PWGTP43 Int64
250 PWGTP44 Int64
251 PWGTP45 Int64
252 PWGTP46 Int64
253 PWGTP47 Int64
254 PWGTP48 Int64
255 PWGTP49 Int64
256 PWGTP50 Int64
257 PWGTP51 Int64
258 PWGTP52 Int64
259 PWGTP53 Int64
260 PWGTP54 Int64
261 PWGTP55 Int64
262 PWGTP56 Int64
263 PWGTP57 Int64
264 PWGTP58 Int64
265 PWGTP59 Int64
266 PWGTP60 Int64
267 PWGTP61 Int64
268 PWGTP62 Int64
269 PWGTP63 Int64
270 PWGTP64 Int64
271 PWGTP65 Int64
272 PWGTP66 Int64
273 PWGTP67 Int64
274 PWGTP68 Int64
275 PWGTP69 Int64
276 PWGTP70 Int64
277 PWGTP71 Int64
278 PWGTP72 Int64
279 PWGTP73 Int64
280 PWGTP74 Int64
281 PWGTP75 Int64
282 PWGTP76 Int64
283 PWGTP77 Int64
284 PWGTP78 Int64
285 PWGTP79 Int64
286 PWGTP80 Int64
And
using JuliaDB
acs_better = loadtable("psam_pusa.csv", colparsers = vcat(String, repeat([Union{Missing,Int64}], 95), String,
repeat([Union{Missing,Int64}], 30), String, repeat([Union{Missing,Int64}], 158)))
save(acs_better, "test_better")
yields the following and a 2.5GB file:
Table with 4691835 rows, 286 columns:
Columns:
# colname type
─────────────────────────────────────
1 RT String
2 SERIALNO Union{Missing, Int64}
3 DIVISION Union{Missing, Int64}
4 SPORDER Union{Missing, Int64}
5 PUMA Union{Missing, Int64}
6 REGION Union{Missing, Int64}
7 ST Union{Missing, Int64}
8 ADJINC Union{Missing, Int64}
9 PWGTP Union{Missing, Int64}
10 AGEP Union{Missing, Int64}
11 CIT Union{Missing, Int64}
12 CITWP Union{Missing, Int64}
13 COW Union{Missing, Int64}
14 DDRS Union{Missing, Int64}
15 DEAR Union{Missing, Int64}
16 DEYE Union{Missing, Int64}
17 DOUT Union{Missing, Int64}
18 DPHY Union{Missing, Int64}
19 DRAT Union{Missing, Int64}
20 DRATX Union{Missing, Int64}
21 DREM Union{Missing, Int64}
22 ENG Union{Missing, Int64}
23 FER Union{Missing, Int64}
24 GCL Union{Missing, Int64}
25 GCM Union{Missing, Int64}
26 GCR Union{Missing, Int64}
27 HINS1 Union{Missing, Int64}
28 HINS2 Union{Missing, Int64}
29 HINS3 Union{Missing, Int64}
30 HINS4 Union{Missing, Int64}
31 HINS5 Union{Missing, Int64}
32 HINS6 Union{Missing, Int64}
33 HINS7 Union{Missing, Int64}
34 INTP Union{Missing, Int64}
35 JWMNP Union{Missing, Int64}
36 JWRIP Union{Missing, Int64}
37 JWTR Union{Missing, Int64}
38 LANX Union{Missing, Int64}
39 MAR Union{Missing, Int64}
40 MARHD Union{Missing, Int64}
41 MARHM Union{Missing, Int64}
42 MARHT Union{Missing, Int64}
43 MARHW Union{Missing, Int64}
44 MARHYP Union{Missing, Int64}
45 MIG Union{Missing, Int64}
46 MIL Union{Missing, Int64}
47 MLPA Union{Missing, Int64}
48 MLPB Union{Missing, Int64}
49 MLPCD Union{Missing, Int64}
50 MLPE Union{Missing, Int64}
51 MLPFG Union{Missing, Int64}
52 MLPH Union{Missing, Int64}
53 MLPI Union{Missing, Int64}
54 MLPJ Union{Missing, Int64}
55 MLPK Union{Missing, Int64}
56 NWAB Union{Missing, Int64}
57 NWAV Union{Missing, Int64}
58 NWLA Union{Missing, Int64}
59 NWLK Union{Missing, Int64}
60 NWRE Union{Missing, Int64}
61 OIP Union{Missing, Int64}
62 PAP Union{Missing, Int64}
63 RELP Union{Missing, Int64}
64 RETP Union{Missing, Int64}
65 SCH Union{Missing, Int64}
66 SCHG Union{Missing, Int64}
67 SCHL Union{Missing, Int64}
68 SEMP Union{Missing, Int64}
69 SEX Union{Missing, Int64}
70 SSIP Union{Missing, Int64}
71 SSP Union{Missing, Int64}
72 WAGP Union{Missing, Int64}
73 WKHP Union{Missing, Int64}
74 WKL Union{Missing, Int64}
75 WKW Union{Missing, Int64}
76 WRK Union{Missing, Int64}
77 YOEP Union{Missing, Int64}
78 ANC Union{Missing, Int64}
79 ANC1P Union{Missing, Int64}
80 ANC2P Union{Missing, Int64}
81 DECADE Union{Missing, Int64}
82 DIS Union{Missing, Int64}
83 DRIVESP Union{Missing, Int64}
84 ESP Union{Missing, Int64}
85 ESR Union{Missing, Int64}
86 FOD1P Union{Missing, Int64}
87 FOD2P Union{Missing, Int64}
88 HICOV Union{Missing, Int64}
89 HISP Union{Missing, Int64}
90 INDP Union{Missing, Int64}
91 JWAP Union{Missing, Int64}
92 JWDP Union{Missing, Int64}
93 LANP Union{Missing, Int64}
94 MIGPUMA Union{Missing, Int64}
95 MIGSP Union{Missing, Int64}
96 MSP Union{Missing, Int64}
97 NAICSP String
98 NATIVITY Union{Missing, Int64}
99 NOP Union{Missing, Int64}
100 OC Union{Missing, Int64}
101 OCCP Union{Missing, Int64}
102 PAOC Union{Missing, Int64}
103 PERNP Union{Missing, Int64}
104 PINCP Union{Missing, Int64}
105 POBP Union{Missing, Int64}
106 POVPIP Union{Missing, Int64}
107 POWPUMA Union{Missing, Int64}
108 POWSP Union{Missing, Int64}
109 PRIVCOV Union{Missing, Int64}
110 PUBCOV Union{Missing, Int64}
111 QTRBIR Union{Missing, Int64}
112 RAC1P Union{Missing, Int64}
113 RAC2P Union{Missing, Int64}
114 RAC3P Union{Missing, Int64}
115 RACAIAN Union{Missing, Int64}
116 RACASN Union{Missing, Int64}
117 RACBLK Union{Missing, Int64}
118 RACNH Union{Missing, Int64}
119 RACNUM Union{Missing, Int64}
120 RACPI Union{Missing, Int64}
121 RACSOR Union{Missing, Int64}
122 RACWHT Union{Missing, Int64}
123 RC Union{Missing, Int64}
124 SCIENGP Union{Missing, Int64}
125 SCIENGRLP Union{Missing, Int64}
126 SFN Union{Missing, Int64}
127 SFR Union{Missing, Int64}
128 SOCP String
129 VPS Union{Missing, Int64}
130 WAOB Union{Missing, Int64}
131 FAGEP Union{Missing, Int64}
132 FANCP Union{Missing, Int64}
133 FCITP Union{Missing, Int64}
134 FCITWP Union{Missing, Int64}
135 FCOWP Union{Missing, Int64}
136 FDDRSP Union{Missing, Int64}
137 FDEARP Union{Missing, Int64}
138 FDEYEP Union{Missing, Int64}
139 FDISP Union{Missing, Int64}
140 FDOUTP Union{Missing, Int64}
141 FDPHYP Union{Missing, Int64}
142 FDRATP Union{Missing, Int64}
143 FDRATXP Union{Missing, Int64}
144 FDREMP Union{Missing, Int64}
145 FENGP Union{Missing, Int64}
146 FESRP Union{Missing, Int64}
147 FFERP Union{Missing, Int64}
148 FFODP Union{Missing, Int64}
149 FGCLP Union{Missing, Int64}
150 FGCMP Union{Missing, Int64}
151 FGCRP Union{Missing, Int64}
152 FHICOVP Union{Missing, Int64}
153 FHINS1P Union{Missing, Int64}
154 FHINS2P Union{Missing, Int64}
155 FHINS3C Union{Missing, Int64}
156 FHINS3P Union{Missing, Int64}
157 FHINS4C Union{Missing, Int64}
158 FHINS4P Union{Missing, Int64}
159 FHINS5C Union{Missing, Int64}
160 FHINS5P Union{Missing, Int64}
161 FHINS6P Union{Missing, Int64}
162 FHINS7P Union{Missing, Int64}
163 FHISP Union{Missing, Int64}
164 FINDP Union{Missing, Int64}
165 FINTP Union{Missing, Int64}
166 FJWDP Union{Missing, Int64}
167 FJWMNP Union{Missing, Int64}
168 FJWRIP Union{Missing, Int64}
169 FJWTRP Union{Missing, Int64}
170 FLANP Union{Missing, Int64}
171 FLANXP Union{Missing, Int64}
172 FMARP Union{Missing, Int64}
173 FMARHDP Union{Missing, Int64}
174 FMARHMP Union{Missing, Int64}
175 FMARHTP Union{Missing, Int64}
176 FMARHWP Union{Missing, Int64}
177 FMARHYP Union{Missing, Int64}
178 FMIGP Union{Missing, Int64}
179 FMIGSP Union{Missing, Int64}
180 FMILPP Union{Missing, Int64}
181 FMILSP Union{Missing, Int64}
182 FOCCP Union{Missing, Int64}
183 FOIP Union{Missing, Int64}
184 FPAP Union{Missing, Int64}
185 FPERNP Union{Missing, Int64}
186 FPINCP Union{Missing, Int64}
187 FPOBP Union{Missing, Int64}
188 FPOWSP Union{Missing, Int64}
189 FPRIVCOVP Union{Missing, Int64}
190 FPUBCOVP Union{Missing, Int64}
191 FRACP Union{Missing, Int64}
192 FRELP Union{Missing, Int64}
193 FRETP Union{Missing, Int64}
194 FSCHGP Union{Missing, Int64}
195 FSCHLP Union{Missing, Int64}
196 FSCHP Union{Missing, Int64}
197 FSEMP Union{Missing, Int64}
198 FSEXP Union{Missing, Int64}
199 FSSIP Union{Missing, Int64}
200 FSSP Union{Missing, Int64}
201 FWAGP Union{Missing, Int64}
202 FWKHP Union{Missing, Int64}
203 FWKLP Union{Missing, Int64}
204 FWKWP Union{Missing, Int64}
205 FWRKP Union{Missing, Int64}
206 FYOEP Union{Missing, Int64}
207 PWGTP1 Union{Missing, Int64}
208 PWGTP2 Union{Missing, Int64}
209 PWGTP3 Union{Missing, Int64}
210 PWGTP4 Union{Missing, Int64}
211 PWGTP5 Union{Missing, Int64}
212 PWGTP6 Union{Missing, Int64}
213 PWGTP7 Union{Missing, Int64}
214 PWGTP8 Union{Missing, Int64}
215 PWGTP9 Union{Missing, Int64}
216 PWGTP10 Union{Missing, Int64}
217 PWGTP11 Union{Missing, Int64}
218 PWGTP12 Union{Missing, Int64}
219 PWGTP13 Union{Missing, Int64}
220 PWGTP14 Union{Missing, Int64}
221 PWGTP15 Union{Missing, Int64}
222 PWGTP16 Union{Missing, Int64}
223 PWGTP17 Union{Missing, Int64}
224 PWGTP18 Union{Missing, Int64}
225 PWGTP19 Union{Missing, Int64}
226 PWGTP20 Union{Missing, Int64}
227 PWGTP21 Union{Missing, Int64}
228 PWGTP22 Union{Missing, Int64}
229 PWGTP23 Union{Missing, Int64}
230 PWGTP24 Union{Missing, Int64}
231 PWGTP25 Union{Missing, Int64}
232 PWGTP26 Union{Missing, Int64}
233 PWGTP27 Union{Missing, Int64}
234 PWGTP28 Union{Missing, Int64}
235 PWGTP29 Union{Missing, Int64}
236 PWGTP30 Union{Missing, Int64}
237 PWGTP31 Union{Missing, Int64}
238 PWGTP32 Union{Missing, Int64}
239 PWGTP33 Union{Missing, Int64}
240 PWGTP34 Union{Missing, Int64}
241 PWGTP35 Union{Missing, Int64}
242 PWGTP36 Union{Missing, Int64}
243 PWGTP37 Union{Missing, Int64}
244 PWGTP38 Union{Missing, Int64}
245 PWGTP39 Union{Missing, Int64}
246 PWGTP40 Union{Missing, Int64}
247 PWGTP41 Union{Missing, Int64}
248 PWGTP42 Union{Missing, Int64}
249 PWGTP43 Union{Missing, Int64}
250 PWGTP44 Union{Missing, Int64}
251 PWGTP45 Union{Missing, Int64}
252 PWGTP46 Union{Missing, Int64}
253 PWGTP47 Union{Missing, Int64}
254 PWGTP48 Union{Missing, Int64}
255 PWGTP49 Union{Missing, Int64}
256 PWGTP50 Union{Missing, Int64}
257 PWGTP51 Union{Missing, Int64}
258 PWGTP52 Union{Missing, Int64}
259 PWGTP53 Union{Missing, Int64}
260 PWGTP54 Union{Missing, Int64}
261 PWGTP55 Union{Missing, Int64}
262 PWGTP56 Union{Missing, Int64}
263 PWGTP57 Union{Missing, Int64}
264 PWGTP58 Union{Missing, Int64}
265 PWGTP59 Union{Missing, Int64}
266 PWGTP60 Union{Missing, Int64}
267 PWGTP61 Union{Missing, Int64}
268 PWGTP62 Union{Missing, Int64}
269 PWGTP63 Union{Missing, Int64}
270 PWGTP64 Union{Missing, Int64}
271 PWGTP65 Union{Missing, Int64}
272 PWGTP66 Union{Missing, Int64}
273 PWGTP67 Union{Missing, Int64}
274 PWGTP68 Union{Missing, Int64}
275 PWGTP69 Union{Missing, Int64}
276 PWGTP70 Union{Missing, Int64}
277 PWGTP71 Union{Missing, Int64}
278 PWGTP72 Union{Missing, Int64}
279 PWGTP73 Union{Missing, Int64}
280 PWGTP74 Union{Missing, Int64}
281 PWGTP75 Union{Missing, Int64}
282 PWGTP76 Union{Missing, Int64}
283 PWGTP77 Union{Missing, Int64}
284 PWGTP78 Union{Missing, Int64}
285 PWGTP79 Union{Missing, Int64}
286 PWGTP80 Union{Missing, Int64}
Could you run this on both, replacing t with acs and acs_better respectively?
map(i->typeof(i), values(getfield(t.columns, :fieldarrays)))
I'm interested in the array types of each column.
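Since the table has 286 columns, a tally of the array types may be easier to compare than the full tuple. The following is only a sketch: `tally_array_types` is a hypothetical helper, and `cols` is made-up stand-in data rather than the actual ACS table.

```julia
# Stand-in for the table's columns: a NamedTuple of vectors (hypothetical data).
cols = (
    RT    = ["P", "P", "P"],
    AGEP  = [34, 51, 8],
    CITWP = Union{Missing,Int64}[1999, missing, missing],
    COW   = Union{Missing,Int64}[1, 2, missing],
)

# Count how many columns use each array type.
# `columns` can be any iterable of arrays (hypothetical helper, not part of JuliaDB).
function tally_array_types(columns)
    counts = Dict{DataType,Int}()
    for col in columns
        T = typeof(col)
        counts[T] = get(counts, T, 0) + 1
    end
    return counts
end

type_counts = tally_array_types(values(cols))
for (T, n) in type_counts
    println(rpad(string(T), 40), n)
end
```

On the real tables, the same helper could presumably be fed the tuple produced by the expression above, i.e. `values(getfield(t.columns, :fieldarrays))`.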
With acs:
(WeakRefStrings.StringArray{String,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, 
Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, WeakRefStrings.StringArray{String,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, WeakRefStrings.StringArray{String,1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, Array{Union{Missing, Int64},1}, Array{Int64,1}, 
Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, Array{Int64,1}, 
Array{Int64,1})
And with acs_better:
(WeakRefStrings.StringArray{String,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, 
Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, WeakRefStrings.StringArray{String,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, 
Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, WeakRefStrings.StringArray{String,1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, 
Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, 
Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1}, Array{Union{Missing, Int64},1})
So, I'm out of ideas about what is happening here :) The auto-detected column types case has fewer Union{Int,Missing} columns, but I assume that those columns actually don't have missing data, so I don't understand how the case with more Union{Int,Missing} columns could end up with a smaller disk footprint. I think @shashi needs to look into this :)
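For reference, the in-memory expectation can be checked with plain Base Julia. This is a rough sketch on synthetic data (the vectors `plain` and `nullable` are made up, not columns from the ACS file), and it says nothing about what `save` writes to disk. Julia stores a `Union{Missing,Int64}` array inline with one extra type-tag byte per element, so the Union vector should not come out smaller in memory.

```julia
n = 1_000_000

# Synthetic columns with identical values (hypothetical data, no missings).
plain    = collect(1:n)                                   # Vector{Int64}
nullable = convert(Vector{Union{Missing,Int64}}, plain)   # Union element type

# How many entries are actually missing?
n_missing = count(ismissing, nullable)

# In-memory footprint as reported by Base.summarysize.
bytes_plain    = Base.summarysize(plain)
bytes_nullable = Base.summarysize(nullable)

println("missing entries:               ", n_missing)
println("Vector{Int64}:                 ", bytes_plain, " bytes")
println("Vector{Union{Missing,Int64}}:  ", bytes_nullable, " bytes")
```

If the saved files behave the opposite way, that points at the serialization step rather than at the arrays themselves.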
One more thing: I think most likely the problem here has nothing to do with TextParse, at least right now I don’t see how it could. This to me right now looks more like an issue with the JuliaDB save functionality.
Thanks. I opened up an issue with JuliaDB.
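This post documents a problem with a very basic use case of JuliaDB. Loading psam_pusa.csv with inferred column types and saving the table yields an 8GB file, although psam_pusa.csv is only 4GB. The inferred types are two Strings, many Int64s and many Union{Missing,Int64}s. Loading the same file with the integer columns declared as Union{Missing,Int64} and saving it yields a 2.5GB file.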
Does the column type inference work properly, or is it a storage problem in JuliaDB? I am on Julia 1.1.0, TextParse 0.9.1+, and JuliaDB 0.12.
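To make the size comparison reproducible, the on-disk footprint can be measured from Julia itself. The helper below is a sketch using only Base functions; `disk_footprint` is a hypothetical name, and it handles both a single file and a directory because it is an assumption here which of the two `save` produces.

```julia
# Total size in bytes of a file, or of all files under a directory
# (hypothetical helper, not part of JuliaDB).
function disk_footprint(path::AbstractString)
    isfile(path) && return filesize(path)
    total = 0
    for (root, _, files) in walkdir(path)
        for f in files
            total += filesize(joinpath(root, f))
        end
    end
    return total
end

# Small demo on a temporary directory with two files of known size.
demo_bytes = mktempdir() do dir
    write(joinpath(dir, "a.bin"), zeros(UInt8, 1024))
    mkdir(joinpath(dir, "sub"))
    write(joinpath(dir, "sub", "b.bin"), zeros(UInt8, 2048))
    disk_footprint(dir)
end
println(demo_bytes, " bytes")
```

For the case in this thread, one would call `disk_footprint("test")` and `disk_footprint("test_better")` on the paths passed to `save`.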