Closed justinmeiners closed 2 years ago
Thanks for submitting a PR. Please use make lint
to check for lint errors (requires SwiftLint)
@jberkel fixed.
Wouldn't it be simpler to just change line 1172 to
let similar = columnNames.keys.filter { $0.hasSuffix(".\(column.template)") }
This avoids the allocation of the array with all columns. In the most common case (empty array) no allocation will be made.
@jberkel
the most common case (empty array) no allocation will be made.
I am less certain about this, but I believe the most common case is 1
. The user asks for a column and it happens to be prefixed. A 0
means it couldn't find the unprefixed or prefixed version of what the user requested (they asked for the wrong thing).
filter
Filter on dictionary keys creates an array. We could use the lazy
variant, assuming this particular kind of sequence allows traversal multiple times.
If the lazy
version worked here is what would happen for the case of 1
:
With the proposed changes we only apply the predicate to the entire list once, and we don't have to lookup the entry in the dictionary afterwards.
Now, this kind of analysis is a definitely a little overboard for most code, but I think this fix is pretty straightforward. get
is the most common operation and will be called a huge number of times in all kinds of inner loops.
Yes, it will create an array, but one with just one element in non-error cases. I'm not sure if your changes really have a noticeable benefit. The array is allocated on the stack, so this is a cheap operation, and I doubt that this is slower than two separate indexOf
operations. Have you done any profiling?
@jberkel If I demonstrate it is faster, will you include it? Or do you still have other concerns?
If it's faster, yes. A straightforward change would be to avoid the allocation of an array containing all the column names. (Array(columnNames.keys).filter
→ columnNames.keys.filter
). This should then be compared to your version.
@jberkel Just did a test where I pull a single column from about 50,000 rows. I measure the length of time using sign posts in instruments:
os.os_signpost(.begin, log: perfLog, name: "parse rows")
defer {
os.os_signpost(.end, log: perfLog, name: "parse rows")
}
return rows.map({ row in
return (row[value], row[value])
})
The runtimes for a few trials were the following:
Filter keys
Finding index
Explanation
columnNames.keys.filter
has to allocate an array to store the results. Finding the first two indicies forgoes this need.
nice, that looks like a substantial improvement. How many columns are in your test data? There's not just the allocation, but also the iteration over all elements.
@jberkel that's a good question. I just had 4 columns in this query result. I expect both algorithms to scale the same in the number of columns. Both visit each column exactly once, in order. Both apply a predicate on each visit.
You're right, that leaves the allocation overhead. Maybe the stack allocation isn't as fast they claim it is. Thanks for investigating!
For issue #1109