Closed kishoreg closed 5 years ago
Merging #9 into develop will decrease coverage by
0.05%
. The diff coverage is66.66%
.
@@ Coverage Diff @@
## develop #9 +/- ##
=============================================
- Coverage 67.21% 67.15% -0.06%
Complexity 4 4
=============================================
Files 1030 1030
Lines 50860 50780 -80
Branches 7111 7100 -11
=============================================
- Hits 34184 34103 -81
- Misses 14343 14353 +10
+ Partials 2333 2324 -9
Impacted Files | Coverage Δ | Complexity Δ | |
---|---|---|---|
.../core/indexsegment/mutable/MutableSegmentImpl.java | 81.33% <66.66%> (+15.8%) |
0 <0> (ø) |
:arrow_down: |
...apache/pinot/common/metrics/ValidationMetrics.java | 20.28% <0%> (-59.43%) |
0% <0%> (ø) |
|
...a/manager/realtime/RealtimeSegmentDataManager.java | 50% <0%> (-50%) |
0% <0%> (ø) |
|
...ller/validation/OfflineSegmentIntervalChecker.java | 26.74% <0%> (-50%) |
0% <0%> (ø) |
|
...lix/EmptyBrokerOnlineOfflineStateModelFactory.java | 86.66% <0%> (-13.34%) |
0% <0%> (ø) |
|
.../impl/dictionary/FloatOnHeapMutableDictionary.java | 86.66% <0%> (-6.67%) |
0% <0%> (ø) |
|
.../impl/dictionary/LongOffHeapMutableDictionary.java | 87.27% <0%> (-5.46%) |
0% <0%> (ø) |
|
...org/apache/pinot/common/metrics/MetricsHelper.java | 72.5% <0%> (-2.5%) |
0% <0%> (ø) |
|
.../controller/helix/core/SegmentDeletionManager.java | 79.67% <0%> (-2.44%) |
0% <0%> (ø) |
|
...pinot/server/starter/helix/HelixServerStarter.java | 52.76% <0%> (-2.22%) |
0% <0%> (ø) |
|
... and 18 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update d08d26d...67c9277. Read the comment docs.
This can definitely reduce the memory usage, but will also increase the time spent on fetching values (doing O(nlogn) fetches vs O(n) before). Caching values on heap might cause problem for huge dictionaries, so still this change should be a good trade-off.
This can definitely reduce the memory usage, but will also increase the time spent on fetching values (doing O(nlogn) fetches vs O(n) before). Caching values on heap might cause problem for huge dictionaries, so still this change should be a good trade-off.
Good point. However, note that sorting is common to both implementations. so the number of fetches are the same. It's just that in one case fetch is looking up an array (which is created by looking up the dictionary) and in another case, the lookup is on the dictionary. Given that this is for the consuming segment, the dictionary is mostly memory resident.
This can definitely reduce the memory usage, but will also increase the time spent on fetching values (doing O(nlogn) fetches vs O(n) before). Caching values on heap might cause problem for huge dictionaries, so still this change should be a good trade-off.
Good point. However, note that sorting is common to both implementations. so the number of fetches are the same. It's just that in one case fetch is looking up an array (which is created by looking up the dictionary) and in another case, the lookup is on the dictionary. Given that this is for the consuming segment, the dictionary is mostly memory resident.
For String and byte[], fetching from dictionary has extra cost to construct the Object (for String, read the bytes and decode the bytes, for byte[] copy the bytes into an array). IMO it's more like CPU bound instead of IO bound
We can optimize those things in another PR. We could potentially compare the byte[] without converting or copying. But as you said, the trade-off makes sense.
We can optimize those things in another PR. We could potentially compare the byte[] without converting or copying. But as you said, the trade-off makes sense.
In the master PR, the comparison is handled inside MutableDictionary, which will make the future optimization much simpler as different MutableDictionary (on-heap/off-heap) should handle them differently.
…ilet plate code
Sorting is creating too many objects unnecessarily. We just need two int arrays - int[size of dict] int[num docs]
today we create