lux-org / lux

Automatically visualize your pandas dataframe via a single print! 📊 💡
Apache License 2.0
5.14k stars 364 forks

Better visualization for high cardinality bar charts #28

Closed dorisjlee closed 3 years ago

dorisjlee commented 4 years ago

Currently, our bar chart visualization tries to balance showing the overall distribution against showing the min/max of individual values. The issue is that, for high-cardinality attributes, the charts end up summarized or squashed.

✅ easily distinguish min & max items ❌ charts summarized/squashed

image

Other alternatives include 1) overplotting, 2) truncating to top-k, and 3) a scrollable design. Each comes with its own pros and cons.

One possible design is having a truncated top-k view that can be "expanded" into a scrollable view.

dorisjlee commented 3 years ago

After discussion with the dev team, the design we settled on is a top-k bar chart (where k is around 10) plus a text label reading "+ X more ..." to indicate how many additional distinct values are not shown. (Advanced: it might also be useful to consider interactions that reverse the sort order, so users can look at the lowest-k bars.)

import altair as alt
import pandas as pd
visData = pd.DataFrame({'Name': {0: 'amc ambassador brougham', 1: 'amc ambassador dpl', 2: 'amc ambassador sst', 3: 'amc concord', 4: 'amc concord d/l', 5: 'amc concord dl 6', 6: 'amc gremlin', 7: 'amc hornet', 8: 'amc hornet sportabout (sw)', 9: 'amc matador', 10: 'amc matador (sw)', 11: 'amc pacer', 12: 'amc pacer d/l', 13: 'amc rebel sst', 14: 'amc spirit dl', 15: 'audi 100 ls', 16: 'audi 100ls', 17: 'audi 4000', 18: 'audi 5000', 19: 'audi 5000s (diesel)', 20: 'audi fox', 21: 'bmw 2002', 22: 'bmw 320i', 23: 'buick century', 24: 'buick century 350', 25: 'buick century limited', 26: 'buick century luxus (sw)', 27: 'buick century special', 28: 'buick electra 225 custom', 29: 'buick estate wagon (sw)', 30: 'buick lesabre custom', 31: 'buick opel isuzu deluxe', 32: 'buick regal sport coupe (turbo)', 33: 'buick skyhawk', 34: 'buick skylark', 35: 'buick skylark 320', 36: 'buick skylark limited', 37: 'cadillac eldorado', 38: 'cadillac seville', 39: 'capri ii', 40: 'chevroelt chevelle malibu', 41: 'chevrolet bel air', 42: 'chevrolet camaro', 43: 'chevrolet caprice classic', 44: 'chevrolet cavalier', 45: 'chevrolet cavalier 2-door', 46: 'chevrolet cavalier wagon', 47: 'chevrolet chevelle concours (sw)', 48: 'chevrolet chevelle malibu', 49: 'chevrolet chevelle malibu classic', 50: 'chevrolet chevette', 51: 'chevrolet citation', 52: 'chevrolet concours', 53: 'chevrolet impala', 54: 'chevrolet malibu', 55: 'chevrolet malibu classic (sw)', 56: 'chevrolet monte carlo', 57: 'chevrolet monte carlo landau', 58: 'chevrolet monte carlo s', 59: 'chevrolet monza 2+2', 60: 'chevrolet nova', 61: 'chevrolet nova custom', 62: 'chevrolet vega', 63: 'chevrolet vega (sw)', 64: 'chevrolet vega 2300', 65: 'chevrolet woody', 66: 'chevy c10', 67: 'chevy c20', 68: 'chevy s-10', 69: 'chrysler cordoba', 70: 'chrysler lebaron medallion', 71: 'chrysler lebaron salon', 72: 'chrysler lebaron town @ country (sw)', 73: 'chrysler new yorker brougham', 74: 'chrysler newport royal', 75: 'datsun 1200', 76: 
'datsun 200-sx', 77: 'datsun 200sx', 78: 'datsun 210', 79: 'datsun 280-zx', 80: 'datsun 310', 81: 'datsun 310 gx', 82: 'datsun 510', 83: 'datsun 510 (sw)', 84: 'datsun 510 hatchback', 85: 'datsun 610', 86: 'datsun 710', 87: 'datsun 810', 88: 'datsun 810 maxima', 89: 'datsun b-210', 90: 'datsun b210', 91: 'datsun b210 gx', 92: 'datsun f-10 hatchback', 93: 'datsun pl510', 94: 'dodge aries se', 95: 'dodge aries wagon (sw)', 96: 'dodge aspen', 97: 'dodge aspen 6', 98: 'dodge aspen se', 99: 'dodge challenger se', 100: 'dodge charger 2.2', 101: 'dodge colt', 102: 'dodge colt (sw)', 103: 'dodge colt hardtop', 104: 'dodge colt hatchback custom', 105: 'dodge colt m/m', 106: 'dodge coronet brougham', 107: 'dodge coronet custom', 108: 'dodge coronet custom (sw)', 109: 'dodge d100', 110: 'dodge d200', 111: 'dodge dart custom', 112: 'dodge diplomat', 113: 'dodge magnum xe', 114: 'dodge monaco (sw)', 115: 'dodge monaco brougham', 116: 'dodge omni', 117: 'dodge rampage', 118: 'dodge st. regis', 119: 'fiat 124 sport coupe', 120: 'fiat 124 tc', 121: 'fiat 124b', 122: 'fiat 128', 123: 'fiat 131', 124: 'fiat strada custom', 125: 'fiat x1.9', 126: 'ford country', 127: 'ford country squire (sw)', 128: 'ford escort 2h', 129: 'ford escort 4w', 130: 'ford f108', 131: 'ford f250', 132: 'ford fairmont', 133: 'ford fairmont (auto)', 134: 'ford fairmont (man)', 135: 'ford fairmont 4', 136: 'ford fairmont futura', 137: 'ford fiesta', 138: 'ford futura', 139: 'ford galaxie 500', 140: 'ford gran torino', 141: 'ford gran torino (sw)', 142: 'ford granada', 143: 'ford granada ghia', 144: 'ford granada gl', 145: 'ford granada l', 146: 'ford ltd', 147: 'ford ltd landau', 148: 'ford maverick', 149: 'ford mustang', 150: 'ford mustang gl', 151: 'ford mustang ii', 152: 'ford mustang ii 2+2', 153: 'ford pinto', 154: 'ford pinto (sw)', 155: 'ford pinto runabout', 156: 'ford ranger', 157: 'ford thunderbird', 158: 'ford torino', 159: 'ford torino 500', 160: 'hi 1200d', 161: 'honda Accelerationord', 162: 
'honda Accelerationord cvcc', 163: 'honda Accelerationord lx', 164: 'honda civic', 165: 'honda civic (auto)', 166: 'honda civic 1300', 167: 'honda civic 1500 gl', 168: 'honda civic cvcc', 169: 'honda prelude', 170: 'maxda glc deluxe', 171: 'maxda rx3', 172: 'mazda 626', 173: 'mazda glc', 174: 'mazda glc 4', 175: 'mazda glc custom', 176: 'mazda glc custom l', 177: 'mazda glc deluxe', 178: 'mazda rx-4', 179: 'mazda rx-7 gs', 180: 'mazda rx2 coupe', 181: 'mercedes benz 300d', 182: 'mercedes-benz 240d', 183: 'mercedes-benz 280s', 184: 'mercury capri 2000', 185: 'mercury capri v6', 186: 'mercury cougar brougham', 187: 'mercury grand marquis', 188: 'mercury lynx l', 189: 'mercury marquis', 190: 'mercury marquis brougham', 191: 'mercury monarch', 192: 'mercury monarch ghia', 193: 'mercury zephyr', 194: 'mercury zephyr 6', 195: 'nissan stanza xe', 196: 'oldsmobile cutlass ciera (diesel)', 197: 'oldsmobile cutlass ls', 198: 'oldsmobile cutlass salon brougham', 199: 'oldsmobile cutlass supreme', 200: 'oldsmobile delta 88 royale', 201: 'oldsmobile omega', 202: 'oldsmobile omega brougham', 203: 'oldsmobile starfire sx', 204: 'oldsmobile vista cruiser', 205: 'opel 1900', 206: 'opel manta', 207: 'peugeot 304', 208: 'peugeot 504', 209: 'peugeot 504 (sw)', 210: 'peugeot 505s turbo diesel', 211: 'peugeot 604sl', 212: "plymouth 'cuda 340", 213: 'plymouth arrow gs', 214: 'plymouth champ', 215: 'plymouth cricket', 216: 'plymouth custom suburb', 217: 'plymouth duster', 218: 'plymouth fury', 219: 'plymouth fury gran sedan', 220: 'plymouth fury iii', 221: 'plymouth grand fury', 222: 'plymouth horizon', 223: 'plymouth horizon 4', 224: 'plymouth horizon miser', 225: 'plymouth horizon tc3', 226: 'plymouth reliant', 227: 'plymouth sapporo', 228: 'plymouth satellite', 229: 'plymouth satellite custom', 230: 'plymouth satellite custom (sw)', 231: 'plymouth satellite sebring', 232: 'plymouth valiant', 233: 'plymouth valiant custom', 234: 'plymouth volare', 235: 'plymouth volare custom', 236: 
'plymouth volare premier v8', 237: 'pontiac astro', 238: 'pontiac catalina', 239: 'pontiac catalina brougham', 240: 'pontiac firebird', 241: 'pontiac grand prix', 242: 'pontiac grand prix lj', 243: 'pontiac j2000 se hatchback', 244: 'pontiac lemans v6', 245: 'pontiac phoenix', 246: 'pontiac phoenix lj', 247: 'pontiac safari (sw)', 248: 'pontiac sunbird coupe', 249: 'pontiac ventura sj', 250: 'renault 12 (sw)', 251: 'renault 12tl', 252: 'renault 5 gtl', 253: 'saab 99e', 254: 'saab 99gle', 255: 'saab 99le', 256: 'subaru', 257: 'subaru dl', 258: 'toyota carina', 259: 'toyota celica gt', 260: 'toyota celica gt liftback', 261: 'toyota corolla', 262: 'toyota corolla 1200', 263: 'toyota corolla 1600 (sw)', 264: 'toyota corolla liftback', 265: 'toyota corolla tercel', 266: 'toyota corona', 267: 'toyota corona hardtop', 268: 'toyota corona liftback', 269: 'toyota corona mark ii', 270: 'toyota cressida', 271: 'toyota mark ii', 272: 'toyota starlet', 273: 'toyota tercel', 274: 'toyouta corona mark ii (sw)', 275: 'triumph tr7 coupe', 276: 'vokswagen rabbit', 277: 'volkswagen 1131 deluxe sedan', 278: 'volkswagen 411 (sw)', 279: 'volkswagen dasher', 280: 'volkswagen jetta', 281: 'volkswagen model 111', 282: 'volkswagen rabbit', 283: 'volkswagen rabbit custom', 284: 'volkswagen rabbit custom diesel', 285: 'volkswagen rabbit l', 286: 'volkswagen scirocco', 287: 'volkswagen super beetle', 288: 'volkswagen type 3', 289: 'volvo 144ea', 290: 'volvo 145e (sw)', 291: 'volvo 244dl', 292: 'volvo 245', 293: 'volvo 264gl', 294: 'volvo diesel', 295: 'vw dasher (diesel)', 296: 'vw pickup', 297: 'vw rabbit', 298: 'vw rabbit c (diesel)', 299: 'vw rabbit custom'}, 'Record': {0: 1, 1: 1, 2: 1, 3: 2, 4: 1, 5: 1, 6: 4, 7: 4, 8: 1, 9: 5, 10: 2, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 2, 17: 1, 18: 1, 19: 1, 20: 1, 21: 1, 22: 1, 23: 2, 24: 1, 25: 1, 26: 1, 27: 1, 28: 1, 29: 2, 30: 1, 31: 1, 32: 1, 33: 1, 34: 2, 35: 1, 36: 1, 37: 1, 38: 1, 39: 1, 40: 1, 41: 1, 42: 1, 43: 3, 44: 1, 45: 1, 46: 1, 47: 1, 
48: 2, 49: 2, 50: 4, 51: 3, 52: 1, 53: 4, 54: 2, 55: 1, 56: 1, 57: 2, 58: 1, 59: 1, 60: 3, 61: 1, 62: 3, 63: 1, 64: 1, 65: 1, 66: 1, 67: 1, 68: 1, 69: 1, 70: 1, 71: 1, 72: 1, 73: 1, 74: 1, 75: 1, 76: 1, 77: 1, 78: 3, 79: 1, 80: 1, 81: 1, 82: 1, 83: 1, 84: 1, 85: 1, 86: 2, 87: 1, 88: 1, 89: 1, 90: 1, 91: 1, 92: 1, 93: 2, 94: 1, 95: 1, 96: 2, 97: 1, 98: 1, 99: 1, 100: 1, 101: 3, 102: 1, 103: 1, 104: 1, 105: 1, 106: 1, 107: 1, 108: 1, 109: 1, 110: 1, 111: 1, 112: 1, 113: 1, 114: 1, 115: 1, 116: 1, 117: 1, 118: 1, 119: 1, 120: 1, 121: 1, 122: 2, 123: 1, 124: 1, 125: 1, 126: 1, 127: 2, 128: 1, 129: 1, 130: 1, 131: 1, 132: 1, 133: 1, 134: 1, 135: 1, 136: 1, 137: 1, 138: 1, 139: 3, 140: 3, 141: 2, 142: 1, 143: 1, 144: 1, 145: 1, 146: 2, 147: 1, 148: 4, 149: 1, 150: 1, 151: 1, 152: 1, 153: 5, 154: 1, 155: 1, 156: 1, 157: 1, 158: 1, 159: 1, 160: 1, 161: 2, 162: 1, 163: 1, 164: 3, 165: 1, 166: 1, 167: 1, 168: 2, 169: 1, 170: 1, 171: 1, 172: 2, 173: 1, 174: 1, 175: 1, 176: 1, 177: 1, 178: 1, 179: 1, 180: 1, 181: 1, 182: 1, 183: 1, 184: 1, 185: 1, 186: 1, 187: 1, 188: 1, 189: 1, 190: 1, 191: 1, 192: 1, 193: 1, 194: 1, 195: 1, 196: 1, 197: 1, 198: 2, 199: 1, 200: 1, 201: 1, 202: 1, 203: 1, 204: 1, 205: 2, 206: 2, 207: 1, 208: 4, 209: 1, 210: 1, 211: 1, 212: 1, 213: 1, 214: 1, 215: 1, 216: 1, 217: 3, 218: 1, 219: 1, 220: 3, 221: 1, 222: 1, 223: 1, 224: 1, 225: 1, 226: 2, 227: 1, 228: 1, 229: 1, 230: 1, 231: 1, 232: 2, 233: 1, 234: 1, 235: 1, 236: 1, 237: 1, 238: 3, 239: 1, 240: 1, 241: 1, 242: 1, 243: 1, 244: 1, 245: 2, 246: 1, 247: 1, 248: 1, 249: 1, 250: 1, 251: 1, 252: 1, 253: 1, 254: 1, 255: 2, 256: 2, 257: 2, 258: 1, 259: 1, 260: 1, 261: 5, 262: 2, 263: 1, 264: 1, 265: 1, 266: 4, 267: 1, 268: 1, 269: 1, 270: 1, 271: 2, 272: 1, 273: 1, 274: 1, 275: 1, 276: 1, 277: 1, 278: 1, 279: 3, 280: 1, 281: 1, 282: 2, 283: 1, 284: 1, 285: 1, 286: 1, 287: 1, 288: 1, 289: 1, 290: 1, 291: 1, 292: 1, 293: 1, 294: 1, 295: 1, 296: 1, 297: 2, 298: 1, 299: 1}})
######### Added Top-k
measure = "Record"
k=10
remaining_bars = len(visData) - k  # bars hidden by the truncation (computed before truncating)
visData = visData.nlargest(k,measure)
######### END
chart = alt.Chart(visData).mark_bar().encode(
    y=alt.Y('Name', type='nominal', axis=alt.Axis(labelOverlap=True), sort='-x'),
    x=alt.X('Record', type='quantitative', title='Number of Records'),
)
######### Added Text & Title
text = alt.Chart(visData).mark_text(
    x=5, 
    y=142,
    align="left",
    color = "#e3e3e3",
    fontSize = 11,
    fontWeight="lighter",
    text=f"+ {remaining_bars} more ..."
)
chart = chart+text
######### END
chart = chart.configure_mark(tooltip=alt.TooltipContent('encoding')) # Setting tooltip as non-null

chart = chart.configure_title(fontWeight=500,fontSize=13,font='Helvetica Neue')
chart = chart.configure_axis(titleFontWeight=500,titleFontSize=11,titleFont='Helvetica Neue',
            labelFontWeight=400,labelFontSize=8,labelFont='Helvetica Neue',labelColor='#505050')
chart = chart.configure_legend(titleFontWeight=500,titleFontSize=10,titleFont='Helvetica Neue',
            labelFontWeight=400,labelFontSize=8,labelFont='Helvetica Neue')
chart = chart.properties(width=160,height=150)

chart

image
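
The "reverse the sort order" idea mentioned above could reuse the same truncation step. The following is only a minimal sketch, not Lux's implementation: `truncate_bars` and its `lowest` flag are hypothetical names, and the five-row dataframe is toy data made up for illustration. It relies on pandas `nlargest`/`nsmallest` and leaves the input dataframe untouched.

```python
import pandas as pd


def truncate_bars(df, measure, k=10, lowest=False):
    """Hypothetical helper: return (rows_to_plot, n_hidden) without modifying df."""
    # nlargest gives the top-k bars; nsmallest gives the lowest-k bars.
    pick = df.nsmallest if lowest else df.nlargest
    shown = pick(k, measure)
    # Clamp at zero so small dataframes never report a negative count.
    n_hidden = max(len(df) - len(shown), 0)
    return shown, n_hidden


# Toy data for illustration only (not the cars dataset above).
counts = pd.DataFrame({
    "Name": ["a", "b", "c", "d", "e"],
    "Record": [5, 1, 3, 2, 4],
})

top, hidden_top = truncate_bars(counts, "Record", k=2)
bottom, hidden_bottom = truncate_bars(counts, "Record", k=2, lowest=True)
label = f"+ {hidden_top} more ..."
```

Because the helper returns the hidden count alongside the rows, the same "+ X more ..." label text can be used for either sort direction.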

dorisjlee commented 3 years ago

Added top-k visualization: image image

One potential issue is that the dataframe in the exported code includes only the top-k rows. We might want to include the full dataframe in the future, in case users want to plot the additional rows from the export.