pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Filtering with pl.col is substantially (27x) slower than filtering with pl.Series #18833

Open AltamashRafiq opened 2 weeks ago

AltamashRafiq commented 2 weeks ago


Reproducible example

I have been unable to reproduce this without my custom data. Below is a dataset that resembles my real data (same dtypes) but is not the same, and I have not been able to replicate the issue with this fake data.

import polars as pl

schema = pl.Schema([('col_0', pl.String),
        ('col_1', pl.Int64),
        ('col_2', pl.Int64),
        ('col_3', pl.Int64),
        ('col_4', pl.Int64),
        ('col_5', pl.Int64),
        ('col_6', pl.Int64),
        ('col_7', pl.Int64),
        ('col_8', pl.Int64),
        ('col_9', pl.Int64),
        ('col_10', pl.Float64),
        ('col_11', pl.Float64),
        ('col_12', pl.Int64),
        ('col_13', pl.Int64),
        ('col_14', pl.Int64),
        ('col_15', pl.Int64),
        ('col_16', pl.Int64),
        ('col_17', pl.Float64),
        ('col_18', pl.Float64),
        ('col_19', pl.Int64),
        ('col_20', pl.Int64),
        ('col_21', pl.Int64),
        ('col_22', pl.Int64),
        ('col_23', pl.Int64),
        ('col_24', pl.Float64),
        ('col_25', pl.Int64),
        ('col_26', pl.Float64),
        ('col_27', pl.Int64),
        ('col_28', pl.Int64),
        ('col_29', pl.Int64),
        ('col_30', pl.Int64),
        ('col_31', pl.Int64),
        ('col_32', pl.Float64),
        ('col_33', pl.Float64),
        ('col_34', pl.Float64),
        ('col_35', pl.Float64),
        ('col_36', pl.Int64),
        ('col_37', pl.Int64),
        ('col_38', pl.Int64),
        ('col_39', pl.Int64),
        ('col_40', pl.Float64),
        ('col_41', pl.Int64),
        ('col_42', pl.Float64),
        ('col_43', pl.Float64),
        ('col_44', pl.Int64),
        ('col_45', pl.Int64),
        ('col_46', pl.Float64),
        ('col_47', pl.Int64),
        ('col_48', pl.Int64),
        ('col_49', pl.Int64),
        ('col_50', pl.Int64),
        ('col_51', pl.Int64),
        ('col_52', pl.Int64),
        ('col_53', pl.Int64),
        ('col_54', pl.Int64),
        ('col_55', pl.Int64),
        ('col_56', pl.Int64),
        ('col_57', pl.Float64),
        ('col_58', pl.Int64),
        ('col_59', pl.Float64),
        ('col_60', pl.Int64),
        ('col_61', pl.Int64),
        ('col_62', pl.Float64),
        ('col_63', pl.Int64),
        ('col_64', pl.Float64),
        ('col_65', pl.Int64),
        ('col_66', pl.Float64),
        ('col_67', pl.Int64),
        ('col_68', pl.Float64),
        ('col_69', pl.Int64),
        ('col_70', pl.Int64),
        ('col_71', pl.Int64),
        ('col_72', pl.Int64),
        ('col_73', pl.Int64),
        ('col_74', pl.Int64),
        ('col_75', pl.Int64),
        ('col_76', pl.Float64),
        ('col_77', pl.Int64),
        ('col_78', pl.Int64),
        ('col_79', pl.Float64),
        ('col_80', pl.Int64),
        ('col_81', pl.Float64),
        ('col_82', pl.Float64),
        ('col_83', pl.Int64),
        ('col_84', pl.Int64),
        ('col_85', pl.Int64),
        ('col_86', pl.Int64),
        ('col_87', pl.Int64),
        ('col_88', pl.Int64),
        ('col_89', pl.Int64),
        ('col_90', pl.Int64),
        ('col_91', pl.Int64),
        ('col_92', pl.Int64),
        ('col_93', pl.Float64),
        ('col_94', pl.Int64),
        ('col_95', pl.Int64),
        ('col_96', pl.Int64),
        ('col_97', pl.Int64),
        ('col_98', pl.Int64),
        ('col_99', pl.Int64),
        ('col_100', pl.Int64),
        ('col_101', pl.Int64),
        ('col_102', pl.Int64),
        ('col_103', pl.Int64),
        ('col_104', pl.Int64),
        ('col_105', pl.Int64),
        ('col_106', pl.Int64),
        ('col_107', pl.Int64),
        ('col_108', pl.Float64),
        ('col_109', pl.Float64),
        ('col_110', pl.Float64),
        ('col_111', pl.Float64),
        ('col_112', pl.Int64),
        ('col_113', pl.Int64),
        ('col_114', pl.Int64),
        ('col_115', pl.Int64),
        ('col_116', pl.Int64),
        ('col_117', pl.Int64),
        ('col_118', pl.Int64),
        ('col_119', pl.Int64),
        ('col_120', pl.Int64),
        ('col_121', pl.Float64),
        ('col_122', pl.Float64),
        ('col_123', pl.Float64),
        ('col_124', pl.Float64),
        ('col_125', pl.Int64),
        ('col_126', pl.Int64),
        ('col_127', pl.Int64),
        ('col_128', pl.Int64),
        ('col_129', pl.Float64),
        ('col_130', pl.Float64),
        ('col_131', pl.Int64),
        ('col_132', pl.Float64),
        ('col_133', pl.Float64),
        ('col_134', pl.Float64),
        ('col_135', pl.Float64),
        ('col_136', pl.Float64),
        ('col_137', pl.Int64),
        ('col_138', pl.Float64),
        ('col_139', pl.Int64),
        ('col_140', pl.Int64),
        ('col_141', pl.Float64),
        ('col_142', pl.Float64),
        ('col_143', pl.Int64),
        ('col_144', pl.Int64),
        ('col_145', pl.Int64),
        ('col_146', pl.Int64),
        ('col_147', pl.Int64),
        ('col_148', pl.Int64),
        ('col_149', pl.Float64),
        ('col_150', pl.Float64),
        ('col_151', pl.Int64),
        ('col_152', pl.Float64),
        ('col_153', pl.Int64),
        ('col_154', pl.Int64),
        ('col_155', pl.Int64),
        ('col_156', pl.Float64),
        ('col_157', pl.Float64),
        ('col_158', pl.Float64),
        ('col_159', pl.Int64),
        ('col_160', pl.Int64),
        ('col_161', pl.Int64),
        ('col_162', pl.Int64),
        ('col_163', pl.Float64),
        ('col_164', pl.Float64),
        ('col_165', pl.Int64),
        ('col_166', pl.Int64),
        ('col_167', pl.Int64),
        ('col_168', pl.Float64),
        ('col_169', pl.Float64),
        ('col_170', pl.Int64),
        ('col_171', pl.Float64),
        ('col_172', pl.Int64),
        ('col_173', pl.Int64),
        ('col_174', pl.Float64),
        ('col_175', pl.Int64),
        ('col_176', pl.Float64),
        ('col_177', pl.Int64),
        ('col_178', pl.Int64),
        ('col_179', pl.Int64),
        ('col_180', pl.Int64),
        ('col_181', pl.Int64),
        ('col_182', pl.Int64),
        ('col_183', pl.Int64),
        ('col_184', pl.Float64),
        ('col_185', pl.Int64),
        ('col_186', pl.Int64),
        ('col_187', pl.Int64),
        ('col_188', pl.Int64),
        ('col_189', pl.Int64),
        ('col_190', pl.Int64),
        ('col_191', pl.Int64),
        ('col_192', pl.Int64),
        ('col_193', pl.Int64),
        ('col_194', pl.Int64),
        ('col_195', pl.Float64),
        ('col_196', pl.Int64),
        ('col_197', pl.Int64),
        ('col_198', pl.Int64),
        ('col_199', pl.Int64),
        ('col_200', pl.Float64),
        ('col_201', pl.Int64),
        ('col_202', pl.Int64),
        ('col_203', pl.Int64),
        ('col_204', pl.Int64),
        ('col_205', pl.Int64),
        ('col_206', pl.Float64),
        ('col_207', pl.Int64),
        ('col_208', pl.Float64),
        ('col_209', pl.Int64),
        ('col_210', pl.Int64),
        ('col_211', pl.Int64),
        ('col_212', pl.Int64),
        ('col_213', pl.Int64),
        ('col_214', pl.Int64),
        ('col_215', pl.Int64),
        ('col_216', pl.Int64),
        ('col_217', pl.Int64),
        ('col_218', pl.Float64),
        ('col_219', pl.Float64),
        ('col_220', pl.Float64),
        ('col_221', pl.Float64),
        ('col_222', pl.Int64),
        ('col_223', pl.Float64),
        ('col_224', pl.Int64),
        ('col_225', pl.Int64),
        ('col_226', pl.Int64),
        ('col_227', pl.Int64),
        ('col_228', pl.Int64),
        ('col_229', pl.Int64),
        ('col_230', pl.Float64),
        ('col_231', pl.Float64),
        ('col_232', pl.Int64),
        ('col_233', pl.Int64),
        ('col_234', pl.Int64),
        ('col_235', pl.Int64),
        ('col_236', pl.Int64),
        ('col_237', pl.Int64),
        ('col_238', pl.Int64),
        ('col_239', pl.Int64),
        ('col_240', pl.Int64),
        ('col_241', pl.Int64),
        ('col_242', pl.Int64),
        ('col_243', pl.Int64),
        ('col_244', pl.Int64),
        ('col_245', pl.Int64),
        ('col_246', pl.Int64),
        ('col_247', pl.Float64),
        ('col_248', pl.Int64),
        ('col_249', pl.Int64),
        ('col_250', pl.Int64),
        ('col_251', pl.Float64),
        ('col_252', pl.Float64),
        ('col_253', pl.Int64),
        ('col_254', pl.Float64),
        ('col_255', pl.Int64),
        ('col_256', pl.Int64),
        ('col_257', pl.Int64),
        ('col_258', pl.Int64),
        ('col_259', pl.Float64),
        ('col_260', pl.Int64),
        ('col_261', pl.Int64),
        ('col_262', pl.Int64),
        ('col_263', pl.Int64),
        ('col_264', pl.Int64),
        ('col_265', pl.Float64),
        ('col_266', pl.Int64),
        ('col_267', pl.Int64),
        ('col_268', pl.Int64),
        ('col_269', pl.Int64),
        ('col_270', pl.Int64),
        ('col_271', pl.Int64),
        ('col_272', pl.Float64),
        ('col_273', pl.Float64),
        ('col_274', pl.Int64),
        ('col_275', pl.Int64),
        ('col_276', pl.Float64),
        ('col_277', pl.Float64),
        ('col_278', pl.Float64),
        ('col_279', pl.Int64),
        ('col_280', pl.Float64),
        ('col_281', pl.Float64),
        ('col_282', pl.Int64),
        ('col_283', pl.Float64),
        ('col_284', pl.Int64),
        ('col_285', pl.Int64),
        ('col_286', pl.Float64),
        ('col_287', pl.Float64),
        ('col_288', pl.Float64),
        ('col_289', pl.Float64),
        ('col_290', pl.Int64),
        ('col_291', pl.Float64),
        ('col_292', pl.Float64),
        ('col_293', pl.Float64),
        ('col_294', pl.Float64),
        ('col_295', pl.Float64),
        ('col_296', pl.Float64),
        ('col_297', pl.Int64),
        ('col_298', pl.Int64),
        ('col_299', pl.Float64),
        ('col_300', pl.Float64),
        ('col_301', pl.Float64),
        ('col_302', pl.Float64),
        ('col_303', pl.Float64),
        ('col_304', pl.Float64),
        ('col_305', pl.Float64),
        ('col_306', pl.Float64),
        ('col_307', pl.Int64),
        ('col_308', pl.Int64),
        ('col_309', pl.Float64),
        ('col_310', pl.Float64),
        ('col_311', pl.Int64),
        ('col_312', pl.Int64),
        ('col_313', pl.Int64),
        ('col_314', pl.Int64),
        ('col_315', pl.Int64),
        ('col_316', pl.Int64),
        ('col_317', pl.Int64),
        ('col_318', pl.Float64),
        ('col_319', pl.Int64),
        ('col_320', pl.Int64),
        ('col_321', pl.Float64),
        ('col_322', pl.Int64),
        ('col_323', pl.Int64),
        ('col_324', pl.Int64),
        ('col_325', pl.Float64),
        ('col_326', pl.Int64),
        ('col_327', pl.Int64),
        ('col_328', pl.Int64),
        ('col_329', pl.Int64),
        ('col_330', pl.Int64),
        ('col_331', pl.Int64),
        ('col_332', pl.Float64),
        ('col_333', pl.Float64),
        ('col_334', pl.Int64),
        ('col_335', pl.Float64),
        ('col_336', pl.Int64),
        ('col_337', pl.Int64),
        ('col_338', pl.Int64),
        ('col_339', pl.Int64),
        ('col_340', pl.Int64),
        ('col_341', pl.Float64),
        ('col_342', pl.Int64),
        ('col_343', pl.Int64),
        ('col_344', pl.Float64),
        ('col_345', pl.Float64),
        ('col_346', pl.Float64),
        ('col_347', pl.Int64),
        ('col_348', pl.Int64),
        ('col_349', pl.Int64),
        ('col_350', pl.Int64),
        ('col_351', pl.Float64),
        ('col_352', pl.Int64),
        ('col_353', pl.Float64),
        ('col_354', pl.Int64),
        ('col_355', pl.Int64),
        ('col_356', pl.Int64),
        ('col_357', pl.Int64),
        ('col_358', pl.Int64),
        ('col_359', pl.Int64),
        ('col_360', pl.Int64),
        ('col_361', pl.Int64),
        ('col_362', pl.Int64),
        ('col_363', pl.Int64),
        ('col_364', pl.Int64),
        ('col_365', pl.Int64),
        ('col_366', pl.Int64),
        ('col_367', pl.Int64),
        ('col_368', pl.Float64),
        ('col_369', pl.Float64),
        ('col_370', pl.Float64),
        ('col_371', pl.Int64),
        ('col_372', pl.Float64),
        ('col_373', pl.Int64),
        ('col_374', pl.Float64),
        ('col_375', pl.Float64),
        ('col_376', pl.Int64),
        ('col_377', pl.Int64),
        ('col_378', pl.Int64),
        ('col_379', pl.Int64),
        ('col_380', pl.Int64),
        ('col_381', pl.Float64),
        ('col_382', pl.Float64),
        ('col_383', pl.Int64),
        ('col_384', pl.Int64),
        ('col_385', pl.Int64),
        ('col_386', pl.Float64),
        ('col_387', pl.Int64),
        ('col_388', pl.Int64),
        ('col_389', pl.Int64),
        ('col_390', pl.Int64),
        ('col_391', pl.Int64),
        ('col_392', pl.Float64),
        ('col_393', pl.Int64),
        ('col_394', pl.Int64),
        ('col_395', pl.Int64),
        ('col_396', pl.Int64),
        ('col_397', pl.Int64),
        ('col_398', pl.Int64),
        ('col_399', pl.Int64),
        ('col_400', pl.Int64),
        ('col_401', pl.Int64),
        ('col_402', pl.Int64),
        ('col_403', pl.Int64),
        ('col_404', pl.Int64),
        ('col_405', pl.Float64),
        ('col_406', pl.Int64),
        ('col_407', pl.Float64),
        ('col_408', pl.Float64),
        ('col_409', pl.Float64),
        ('col_410', pl.Float64),
        ('col_411', pl.Int64),
        ('col_412', pl.Float64),
        ('col_413', pl.Int64),
        ('col_414', pl.Int64),
        ('col_415', pl.Float64),
        ('col_416', pl.Int64),
        ('col_417', pl.Float64),
        ('col_418', pl.Float64),
        ('col_419', pl.Int64),
        ('col_420', pl.Int64),
        ('col_421', pl.Int64),
        ('col_422', pl.Int64),
        ('col_423', pl.Float64),
        ('col_424', pl.Float64),
        ('col_425', pl.Int64),
        ('col_426', pl.Float64),
        ('col_427', pl.Int64),
        ('col_428', pl.Int64),
        ('col_429', pl.Int64),
        ('col_430', pl.Int64),
        ('col_431', pl.Int64),
        ('col_432', pl.Float64),
        ('col_433', pl.Int64),
        ('col_434', pl.Int64),
        ('col_435', pl.Int64),
        ('col_436', pl.Int64),
        ('col_437', pl.Int64),
        ('col_438', pl.Int64),
        ('col_439', pl.Int64),
        ('col_440', pl.Int64),
        ('col_441', pl.Float64),
        ('col_442', pl.Float64),
        ('col_443', pl.Int64),
        ('col_444', pl.Float64),
        ('col_445', pl.Int64),
        ('col_446', pl.Int64),
        ('col_447', pl.Int64),
        ('col_448', pl.Float64),
        ('col_449', pl.Float64),
        ('col_450', pl.Float64),
        ('col_451', pl.Float64),
        ('col_452', pl.Float64),
        ('col_453', pl.Int64),
        ('col_454', pl.Float64),
        ('col_455', pl.Float64),
        ('col_456', pl.Int64),
        ('col_457', pl.Float64),
        ('col_458', pl.Int64),
        ('col_459', pl.Int64),
        ('col_460', pl.Int64),
        ('col_461', pl.Int64),
        ('col_462', pl.Int64),
        ('col_463', pl.Float64),
        ('col_464', pl.Float64),
        ('col_465', pl.Float64),
        ('col_466', pl.Int64),
        ('col_467', pl.Int64),
        ('col_468', pl.Float64),
        ('col_469', pl.Int64),
        ('col_470', pl.Int64),
        ('col_471', pl.Float64),
        ('col_472', pl.Int64),
        ('col_473', pl.Float64),
        ('col_474', pl.Float64),
        ('col_475', pl.Float64),
        ('col_476', pl.Float64),
        ('col_477', pl.Float64),
        ('col_478', pl.Int64),
        ('col_479', pl.Int64),
        ('col_480', pl.Int64),
        ('col_481', pl.Int64),
        ('col_482', pl.Int64),
        ('col_483', pl.Int64),
        ('col_484', pl.Float64),
        ('col_485', pl.Int64),
        ('col_486', pl.Int64),
        ('col_487', pl.Float64),
        ('col_488', pl.Int64),
        ('col_489', pl.Int64),
        ('col_490', pl.Float64),
        ('col_491', pl.Float64),
        ('col_492', pl.Int64),
        ('col_493', pl.Float64),
        ('col_494', pl.Int64),
        ('col_495', pl.Int64),
        ('col_496', pl.Float64),
        ('col_497', pl.Int64),
        ('col_498', pl.Int64),
        ('col_499', pl.Int64),
        ('col_500', pl.Int64),
        ('col_501', pl.Int64),
        ('col_502', pl.Int64),
        ('col_503', pl.Int64),
        ('col_504', pl.Int64),
        ('col_505', pl.Int64),
        ('col_506', pl.Int64),
        ('col_507', pl.Int64),
        ('col_508', pl.Int64),
        ('col_509', pl.Int64),
        ('col_510', pl.Int64),
        ('col_511', pl.Int64),
        ('col_512', pl.Int64),
        ('col_513', pl.Int64),
        ('col_514', pl.Int64),
        ('col_515', pl.Int64),
        ('col_516', pl.Int64),
        ('col_517', pl.Int64),
        ('col_518', pl.Int64),
        ('col_519', pl.Int64),
        ('col_520', pl.Int64),
        ('col_521', pl.Int64),
        ('col_522', pl.Int64),
        ('col_523', pl.Int64),
        ('col_524', pl.Int64),
        ('col_525', pl.Int64),
        ('col_526', pl.Int64),
        ('col_527', pl.Int64),
        ('col_528', pl.Int64),
        ('col_529', pl.Int64),
        ('col_530', pl.Int64),
        ('col_531', pl.Int64),
        ('col_532', pl.Int64),
        ('col_533', pl.Float64),
        ('col_534', pl.Int64),
        ('col_535', pl.Float64),
        ('col_536', pl.Int64),
        ('col_537', pl.Float64),
        ('col_538', pl.Int64),
        ('col_539', pl.Float64),
        ('col_540', pl.Int64),
        ('col_541', pl.Int64),
        ('col_542', pl.Int64),
        ('col_543', pl.Int64),
        ('col_544', pl.Float64),
        ('col_545', pl.Int64),
        ('col_546', pl.Float64),
        ('col_547', pl.Float64),
        ('col_548', pl.Int64),
        ('col_549', pl.Float64),
        ('col_550', pl.Int64),
        ('col_551', pl.Int64),
        ('col_552', pl.Int64),
        ('col_553', pl.Float64),
        ('col_554', pl.Float64),
        ('col_555', pl.Int64),
        ('col_556', pl.Int64),
        ('col_557', pl.Int64),
        ('col_558', pl.Int64),
        ('col_559', pl.Float64),
        ('col_560', pl.Float64),
        ('col_561', pl.Float64),
        ('col_562', pl.Float64),
        ('col_563', pl.Int64),
        ('col_564', pl.Int64),
        ('col_565', pl.Int64),
        ('col_566', pl.Int64),
        ('col_567', pl.Int64),
        ('col_568', pl.Int64),
        ('col_569', pl.Int64),
        ('col_570', pl.Int64),
        ('col_571', pl.Int64),
        ('col_572', pl.Int64),
        ('col_573', pl.Float64),
        ('col_574', pl.Int64),
        ('col_575', pl.Int64),
        ('col_576', pl.Float64),
        ('col_577', pl.Int64),
        ('col_578', pl.Int64),
        ('col_579', pl.Int64),
        ('col_580', pl.Int64),
        ('col_581', pl.Float64),
        ('col_582', pl.Int64),
        ('col_583', pl.Float64),
        ('col_584', pl.Int64),
        ('col_585', pl.Int64),
        ('col_586', pl.Int64),
        ('col_587', pl.Float64),
        ('col_588', pl.Int64),
        ('col_589', pl.Float64),
        ('col_590', pl.Float64),
        ('col_591', pl.Int64),
        ('col_592', pl.Float64),
        ('col_593', pl.Float64),
        ('col_594', pl.Float64),
        ('col_595', pl.Int64),
        ('col_596', pl.Int64),
        ('col_597', pl.Int64),
        ('col_598', pl.Int64),
        ('col_599', pl.Float64),
        ('col_600', pl.Int64),
        ('col_601', pl.Int64),
        ('col_602', pl.Int64),
        ('col_603', pl.Int64),
        ('col_604', pl.Int64),
        ('col_605', pl.Float64),
        ('col_606', pl.Int64),
        ('col_607', pl.Float64),
        ('col_608', pl.Int64),
        ('col_609', pl.Int64),
        ('col_610', pl.Int64),
        ('col_611', pl.Float64),
        ('col_612', pl.Int64),
        ('col_613', pl.Float64),
        ('col_614', pl.Int64),
        ('col_615', pl.Int64),
        ('col_616', pl.Int64),
        ('col_617', pl.Float64),
        ('col_618', pl.Int64),
        ('col_619', pl.Float64),
        ('col_620', pl.Int64),
        ('col_621', pl.Int64),
        ('col_622', pl.Int64),
        ('col_623', pl.Int64),
        ('col_624', pl.Float64),
        ('col_625', pl.Int64),
        ('col_626', pl.Float64),
        ('col_627', pl.Int64),
        ('col_628', pl.Float64),
        ('col_629', pl.Int64),
        ('col_630', pl.Int64),
        ('col_631', pl.Int64),
        ('col_632', pl.Int64),
        ('col_633', pl.Int64),
        ('col_634', pl.Float64),
        ('col_635', pl.Int64),
        ('col_636', pl.Int64),
        ('col_637', pl.Float64),
        ('col_638', pl.Int64),
        ('col_639', pl.Int64),
        ('col_640', pl.Int64),
        ('col_641', pl.Int64),
        ('col_642', pl.Int64),
        ('col_643', pl.Int64),
        ('col_644', pl.Int64),
        ('col_645', pl.Int64),
        ('col_646', pl.Int64),
        ('col_647', pl.Float64),
        ('col_648', pl.Int64),
        ('col_649', pl.Int64),
        ('col_650', pl.Int64),
        ('col_651', pl.Int64),
        ('col_652', pl.Int64),
        ('col_653', pl.Int64),
        ('col_654', pl.Int64),
        ('col_655', pl.Int64),
        ('col_656', pl.Int64),
        ('col_657', pl.Int64),
        ('col_658', pl.Int64),
        ('col_659', pl.Int64),
        ('col_660', pl.Int64),
        ('col_661', pl.Float64),
        ('col_662', pl.Int64),
        ('col_663', pl.Int64),
        ('col_664', pl.Int64),
        ('col_665', pl.Float64),
        ('col_666', pl.Int64),
        ('col_667', pl.Int64),
        ('col_668', pl.Float64),
        ('col_669', pl.Int64),
        ('col_670', pl.Float64),
        ('col_671', pl.Int64),
        ('col_672', pl.Int64),
        ('col_673', pl.Int64),
        ('col_674', pl.Int64),
        ('col_675', pl.Int64),
        ('col_676', pl.Int64),
        ('col_677', pl.Int64),
        ('col_678', pl.Int64),
        ('col_679', pl.Int64),
        ('col_680', pl.Float64),
        ('col_681', pl.Float64),
        ('col_682', pl.Float64),
        ('col_683', pl.Float64),
        ('col_684', pl.Float64),
        ('col_685', pl.Float64),
        ('col_686', pl.Float64),
        ('col_687', pl.Float64),
        ('col_688', pl.Int64),
        ('col_689', pl.Float64),
        ('col_690', pl.Float64),
        ('col_691', pl.Int64),
        ('col_692', pl.Int64),
        ('col_693', pl.Int64),
        ('col_694', pl.Int64),
        ('col_695', pl.Int64),
        ('col_696', pl.Int64),
        ('col_697', pl.Int64),
        ('col_698', pl.Int64),
        ('col_699', pl.Int64),
        ('col_700', pl.Int64),
        ('col_701', pl.Int64),
        ('col_702', pl.Int64),
        ('col_703', pl.Int64),
        ('col_704', pl.Int64),
        ('col_705', pl.Int64),
        ('col_706', pl.Int64),
        ('col_707', pl.Int64),
        ('col_708', pl.Int64),
        ('col_709', pl.Int64),
        ('col_710', pl.Int64),
        ('col_711', pl.Int64),
        ('col_712', pl.Int64),
        ('col_713', pl.Int64),
        ('col_714', pl.Int64),
        ('col_715', pl.Int64),
        ('col_716', pl.Int64),
        ('col_717', pl.Int64),
        ('col_718', pl.Int64),
        ('col_719', pl.Int64),
        ('col_720', pl.Int64),
        ('col_721', pl.Int64),
        ('col_722', pl.Int64),
        ('col_723', pl.Int64),
        ('col_724', pl.Int64),
        ('col_725', pl.Float64),
        ('col_726', pl.Float64),
        ('col_727', pl.Float64),
        ('col_728', pl.Float64),
        ('col_729', pl.Float64),
        ('col_730', pl.Float64),
        ('col_731', pl.Float64),
        ('col_732', pl.Float64),
        ('col_733', pl.Float64),
        ('col_734', pl.Float64),
        ('col_735', pl.Float64),
        ('col_736', pl.Float64),
        ('col_737', pl.Float64),
        ('col_738', pl.Float64),
        ('col_739', pl.Float64),
        ('col_740', pl.Float64),
        ('col_741', pl.Float64),
        ('col_742', pl.Float64),
        ('col_743', pl.Float64),
        ('col_744', pl.Float64),
        ('col_745', pl.Float64),
        ('col_746', pl.Float64),
        ('col_747', pl.Float64),
        ('col_748', pl.Float64),
        ('col_749', pl.Float64),
        ('col_750', pl.Float64),
        ('col_751', pl.Boolean),
        ('col_752', pl.Float64),
        ('col_753', pl.Int32),
        ('col_754', pl.Boolean),
        ('col_755', pl.String),
        ('col_756', pl.Boolean),
        ('col_757', pl.Boolean),
        ('col_758', pl.Boolean),
        ('col_759', pl.Boolean),
        ('col_760', pl.Boolean),
        ('col_761', pl.Boolean),
        ('col_762', pl.Boolean),
        ('col_763', pl.Boolean),
        ('col_764', pl.Boolean),
        ('col_765', pl.Boolean),
        ('col_766', pl.String),
        ('col_767', pl.String),
        ('col_768', pl.Boolean),
        ('col_769', pl.Boolean),
        ('col_770', pl.Boolean),
        ('col_771', pl.Boolean),
        ('col_772', pl.Boolean)])

string_col = 2_915_268 * ["hi"]
int_col = 971_756 * [0, 0, 1]
float_col = 1_457_634 * [1.0, 0.0]
bool_col = 1_457_634 * [True, False]

data = {}
for col, dtype in schema.items():
    if dtype == pl.String:
        data[col] = string_col
    elif dtype in (pl.Float32, pl.Float64):
        data[col] = float_col
    elif dtype in (pl.Int32, pl.Int64):
        data[col] = int_col
    elif dtype == pl.Boolean:
        data[col] = bool_col

data = pl.DataFrame(data, schema=schema)

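# Boolean Series as the mask: ~0.5 s on the original (confidential) data
res = data.filter(data["col_727"] == 1)
# Expression predicate: ~13.6 s on the original (confidential) data
res = data.filter(pl.col("col_727") == 1)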

Log output

No response

Issue description

Filtering this column with data.filter(data["col_727"] == 1) takes 0.5 s, whereas data.filter(pl.col("col_727") == 1) takes 13.6 s. The column is Float64, contains only the values 0.0 and 1.0, and has no nulls. 0.0 outnumbers 1.0 by 2:1 (in the fake data above, that ratio appears in the integer columns; the float columns alternate 1.0 and 0.0 at 1:1). What might be causing such a large discrepancy? Could it be that Polars is not distributing the work across CPUs properly when using pl.col?

Expected behavior

Execution times are the same or very similar.

Installed versions

```
--------Version info---------
Polars:               1.7.1
Index type:           UInt32
Platform:             Linux-5.10.223-212.873.amzn2.x86_64-x86_64-with-glibc2.35
Python:               3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]

----Optional dependencies----
adbc_driver_manager
altair                5.3.0
cloudpickle           2.2.1
connectorx
deltalake
fastexcel
fsspec                2023.6.0
gevent
great_tables
matplotlib            3.8.4
nest_asyncio          1.6.0
numpy                 1.26.4
openpyxl
pandas                2.1.4
pyarrow               15.0.0
pydantic              2.6.4
pyiceberg
sqlalchemy            2.0.30
torch                 2.0.0.post104
xlsx2csv
xlsxwriter
```

coastalwhite commented 2 weeks ago

I am unable to reproduce the discrepancy between the two queries.

ritchie46 commented 1 week ago

Can you share a reproduction of the problem with the original data?

AltamashRafiq commented 1 week ago

I cannot provide the original data as it is customer confidential. I'll investigate further on Monday and hopefully recreate the issue with a reproducible example. Sorry for not having one at the time of this post :(