Status: Closed (dchichkov closed this issue 4 years ago)
@dchichkov — thanks for raising this! You're right, our current implementation is a bit slow, likely because we're supporting a more general case where transformations may be 1:many operations (as opposed to simply 1:1).
We could definitely make some improvements here, so I'm flagging this as an issue. Feel free to open a PR and contribute a fix yourself as well!
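For anyone following along, here is a minimal sketch of the 1:1 versus 1:many distinction mentioned above. It is not Snorkel's actual implementation. The names `apply_one_to_many`, `apply_one_to_one`, `upper`, and `upper_and_lower` are hypothetical and exist only for illustration. The point is that the general path has to build and flatten a list per input, while a 1:1 path can be a plain map.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def apply_one_to_many(items: Iterable[T], tf: Callable[[T], List[T]]) -> List[T]:
    """General case: each input may yield zero, one, or several outputs."""
    out: List[T] = []
    for item in items:
        out.extend(tf(item))  # per-item list allocation plus flattening
    return out


def apply_one_to_one(items: Iterable[T], tf: Callable[[T], T]) -> List[T]:
    """Fast path: exactly one output per input, so a plain map suffices."""
    return [tf(item) for item in items]


# Hypothetical transformations, used only for illustration.
def upper(text: str) -> str:
    return text.upper()


def upper_and_lower(text: str) -> List[str]:
    return [text.upper(), text.lower()]


words = ["Cat", "Dog"]
one_to_one = apply_one_to_one(words, upper)
one_to_many = apply_one_to_many(words, upper_and_lower)
```

Whether a dedicated 1:1 path would close the gap reported in this issue is an open question; the sketch only shows where the extra per-item work comes from.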
Thanks! I also see a very similar issue with slicing functions. A slicing function like:
from snorkel.slicing import slicing_function

@slicing_function()
def real_object(x):
    """Returns whether the object is a real object, not a reflection, shadow or depiction"""
    return x.Reflection == 'false' and x.Shadow == 'false' and x.Depiction == 'false'
This takes 50 milliseconds to apply with pandas and 5 seconds (100 times slower) with Snorkel.
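For reference, here is a rough sketch of two ways the same predicate could be applied with pandas directly. The data is synthetic, the string values `'true'`/`'false'` are assumed from the comparisons in the slicing function, and `real_object_row` is a hypothetical plain-Python copy of its body. The comment does not say which pandas approach produced the 50 ms figure, so treat this as an illustration rather than a reproduction.

```python
import pandas as pd

# Synthetic data; column names follow the slicing function above.
df = pd.DataFrame(
    {
        "Reflection": ["false", "true", "false", "false"],
        "Shadow": ["false", "false", "true", "false"],
        "Depiction": ["false", "false", "false", "false"],
    }
)


def real_object_row(x) -> bool:
    """Row-wise predicate with the same logic as the slicing function body."""
    return x.Reflection == "false" and x.Shadow == "false" and x.Depiction == "false"


# Row-wise: calls the Python function once per row.
row_wise_mask = df.apply(real_object_row, axis=1)

# Vectorized: three column comparisons combined with element-wise AND.
vectorized_mask = (
    (df["Reflection"] == "false")
    & (df["Shadow"] == "false")
    & (df["Depiction"] == "false")
)

# Rows that count as real objects under the predicate.
real_objects = df[vectorized_mask]
```

Both masks select the same rows. The vectorized form avoids a Python-level call per row, which is usually where most of the time goes on large frames.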
Issue description
PandasTFApplier is slow to the point of being unusable. Time it takes to process a pandas DataFrame with 100k elements:
It also doesn't allow adding new fields...
Code example/repro steps
Expected behavior
Processing 100k elements should take about 1 second (10x slower than using pandas directly), not 56 seconds (560x slower).