Closed durgeshiitj closed 4 years ago
Hi @durgeshiitj, apologies for the delayed response here! This is likely due to using an unsorted index with PandasParallelLFApplier. I've opened up #1589, but in the meantime you can just use the standard PandasLFApplier, or sort your index before using PandasParallelLFApplier so that the rows of L come back in the expected order.
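A minimal sketch of the sort-first workaround, assuming the mismatch really does come from the parallel applier returning rows in sorted-index order. The toy DataFrame, its column names, and the `lfs` list mentioned in the comments are hypothetical; the Snorkel call is left as a comment so the snippet runs with pandas alone.

```python
import pandas as pd

# Toy frame with a deliberately unsorted index (hypothetical data).
df = pd.DataFrame(
    {
        "text": ["spam offer", "hello", "free money", "meeting"],
        "label": [1, 0, 1, 0],
    },
    index=[30, 10, 40, 20],
)

# Sort once, up front, and take the gold labels from the sorted frame,
# so that row i of L and Y[i] refer to the same example.
df_sorted = df.sort_index()
Y = df_sorted["label"].to_numpy()

# With Snorkel 0.9.x installed, the applier call would then look roughly
# like this (lfs is your own list of labeling functions):
#   from snorkel.labeling.apply.dask import PandasParallelLFApplier
#   L = PandasParallelLFApplier(lfs).apply(df_sorted, n_parallel=2)

print(df_sorted.index.tolist())
print(Y.tolist())
```

Any labels or metadata kept outside the DataFrame need the same reordering, otherwise accuracy is computed against misaligned rows.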
Hi Henry, thanks for following up. I tried debugging on my end as well. I found that on the system where Snorkel 0.9.5 is installed, the Dask version was 2.14.2, and on the system where 0.9.3 is installed, the Dask version was 2.5.2. So I tried downgrading Dask to 2.5.2 with Snorkel 0.9.5, and to my surprise the PandasParallelLFApplier worked normally there. Please check that as well, because the Dask version listed in the requirements is <3, so 2.14 should not have caused any issue either.
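For anyone reproducing this comparison across machines, a small sketch for recording the installed versions on each system. It uses only the standard library; the helper name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Run this on both systems and diff the output.
print(installed_versions(["snorkel", "dask", "pandas"]))
```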
Hi @durgeshiitj, thanks for reporting and we'll look into version compatibility on our side!
I didn't get any update on the issue
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Issue description
I ran Snorkel (v0.9.5) on a dataset using PandasParallelLFApplier and, to my surprise, got 10% accuracy where I was expecting 90%. I then tried PandasLFApplier to cross-verify and got 90% accuracy. When I compared the label matrices, they were not equal.
Before this I was using 0.9.3 and never faced the problem. To cross-verify, I ran the same dataset on a different system with version 0.9.3 using both PandasParallelLFApplier and PandasLFApplier, and found that in 0.9.3 both yield the same label matrix and the same accuracy with the same LFAnalysis.
Expected behavior
Both LFAppliers should yield similar results.
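One way to check this expectation, and to tell a row-order problem apart from genuinely different labels, is sketched below. It assumes both label matrices are available as 2-D integer arrays; the helper `compare_label_matrices` and the toy matrices are illustrative and not part of Snorkel.

```python
import numpy as np


def compare_label_matrices(L_a, L_b):
    """Classify how two label matrices differ."""
    L_a, L_b = np.asarray(L_a), np.asarray(L_b)
    if L_a.shape != L_b.shape:
        return "different shapes"
    if np.array_equal(L_a, L_b):
        return "identical"

    def sort_rows(L):
        # Order rows lexicographically, first column as the primary key.
        return L[np.lexsort(L.T[::-1])]

    if np.array_equal(sort_rows(L_a), sort_rows(L_b)):
        return "same rows, different order"
    return "different contents"


# Toy matrices: the second is the first with its rows shuffled.
L_serial = np.array([[1, -1], [0, 0], [-1, 1]])
L_parallel = L_serial[[2, 0, 1]]
print(compare_label_matrices(L_serial, L_parallel))
```

A result of "same rows, different order" would be consistent with the unsorted-index explanation; "different contents" would point to something else.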
Screenshots
I'm attaching screenshots for your reference.
V 0.9.5 analysis: [screenshots: PandasLFApplier, PandasParallelLFApplier, label-matrix comparison]
V 0.9.3 analysis: [screenshots: PandasLFApplier, PandasParallelLFApplier, label-matrix comparison]
System info
Additional context
Please look into this asap.