snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0
5.81k stars, 857 forks

LabelModel training is occupying a lot of memory. #1656

Closed sujeethrv closed 2 years ago

sujeethrv commented 3 years ago

I am trying to build a text classification model.

I have around 1,200 labelling functions and 56 unique classes. When I train the LabelModel by calling LabelModel.fit on my laptop (8 GB of memory), the Jupyter notebook kernel dies. When I run the same code as a Python script (.py file), it does not raise an error, but it makes no progress: it emits the warning below and stops there.

Warning:

python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

humzaiqbal commented 3 years ago

Hi sujeethrv, in general, this research implementation of the LabelModel, from the 2019 AAAI paper on Snorkel, was intended for relatively low-cardinality problems. As an alternative with a lower compute and memory footprint, you could try using the MajorityLabelVoter class instead of the LabelModel for this problem.
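
To illustrate why majority voting is so much lighter, here is a minimal pure-Python sketch of the idea. It is not Snorkel's implementation: `majority_vote` and `L_toy` are hypothetical names made up for this example, and returning an abstain on ties is an assumed policy. It handles one row of the label matrix at a time, so working memory is bounded by the number of labelling functions rather than by anything that grows with the number of classes.

```python
from collections import Counter
from typing import List, Sequence

ABSTAIN = -1  # Snorkel's convention for "this labelling function did not vote"


def majority_vote(
    label_matrix: Sequence[Sequence[int]], tie_label: int = ABSTAIN
) -> List[int]:
    """Return one label per row: the most common non-abstain vote.

    Rows where every function abstains, or where the top vote is tied,
    get `tie_label` (an assumed policy for this sketch).
    """
    predictions = []
    for row in label_matrix:
        # Count only real votes; abstains carry no information.
        votes = Counter(label for label in row if label != ABSTAIN)
        if not votes:
            predictions.append(tie_label)
            continue
        ranked = votes.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            predictions.append(tie_label)  # no unique winner
        else:
            predictions.append(ranked[0][0])
    return predictions


# Toy label matrix: 4 data points, 5 labelling functions, class ids in 0..55.
L_toy = [
    [3, 3, -1, 17, -1],    # class 3 wins 2 to 1
    [-1, -1, -1, -1, -1],  # every function abstains
    [55, 12, -1, -1, -1],  # tie between 55 and 12
    [-1, 40, -1, -1, -1],  # single vote
]
preds = majority_vote(L_toy)
print(preds)  # [3, -1, -1, 40]
```

With Snorkel itself, the equivalent call should be along the lines of `MajorityLabelVoter(cardinality=56).predict(L=L_train)`, where `L_train` is assumed to be the label matrix you already pass to LabelModel.fit.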

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.