yusanshi / news-recommendation

Implementations of some methods in news recommendation.
MIT License

Data processing systemexit #18

Closed martin6336 closed 3 years ago

martin6336 commented 3 years ago

When I run python3 src/data_preprocess.py, there seems to be a problem with threading. The error report does not show where in the code the problem occurs.

Dask Apply: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 192/192 [04:02<00:00,  1.26s/it]
Exception ignored in: <module 'threading' from '/home/biqiwei/anaconda3/envs/py36/lib/python3.6/threading.py'>
Traceback (most recent call last):
  File "/home/biqiwei/anaconda3/envs/py36/lib/python3.6/threading.py", line 1292, in _shutdown
    t = _pickSomeNonDaemonThread()
  File "/home/biqiwei/anaconda3/envs/py36/lib/python3.6/threading.py", line 1298, in _pickSomeNonDaemonThread
    for t in enumerate():
  File "/home/biqiwei/anaconda3/envs/py36/lib/python3.6/threading.py", line 1269, in enumerate
    return list(_active.values()) + list(_limbo.values())
  File "/home/biqiwei/anaconda3/envs/py36/lib/python3.6/site-packages/ray/worker.py", line 1006, in sigterm_handler
    sys.exit(signum)
SystemExit: 15

Please modify `num_categories` in `src/config.py` into 1 + 295
Please modify `num_words` in `src/config.py` into 1 + 101225
Please modify `num_entities` in `src/config.py` into 1 + 21842
Generate word embedding
Rate of word missed in pretrained embedding: 0.2958
Transform entity embeddings

yusanshi commented 3 years ago

It looks like the problem is caused by news.swifter.apply. (You can add a print before and after it to verify this.)

If that is the case, replace news.swifter.apply with news.apply to avoid the multithreading issue that swifter causes. The cost is that you will lose a cute progress bar and possibly see some performance degradation.
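A minimal sketch of that change, in case it helps. The names parse_row, apply_rows and USE_SWIFTER, the toy DataFrame, and the axis=1 argument are all assumptions made for illustration and may not match what src/data_preprocess.py actually does. The prints around the call are the check suggested above: if "after apply" appears and the SystemExit traceback still follows, the apply itself completed and the error is raised later.

```python
import pandas as pd

# Assumption: a simple switch for illustration; the repository has no such flag.
USE_SWIFTER = False

if USE_SWIFTER:
    # Importing swifter registers the .swifter accessor on pandas objects.
    import swifter  # noqa: F401


def parse_row(row):
    # Hypothetical stand-in for the real per-row parsing function.
    return pd.Series({
        "id": row["id"],
        "num_title_words": len(row["title"].split()),
    })


def apply_rows(news, func):
    """Apply func to every row, with or without swifter."""
    if USE_SWIFTER:
        return news.swifter.apply(func, axis=1)
    # Plain pandas: single-threaded, no progress bar, no extra worker processes.
    return news.apply(func, axis=1)


# Toy data, not the real news file.
news = pd.DataFrame({
    "id": ["N1", "N2"],
    "title": ["hello world", "news recommendation with deep learning"],
})

print("before apply")
parsed = apply_rows(news, parse_row)
print("after apply")
print(parsed)
```

With USE_SWIFTER left at False this only needs pandas, so it can also be used to confirm that the parsing logic itself is fine before swifter is reintroduced.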

martin6336 commented 3 years ago

Thanks, it works!

By the way, does the performance degradation you mentioned refer to the speed of data processing or to the final scores of the models?

yusanshi commented 3 years ago

I mean the speed of data processing. :smiley: