We're making a DB query inside article parsing to figure out which tags to remove for each source. Since the MongoClient is opened before the parser forks, this can cause deadlocks on some systems and warnings on others (I believe warnings on Debian and possible deadlocks on OS X). It would be better to fetch the source-specific cleaning rules from the DB up front and pass that data along with the article HTML to the parser.
A sample warning is:
/home/bdc/anaconda2/lib/python2.7/site-packages/pymongo/topology.py:74: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing>
"MongoClient opened before fork. Create MongoClient "
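
A rough sketch of what I have in mind, not the actual code. The names `fetch_cleaning_rules`, `parse_article`, `parse_all` and the document fields `source` and `remove_tags` are all made up for illustration, and the regex-based tag stripping is just a stand-in for whatever the real parser does. The point is only that the DB is read once in the parent and the workers receive plain data, so no worker ever touches a MongoClient:

```python
import re
from multiprocessing import Pool


def fetch_cleaning_rules(source_docs):
    """Build {source: [tag, ...]} from documents fetched once in the parent.

    `source_docs` stands in for the result of the per-source DB query
    (any iterable of dicts). The field names `source` and `remove_tags`
    are hypothetical.
    """
    return {doc["source"]: list(doc.get("remove_tags", []))
            for doc in source_docs}


def parse_article(task):
    """Worker: clean one article using only the data passed in (no DB access)."""
    html, tags_to_remove = task
    for tag in tags_to_remove:
        # Naive removal of <tag ...>...</tag>, for illustration only.
        pattern = r"<{0}\b[^>]*>.*?</{0}>".format(re.escape(tag))
        html = re.sub(pattern, "", html, flags=re.S | re.I)
    return html


def parse_all(articles, rules, processes=2):
    """Parse (source, html) pairs in a pool, shipping the rules with each task."""
    tasks = [(html, rules.get(source, [])) for source, html in articles]
    pool = Pool(processes)
    try:
        return pool.map(parse_article, tasks)
    finally:
        pool.close()
        pool.join()
```

With this shape, the caller would run the query once, build `rules` with `fetch_cleaning_rules`, and hand `parse_all` the list of `(source, html)` pairs. Each task is a plain tuple of strings and lists, so it pickles cleanly across the process boundary.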