mediacloud / backend

Media Cloud is an open source, open data platform that allows researchers to answer quantitative questions about the content of online media.
http://www.mediacloud.org
GNU Affero General Public License v3.0

port spidering engine code to Python #681

Open rahulbot opened 4 years ago

rahulbot commented 4 years ago

This is one of the last big chunks of code that needs porting from Perl to Python. (Split off from #679.)

pypt commented 4 years ago

By "spidering", do you mean the topics-* apps?

rahulbot commented 4 years ago

@hroberts - where is the code for the spidering engine?

hroberts commented 4 years ago

https://github.com/berkmancenter/mediacloud/blob/master/apps/topics-mine/src/perl/MediaWords/TM/Mine.pm