tanussingh / Big-Data-Management-Analytics-Project

Final Project for CS 6350.001 - Large Scale Data Collection and preprocessing in Spark

Find a way to compare articles based on UDPipe output #3

Open ishansharma opened 5 years ago

ishansharma commented 5 years ago

Once we have the UDPipe output in a Spark DataFrame, how can two articles be compared for similarity? And at what level: sentence or article?
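To make the sentence-vs-article question concrete, here is a rough sketch of both options using Jaccard overlap of lemmas. It assumes each article has already been reduced to a list of sentences, each a list of lemma strings. The function names and the choice of Jaccard similarity are illustrative assumptions, not something decided in this thread.

```python
def jaccard(a, b):
    """Jaccard overlap of two lemma collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def article_similarity(sents_a, sents_b):
    """Article level: pool all lemmas of each article, then compare once."""
    flat_a = [lemma for sent in sents_a for lemma in sent]
    flat_b = [lemma for sent in sents_b for lemma in sent]
    return jaccard(flat_a, flat_b)


def sentence_similarity(sents_a, sents_b):
    """Sentence level: match each sentence in A to its best counterpart in B
    and average the scores (hypothetical scheme; note it is asymmetric)."""
    if not sents_a or not sents_b:
        return 0.0
    best = [max(jaccard(s, t) for t in sents_b) for s in sents_a]
    return sum(best) / len(best)


# Made-up example data, not real UDPipe output.
article_a = [["dog", "chase", "cat"], ["cat", "sleep"]]
article_b = [["dog", "chase", "cat"]]
```

Article-level comparison is cheaper (one comparison per pair), while sentence-level comparison is quadratic in the number of sentences but can tell when only part of an article overlaps.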

ishansharma commented 5 years ago

UDPipe Demo for trying on different articles and sentences.

ishansharma commented 5 years ago

@mavisfrancia Just brainstorming: I'm not sure whether UDPipe can provide output as a tree structure, but that should make comparisons easier. Otherwise, our best bet is to parse the table structure using regular expressions.
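A minimal sketch of rebuilding a tree from the table, assuming the UDPipe output is in CoNLL-U format (ten tab-separated columns per token, with the head token's ID in the seventh column). Under that assumption a plain tab split is enough and regular expressions may not be needed. The sample sentence and the helper names `parse_sentence` and `dependency_edges` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample sentence in CoNLL-U format (not taken from the project data).
SAMPLE = (
    "# text = Dogs chase cats.\n"
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tchase\tchase\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tcats\tcat\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)


def parse_sentence(conllu):
    """Return (tokens, children) for one sentence.

    tokens maps token ID -> dict of selected columns;
    children maps head ID -> list of dependent IDs (head 0 is the root).
    """
    tokens = {}
    children = defaultdict(list)
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        tok_id, head = int(cols[0]), int(cols[6])
        tokens[tok_id] = {
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "deprel": cols[7],
        }
        children[head].append(tok_id)
    return tokens, dict(children)


def dependency_edges(tokens, children):
    """Flatten the tree into (head lemma, relation, dependent lemma) triples."""
    edges = set()
    for head, kids in children.items():
        head_lemma = "ROOT" if head == 0 else tokens[head]["lemma"]
        for kid in kids:
            edges.add((head_lemma, tokens[kid]["deprel"], tokens[kid]["lemma"]))
    return edges


tokens, children = parse_sentence(SAMPLE)
edges = dependency_edges(tokens, children)
```

The resulting edge sets could then be compared between sentences or articles, for example by set overlap, instead of comparing raw table rows.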