pybites / challenges

PyBites Code Challenges
https://codechalleng.es/challenges/

Plagiarism checker (NLTK part 2?) #135

Open mridubhatnagar opened 6 years ago

mridubhatnagar commented 6 years ago

1) Converting Word documents to PDF format.
2) Merging multiple PDFs into a single PDF.
3) Plagiarism checker.
4) Adding a Word doc, Excel file, or any other file to a folder and converting it into a zip archive.
5) Extracting data from a PDF file. (There was a talk at PyCon India 2017 related to this.)
6) Extracting data from applications, integrating with Google Docs, and visualizing the data.

Some good use cases needed though. Automation of daily tasks would be fun I guess.

bbelderbos commented 6 years ago

There are some good opportunities there, thanks.

As 3/4 concern docs I will rename it to office tasks.

How would you go about item 3, the plagiarism checker?

mridubhatnagar commented 6 years ago

Hmm... I will have to think about it. While doing challenge-03, I was looking for ways to measure similarity between words, and that is where I came across plagiarism checkers. Maybe something can be done using the NLTK module. I guess the percentage of similarity between 2 docs can be calculated.
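
To make the "percentage of similarity between 2 docs" idea concrete, here is a minimal sketch. It is an assumption about one possible approach, not something proposed in this thread: it uses only the standard library (not NLTK) and computes the cosine similarity of the two documents' word-count vectors. The names `word_counts` and `similarity_percent` are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its word tokens (simple regex tokenizer)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def similarity_percent(doc_a, doc_b):
    """Cosine similarity of the two word-count vectors, as a percentage."""
    a, b = word_counts(doc_a), word_counts(doc_b)
    # Dot product over the words the two documents share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    # Product of the two vector lengths; zero if either document has no words.
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return 100.0 * dot / norm if norm else 0.0


original = "The quick brown fox jumps over the lazy dog"
suspect = "A quick brown fox jumped over the lazy dog"
print(f"{similarity_percent(original, suspect):.1f}% similar")
```

This ignores word order and treats "jumps" and "jumped" as different words; a stemmer or tokenizer from NLTK could be swapped in for `word_counts` to tighten that up.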

pybites commented 5 years ago

Focusing this challenge idea on the plagiarism checker, as we will tackle working with PDF files in PCC60.