pesos / PES-SoC-14

Information repository for PES Summer of Code 2014

World News Reporter #6

Closed vidhishanair closed 10 years ago

vidhishanair commented 10 years ago

What is World News Reporter?

This project aims to be a trend analyser for content that is uploaded continuously to various online forums. The bot scrapes content from social media sites and forums such as Facebook, Twitter and Mashable to generate summaries of the topics currently being posted about.

Brief Description

The project aims to take forward the trend analyser built on Twitter data by @vidhishanair, @shruthichari, @shrinidhir and Swathi Veeradhi. We have implemented a basic summarizer and classifier for Twitter trending hashtags. The project will now move to the next level: data will be pulled from multiple sites, analysed using data mining algorithms, and compiled into reports on current trends around the world. The summaries are then classified into categories using a multi-level hierarchical model. Extra features, such as similarity checking for related reports, are add-ons to the project.

The summaries generated should be semantically correct for a good report. The classification hierarchy could be extended further, and our algorithm could also intelligently choose its test set based on relevance. We would like to build further on the current tool and make it a powerful analyser.

A basic version using Twitter data can be found at https://github.com/vidhishanair/TwiBot. It uses LexRank to produce extractive summaries and Naive Bayes to classify trending hashtags, and it is still under development.
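For readers new to LexRank: it scores each sentence by its centrality in a graph whose edges join sentences with sufficiently similar TF-IDF vectors, then keeps the top-ranked sentences as the extractive summary. Below is a minimal, standard-library-only sketch of that idea; it is not the TwiBot code, and the function names, the 0.1 similarity threshold and the 0.85 damping factor are our own assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Naive lowercase tokenizer that keeps hashtags and mentions intact."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def idf_weights(token_lists):
    """IDF per word, treating each sentence as a document (+1 keeps weights positive)."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {word: math.log(n / count) + 1.0 for word, count in df.items()}


def tfidf_cosine(a, b, idf):
    """Cosine similarity between the TF-IDF vectors of two token lists."""
    tf_a, tf_b = Counter(a), Counter(b)
    dot = sum(tf_a[w] * tf_b[w] * idf[w] ** 2 for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum((tf_a[w] * idf[w]) ** 2 for w in tf_a))
    norm_b = math.sqrt(sum((tf_b[w] * idf[w]) ** 2 for w in tf_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lexrank(sentences, threshold=0.1, damping=0.85, max_iter=100, tol=1e-6):
    """Return one centrality score per sentence; the scores sum to 1."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    idf = idf_weights(tokens)
    # Unweighted graph: join two distinct sentences if similarity >= threshold.
    adj = [[1.0 if i != j and tfidf_cosine(tokens[i], tokens[j], idf) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Isolated sentences spread their score uniformly (dangling-node fix).
        dangling = sum(scores[i] for i in range(n) if degree[i] == 0)
        new = [(1 - damping) / n + damping * (dangling / n + sum(
                   scores[i] * adj[i][j] / degree[i] for i in range(n) if degree[i] > 0))
               for j in range(n)]
        delta = sum(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def summarize(sentences, k=2, **kwargs):
    """Pick the k most central sentences, returned in their original order."""
    scores = lexrank(sentences, **kwargs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


tweets = [
    "Earthquake hits the coast, buildings damaged",
    "Strong earthquake damages buildings on the coast",
    "Rescue teams reach the coast after the earthquake",
    "I had pasta for lunch today",
]
print(summarize(tweets, k=2))
```

Treating each tweet as a sentence, `summarize(tweets, k=2)` returns the two most central tweets in their original order; an off-topic tweet that shares no words with the others tends to get the lowest score and is left out.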

Goals

  1. Rework the existing codebase to add more features.
  2. Extend support to various online forums.
  3. Better semantic summarization.
  4. Multi-level classification.
  5. UI support.

Requirements

  1. You should know Python.
  2. Knowledge of the MVC pattern is needed (preferably Flask or Django).
  3. A good understanding of data mining techniques, or enthusiasm to learn them, is preferred.
  4. Web technologies for UI support (optional).
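The description mentions Naive Bayes for classifying trending hashtags and a multi-level hierarchical model for categories. One simple way to combine the two, sketched here under our own assumptions rather than taken from the project, is a multinomial Naive Bayes with add-one smoothing at each level: a top-level model picks the category, and a per-category model picks the sub-category. The class names and the toy training data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Naive lowercase tokenizer that keeps hashtags intact."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter()              # label -> number of training texts
        self.word_counts = defaultdict(Counter)  # label -> word -> occurrences
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = [w for w in tokenize(text) if w in self.vocab]  # drop unseen words
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of log P(word | label), computed in log space.
            score = math.log(self.doc_counts[label] / total_docs)
            score += sum(math.log((counts[w] + 1) / denom) for w in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoLevelClassifier:
    """Top-down hierarchy: a top-level model picks the category, then a
    model trained only on that category's texts picks the sub-category."""

    def fit(self, texts, labels):  # labels are (category, sub_category) pairs
        self.top = NaiveBayes().fit(texts, [cat for cat, _ in labels])
        grouped = defaultdict(lambda: ([], []))
        for text, (cat, sub) in zip(texts, labels):
            grouped[cat][0].append(text)
            grouped[cat][1].append(sub)
        self.sub = {cat: NaiveBayes().fit(x, y) for cat, (x, y) in grouped.items()}
        return self

    def predict(self, text):
        cat = self.top.predict(text)
        return cat, self.sub[cat].predict(text)


# Hypothetical toy data: (text, (category, sub_category)).
train = [
    ("#worldcup goal scored in the final", ("sports", "football")),
    ("penalty shootout decides the football match", ("sports", "football")),
    ("wimbledon tennis champion wins in straight sets", ("sports", "tennis")),
    ("tennis star serves an ace at wimbledon", ("sports", "tennis")),
    ("stock market falls as shares slide", ("business", "markets")),
    ("shares rally on the stock exchange", ("business", "markets")),
    ("startup raises funding from investors", ("business", "startups")),
    ("investors back new tech startup funding round", ("business", "startups")),
]
model = TwoLevelClassifier().fit([t for t, _ in train], [lbl for _, lbl in train])
print(model.predict("tennis final at wimbledon"))
```

A known drawback of this top-down design is that a mistake at the first level cannot be corrected at the second; deeper hierarchies repeat the same pattern per node.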
sandeepraju commented 10 years ago

This sounds interesting! :smiley: Where does the current code live?

shrinidhir commented 10 years ago

Thank you Sandeep! We're fine-tuning it a little; it'll be up on GitHub in a couple of days :)

suhastech commented 10 years ago

Might be worth a few million quid :)

http://en.wikipedia.org/wiki/Nick_D'Aloisio

gururajbn commented 10 years ago

This is an interesting project, and I'd like to know more about it. How can I contact the mentors? IRC? Email?

charishruthi commented 10 years ago

@gururajbn Vidhisha Nair and I are mentoring this project. You can contact us by email anytime. Drop an email to either vidhishanair@gmail.com or aragon.shruthi@gmail.com.

vidhishanair commented 10 years ago

@sandeepraju Hey, thanks! We're working through some issues with putting it up; we'll post the link soon. Sorry! @suhastech Oh, Summly! We looked at it when we started off. @gururajbn Great, would love to hear from you! :)

abhi12ravi commented 10 years ago

@vidhishanair Is the code up? :) Looking forward to exploring it.

vidhishanair commented 10 years ago

@abhi12ravi Really sorry, but there were some problems syncing with git. We're trying to fix the issue and will put up the link as soon as we can. Sorry about that.

sandeepraju commented 10 years ago

@vidhishanair What is the git issue? It should be something simple; we can help if you can't figure it out. :smiley:

vidhishanair commented 10 years ago

@sandeepraju just fixed the issue. Thank you. Had some problems with sub-modules and origins. I have updated the link on the report. @abhi12ravi @gururajbn Please check the link on the report.

sandeepraju commented 10 years ago

@vidhishanair @ShruthiChari @shrinidhir I was going through the source code, and it looks like you mistakenly uploaded your Twitter secrets. Revoke them on your Twitter developer dashboard to avoid abuse. :neutral_face:
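A common way to keep API secrets out of a repository is to read them from environment variables at runtime. The sketch below assumes four hypothetical variable names for the usual Twitter consumer key/secret and access token/secret; it only collects the values and does not call any Twitter client library.

```python
import os

# Hypothetical variable names; adjust to whatever your deployment defines.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


class MissingCredentialsError(RuntimeError):
    """Raised when one or more credential variables are unset or empty."""


def load_twitter_credentials(environ=None):
    """Return the credentials keyed as consumer_key, consumer_secret,
    access_token and access_token_secret, read from the environment."""
    environ = os.environ if environ is None else environ
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise MissingCredentialsError("unset: " + ", ".join(missing))
    # Strip the "TWITTER_" prefix and lowercase, e.g. TWITTER_ACCESS_TOKEN -> access_token.
    return {name[len("TWITTER_"):].lower(): environ[name] for name in CREDENTIAL_VARS}


if __name__ == "__main__":
    try:
        creds = load_twitter_credentials()
        print("Loaded credentials:", sorted(creds))
    except MissingCredentialsError as exc:
        print(exc)
```

Moving the keys out of the code only prevents future leaks; keys that were already pushed still have to be revoked.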

shrinidhir commented 10 years ago

@sandeepraju : The keys? Yes, we'll revoke them, thanks a lot :)

vidhishanair commented 10 years ago

@sandeepraju An older version got committed! Thanks a lot!

sahutd commented 10 years ago

Hi,

The summaries generated should be semantically correct for a good report.

I am really interested in this part. Can you suggest further reading on the theory behind it?

Also,

  1. Since GitHub does not have a private message feature, I am posting in public. The access keys and OAuth tokens are still publicly available in an earlier commit (git stores history). I guess the history needs to be rewritten, e.g. with an interactive rebase: http://stackoverflow.com/questions/1338728/delete-commits-from-a-branch-in-git
  2. There are a few .pyc files in the repository. They can be untracked with git rm --cached and kept out by adding *.pyc to .gitignore.

Regards Saimadhav Heblikar

EDIT: As @sandeepraju mentions, it's better to be on the safe side and revoke the keys.
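One detail worth checking in the .gitignore suggestion: a pattern without a slash is matched against each file's name at any depth, so `.pyc` only matches a file literally named `.pyc`, whereas `*.pyc` matches every compiled file. The sketch below approximates that rule for simple slash-free patterns using Python's fnmatch module; it is not a full implementation of gitignore semantics (negation, directory-only and anchored patterns are not handled), and the file names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def matches_ignore(path, patterns):
    """Approximate .gitignore matching for simple patterns that contain no '/':
    such patterns are compared against the file's base name at any depth."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def tracked_but_ignorable(tracked_paths, patterns):
    """Filter a list of tracked paths (e.g. the output of `git ls-files`)
    down to those the given patterns would ignore."""
    return [p for p in tracked_paths if matches_ignore(p, patterns)]


tracked = ["summarizer.py", "summarizer.pyc", "classifier/model.pyc", "README.md"]
print(tracked_but_ignorable(tracked, [".pyc"]))   # -> []
print(tracked_but_ignorable(tracked, ["*.pyc"]))  # -> ['summarizer.pyc', 'classifier/model.pyc']
```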

sandeepraju commented 10 years ago

@sai Once it is public, even if they rebase, people who have already cloned the repository can still read it, so the tokens have to be assumed compromised. It is better to revoke the tokens themselves.

vidhishanair commented 10 years ago

@sahutd Hey! We've revoked the tokens. We've been a little caught up, but we'll be organizing the content, automating some steps and committing soon so that it's easier for you to follow. With regard to semantic summarization, there is a lot of reading material you can cover. There are graph-based methods and NLG-based methods for abstractive summarization. You could check the following papers: https://www.ideals.illinois.edu/bitstream/handle/2142/16949/opinosis.pdf?sequence=3 http://www.nist.gov/tac/publications/2010/participant.papers/Rali.proceedings.pdf These may require you to understand some other concepts, such as text similarity and classification models. You can always mail us for a detailed discussion.
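As we understand it, the graph-based approach in the Opinosis paper builds a word graph in which each node remembers where its word occurs (sentence id, position), and it looks for paths that many sentences share. The following is a heavily simplified sketch inspired by that idea, not the Opinosis algorithm itself: it uses plain words (no part-of-speech tags), starts only at sentence-initial words, and greedily follows the next word that the most sentences continue with. All names and the `min_redundancy` cut-off are hypothetical.

```python
import re
from collections import defaultdict


def build_word_graph(sentences):
    """Map each word to its (sentence_id, position) references and successors."""
    refs = defaultdict(set)   # word -> {(sentence_id, position)}
    succ = defaultdict(set)   # word -> words that directly follow it somewhere
    starts = set()            # words that begin at least one sentence
    for sid, sentence in enumerate(sentences):
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        for pos, word in enumerate(words):
            refs[word].add((sid, pos))
            if pos == 0:
                starts.add(word)
            if pos + 1 < len(words):
                succ[word].add(words[pos + 1])
    return refs, succ, starts


def best_path(sentences, min_redundancy=2):
    """Greedily follow the most-shared continuation from each start word and
    return the path with the highest total redundancy as a string."""
    refs, succ, starts = build_word_graph(sentences)
    best, best_score = [], 0
    for start in sorted(starts):
        path, live, score = [start], refs[start], 0
        while True:
            candidates = []
            for nxt in sorted(succ[path[-1]]):
                # Keep only occurrences that continue the path in the same sentence.
                carried = {(s, p + 1) for s, p in live if (s, p + 1) in refs[nxt]}
                candidates.append((len(carried), nxt, carried))
            if not candidates:
                break
            redundancy, nxt, carried = max(candidates, key=lambda c: c[0])
            if redundancy < min_redundancy:
                break
            path.append(nxt)
            live, score = carried, score + redundancy
        if score > best_score:
            best, best_score = path, score
    return " ".join(best)


reviews = [
    "the battery life is great",
    "the battery life is very good",
    "battery life is great overall",
]
print(best_path(reviews))
```

Because each step must advance a sentence position, every path is bounded by the longest sentence, so the loop always terminates.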
