trr266 / rescorptrans

The repository for our open online course "Research on Corporate Transparency"
MIT License

Blog topic: state-of-the-art of natural language processing: case of 10-ks #8

Open jeremiahpslewis opened 3 years ago

jeremiahpslewis commented 3 years ago

Hi all,

Is anyone else interested in looking at this suggested topic? I think at least three key aspects could be tackled here:

  1. What algorithms do people currently use to attack this problem? Are they generic to broader NLP or more specialized?

  2. What does the rate of change look like? Where is the field likely to be in 5 years?

  3. Will this still be a relevant problem in 5 years, or will open and structured data releases cannibalize the need for NLP-based approaches?

nrusha95 commented 3 years ago

Hi! I am also interested in working on a blog revolving around this topic. I indeed find the questions you mention quite interesting. I am interested in research on disclosures using textual analysis. Do you have any ideas about which data you would like to use to tackle these questions?

jeremiahpslewis commented 3 years ago

Tbh, no idea! ;) But we could start with a lit review and see what the status quo looks like?

nrusha95 commented 3 years ago

Yes, right :) I agree that we should first see what the literature suggests.

Rajabalizadeh commented 3 years ago

Hello. My name is Javad. I took the Research on Corporate Transparency course, too. My Ph.D. thesis topic is related to quantifying narrative disclosures (10-K notes, MD&A sections, etc.) with textual analysis. I have previously published some papers in this field (comment letters and readability, readability and informational efficiency, firm fundamentals and readability), and I am familiar with the related literature, so I am interested in this topic as well.

Rajabalizadeh commented 3 years ago

Hello. I have prepared some notes about NLP in the accounting and finance field. After reading them, please add your comments so that we can work further on the topic. In accounting and finance, the contexts below provide ample fodder for applying textual analysis technology:

Also, in the above contexts, previous studies have used different measures depending on the research goal. Some of them are:

  1. Word count
  2. Word cloud
  3. Word tree
  4. Word search
  5. Readability
  6. Tone/sentiment analysis
  7. Repetition
  8. Self-Inclusive Language (SIL)
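As a rough illustration of three of the measures listed above (word count, readability, and tone), here is a minimal Python sketch. It is not taken from any of the studies mentioned in this thread. The function names and the tiny `POSITIVE` / `NEGATIVE` word lists are hypothetical placeholders, the syllable counter is a crude vowel-group heuristic, and readability is approximated with the Gunning Fog formula; published work typically relies on curated dictionaries and more careful sentence and syllable parsing.

```python
import re
from collections import Counter

# Hypothetical mini word lists, for illustration only.
# Real studies use published, domain-specific dictionaries.
POSITIVE = {"gain", "improve", "strong", "success"}
NEGATIVE = {"loss", "decline", "weak", "impairment"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_syllables(word):
    """Crude heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def fog_index(text):
    """Gunning Fog: 0.4 * (words per sentence + percent complex words).

    'Complex' here means three or more syllables under the heuristic above.
    Sentences are split naively on '.', '!' and '?'.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))


def tone(text):
    """Net tone in [-1, 1]: (positive - negative) / (positive + negative)."""
    words = tokenize(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def measures(text):
    """Collect the simple measures for one document."""
    words = tokenize(text)
    return {
        "word_count": len(words),
        "top_words": Counter(words).most_common(3),
        "fog": fog_index(text),
        "tone": tone(text),
    }
```

Calling `measures(text)` on the plain text of an MD&A section would return one dictionary per document, which could then be merged with firm-level data for a comparison across filings.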

Please see the below paper, too. It is very useful (online version). "Revealing Research Themes and Trends in 30 Top‐ranking Accounting Journals: A Text‐mining Approach" (https://onlinelibrary.wiley.com/doi/abs/10.1111/abac.12214)

towitter commented 3 years ago

Hi guys! From our survey, we have the following participants that show an interest in this topic (github handle in brackets, if provided):

Jeremiah (@jlewis91), Javad (@Rajabalizadeh), Jonas (@Jonas-Materna), Lazaros, Rusha (@nrusha95), Harry (@harrynnh), Mahmoud (@mdelshadi)

harrynnh commented 3 years ago

Hi all, I came across these 2 articles that can be useful for the blog post.

Harry