thedatatribune / dyPixa

Turning words into lively shades!
https://thedatatribune.github.io/dyPixa/
MIT License

[Dataset] Poetry dataset is required #26

Open ravi-prakash1907 opened 1 year ago

ravi-prakash1907 commented 1 year ago

Description:

This project requires an NLP model trained on a multilingual poetry dataset, with the current focus on English and Hindi. The dataset should meet the following constraints:

  1. English short poems.
  2. Hindi short poems.
  3. The datasets must be of high quality, sufficiently large, and diverse.
  4. Ensure that these short poems contain figures of speech.

For longer poems, consider the following contributions:

  1. Provide scripts to preprocess the data and divide lengthy poems into shorter segments.
  2. Provide the processed dataset.
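As a starting point for the preprocessing script mentioned above, here is a minimal Python sketch of one way to divide a long poem into shorter segments. It is not the project's own method: the function name `split_poem`, the `max_lines` parameter and its default of 8, and the assumption that stanzas are separated by blank lines are all illustrative choices.

```python
import re


def split_poem(text: str, max_lines: int = 8) -> list[str]:
    """Split a poem into segments of at most `max_lines` non-empty lines.

    Hypothetical helper: assumes stanzas are separated by blank lines and
    tries to keep whole stanzas together where the limit allows.
    """
    # Break the poem into stanzas on one or more blank lines.
    stanzas = [s for s in re.split(r"\n\s*\n", text.strip()) if s.strip()]

    segments: list[str] = []
    current: list[str] = []
    for stanza in stanzas:
        lines = [ln.strip() for ln in stanza.splitlines() if ln.strip()]
        # Close the current segment if this stanza would not fit in it.
        if current and len(current) + len(lines) > max_lines:
            segments.append("\n".join(current))
            current = []
        # A stanza longer than the limit is cut into fixed-size chunks.
        while len(lines) > max_lines:
            segments.append("\n".join(lines[:max_lines]))
            lines = lines[max_lines:]
        current.extend(lines)

    if current:
        segments.append("\n".join(current))
    return segments
```

Because the function works on Unicode strings and only looks at line breaks, the same sketch should apply to both English and Hindi text. A suitable `max_lines` value depends on what the project counts as a "short poem", which this issue does not specify.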


Nabanita29 commented 1 year ago

I plan to curate a diverse dataset of poems from online repositories and public domain collections. With a focus on balanced sentiment representation and accurate annotations, I will ensure the dataset's quality and integrity.

ravi-prakash1907 commented 1 year ago

That would be great, @Nabanita29. Please try to collect multilingual data and, as mentioned, treat short poems as the priority. You may join the community's Discord server for further discussion, queries, and suggestions!


Note: As the milestone "Dataset Collection" is nearing its deadline, pull requests associated with this issue will be considered a priority.