Data Analysis of already Existing data set

mhueppe / machineLearningProject_jaNoMi

This is a public repository to manage the Machine Learning Project for WS 2024/25.

GNU General Public License v3.0


Data Analysis of already Existing data set #4

Closed: mhueppe closed this issue 2 weeks ago

mhueppe commented 3 weeks ago

Download and analyse the existing dataset, which includes ~10k abstract/title pairs from the Association for Computational Linguistics. Reference to the dataset: https://github.com/gcunhase/ArXivAbsTitleDataset

mhueppe commented 3 weeks ago

Commit 733c548 already implements some of the data analysis. Resulting plots: distributionTitles.png, distributionTitles_leng.png, proportionOfOverlap.png
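The plot names above point to two simple statistics: how title lengths are distributed and how much of each title overlaps with its abstract. The following is a minimal Python sketch of how these could be computed, not the code from commit 733c548. It assumes the data is already loaded as (abstract, title) string pairs, uses a naive lowercase word tokenizer, and takes "overlap" to mean the share of unique title words that also occur in the abstract. The function names and the toy pairs are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase, keep runs of letters/digits only
    return re.findall(r"[a-z0-9]+", text.lower())


def title_length_distribution(pairs):
    # Map each title length (in tokens) to how many titles have that length
    return Counter(len(tokenize(title)) for _abstract, title in pairs)


def title_overlap(abstract: str, title: str) -> float:
    # Fraction of unique title tokens that also appear in the abstract
    title_tokens = set(tokenize(title))
    if not title_tokens:
        return 0.0
    abstract_tokens = set(tokenize(abstract))
    return len(title_tokens & abstract_tokens) / len(title_tokens)


if __name__ == "__main__":
    # Hypothetical toy pairs, not samples from the actual dataset
    pairs = [
        ("We study neural summarization of scientific abstracts.", "Neural Summarization"),
        ("A survey of parsing methods.", "Parsing Survey for Low-Resource Languages"),
    ]
    print(title_length_distribution(pairs))
    for abstract, title in pairs:
        print(f"{title!r}: {title_overlap(abstract, title):.2f}")
```

Measuring overlap relative to the title (rather than the abstract) keeps the score between 0 and 1 regardless of abstract length; the per-pair values could then be histogrammed to produce a plot like proportionOfOverlap.png.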