Start doing research survey on LLMs and datasets

sonaalKant commented 7 months ago

The task involves leveraging a Large Language Model (LLM) for sentiment analysis on financial news. The primary objectives are to:

Find a Suitable Dataset:

Locate and acquire publicly available datasets containing financial news articles (check whether any cover crypto). Create a spreadsheet with columns [dataset name, dataset source, size of dataset, Cost (Free/paid), label, any benchmark available or not, Contains crypto news]

Find Suitable LLMs:

Start by creating a list of LLMs that are currently fine-tuned on financial data/news. Create a spreadsheet with columns [Model Name, Model download link, Model trained/finetuned on datasets, evaluated on benchmark]

For reference on datasets and publicly available models, you can refer to https://paperswithcode.com/ , https://huggingface.co/models and https://huggingface.co/datasets.
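
As a starting point, the two spreadsheets could be bootstrapped as header-only CSV templates. This is only a minimal sketch: the column names follow the lists above, while the function name `make_template` and the choice of CSV as the format are assumptions, not something the task prescribes.

```python
import csv
import io

# Column names follow the two lists in the task description.
DATASET_COLUMNS = [
    "dataset name",
    "dataset source",
    "size of dataset",
    "Cost (Free/paid)",
    "label",
    "any benchmark available or not",
    "Contains crypto news",
]
MODEL_COLUMNS = [
    "Model Name",
    "Model download link",
    "Model trained/finetuned on datasets",
    "evaluated on benchmark",
]


def make_template(columns):
    """Return CSV text containing only a header row (hypothetical helper)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(columns)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print both header rows; paste them into a spreadsheet or save as .csv.
    print(make_template(DATASET_COLUMNS), end="")
    print(make_template(MODEL_COLUMNS), end="")
```

Each survey entry then becomes one row under the matching header, which keeps the dataset and model sheets consistent across contributors.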

FYI @gpsaggese @samarth9008 @jsmerix

tkpratardan commented 7 months ago

@sonaalKant What is the expected timeline?

sonaalKant commented 7 months ago

In the models sheet, "evaluated on benchmark" refers to any other test sets the models have been evaluated on besides the test split.

tkpratardan commented 7 months ago

@sonaalKant Could you please review the docs here?

Also, could you please provide your feedback on the following:

Am I headed in the right direction?
Any improvements and feedback on the work?

sonaalKant commented 7 months ago

This looks good. I will close this issue, as it has covered the preliminary research on models and datasets. I will file another issue about next steps.

gpsaggese commented 7 months ago

Good stuff.

sonaalKant commented 4 months ago

This is the first step towards PEOE. I think we have some preliminary findings/survey here: https://drive.google.com/drive/u/3/folders/1FKEkRZwshZhj_hSB-n0HaOkeneBiFnOY

kaizen-ai / kaizenflow

Start doing research survey on LLMs and datasets #750