New README file instructions are incorrect

I am following along the instructions in the new README.md and they don't work as expected.

Note: I have installed ARES using the instructions at https://ares-ai.vercel.app/installation/ ,given that the Python version has not been bumped to any new release. The previous codebase was 0.2.3, current version in PyPi is still 0.2.3.

In the Quick Start 1 tutorial, this wget commands point to datasets that were deleted during the last update. So:

wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets/nq_few_shot_prompt_v1.tsv
wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_labeled_output.tsv
wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_unlabeled_output.tsv

returns error 404 for all files:

--2024-04-23 00:39:10--  https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_unlabeled_output.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-04-23 00:39:10 ERROR 404: Not Found.

When executing the ues_idp block for the first time a ModuleNotFound is returned. vLLM package is missing, so it has to be installed manually.

In step 2, synthetic dataset generation, document_filepath is expected to be a list, a str is passed. synthetic_queries_filename parameter is incorrect. Correct name is synthetic_queries_filenames and it's of type list, not str.

File nq_few_shot_prompt_for_synthetic_query_generation.tsv under examples only has Query and Document as columns. Running the synthetic dataset generation code returns: KeyError: 'Context_Relevance_Label' as that column is missing from the file.

In Step 3, the route to training dataset should be data/output/synthetic_queries_1.tsv as in the previous code block. Data is missing at the beginning of the path.

Parameter 'training_dataset' for classifier_model is expected to be of type list, received str instead. Parameter 'validation_set' for classifier_model is expected to be of type list, received str instead. Parameter 'label_column' for classifier_model is expected to be of type list, received str instead.

There might be more errors once I am able to run the code, but I've not been able to generate the synthetic dataset using flan because of the incorrect few_shot_file

stanford-futuredata / ARES

New README file instructions are incorrect #27