stanford-futuredata / ARES

https://ares-ai.vercel.app/
Apache License 2.0

New README file instructions are incorrect #27

Closed by elsatch 2 months ago

elsatch commented 2 months ago

I am following the instructions in the new README.md and they don't work as expected.

Note: I installed ARES using the instructions at https://ares-ai.vercel.app/installation/, given that the package version has not been bumped to a new release. The previous codebase was 0.2.3, and the current version on PyPI is still 0.2.3.

In the Quick Start 1 tutorial, the wget commands point to datasets that were deleted during the last update. So:

wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets/nq_few_shot_prompt_v1.tsv
wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_labeled_output.tsv
wget https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_unlabeled_output.tsv

return a 404 error for all files:

--2024-04-23 00:39:10--  https://raw.githubusercontent.com/stanford-futuredata/ARES/new-dev/data/datasets_v2/nq/nq_unlabeled_output.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-04-23 00:39:10 ERROR 404: Not Found.

When executing the ues_idp block for the first time, a ModuleNotFoundError is raised. The vLLM package is missing, so it has to be installed manually.
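A missing dependency like this can be detected before running the tutorial code. The following is only a minimal sketch using the standard library; the helper name missing_modules is hypothetical and not part of ARES, and the list of modules to check is an assumption based on the error reported above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import.

    Only top-level names are supported here: find_spec() may raise for a
    dotted name whose parent package is missing.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# "vllm" is the import name of the vLLM package mentioned in this report.
absent = missing_modules(["vllm"])
if absent:
    print("Missing modules (install manually, e.g. with pip):", absent)
```

Running this before the ues_idp block would surface the missing package up front instead of failing partway through the tutorial.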

In Step 2 (synthetic dataset generation), document_filepath is expected to be a list, but a str is passed. The synthetic_queries_filename parameter name is incorrect: the correct name is synthetic_queries_filenames, and it is of type list, not str.

The file nq_few_shot_prompt_for_synthetic_query_generation.tsv under examples only has Query and Document as columns. Running the synthetic dataset generation code raises KeyError: 'Context_Relevance_Label', as that column is missing from the file.
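The missing column can be confirmed without running the generation step by inspecting the header row of the few-shot file. This is a minimal sketch using only the standard library; the helper name missing_columns is hypothetical, and the list of required columns is an assumption drawn from the KeyError above rather than from the ARES source.

```python
import csv


def missing_columns(tsv_path, required):
    """Return the required column names that are absent from the TSV header."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # The first row of the file is treated as the header; an empty
        # file yields an empty header.
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in required if name not in header]


# Example call (the path is an assumption about where the file was saved):
# missing_columns(
#     "nq_few_shot_prompt_for_synthetic_query_generation.tsv",
#     ["Query", "Document", "Context_Relevance_Label"],
# )
```

For a file whose header contains only Query and Document, the call above would report Context_Relevance_Label as missing.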

In Step 3, the path to the training dataset should be data/output/synthetic_queries_1.tsv, as in the previous code block. The leading data/ directory is missing from the path.

Running the classifier step then fails with these type errors:
Parameter 'training_dataset' for classifier_model is expected to be of type list, received str instead.
Parameter 'validation_set' for classifier_model is expected to be of type list, received str instead.
Parameter 'label_column' for classifier_model is expected to be of type list, received str instead.
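As a workaround until the README is fixed, string values can be wrapped in lists before the configuration is passed on. The sketch below is an illustration only: as_list and normalize_list_params are hypothetical helpers, not ARES functions, and the example values are taken from paths and column names mentioned in this report.

```python
def as_list(value):
    """Wrap a single value in a list; copy lists and tuples into a new list."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def normalize_list_params(config, list_keys):
    """Return a copy of config where every key in list_keys holds a list."""
    return {
        key: as_list(value) if key in list_keys else value
        for key, value in config.items()
    }


# Values as a README-style snippet might pass them (str instead of list).
classifier_config = {
    "training_dataset": "data/output/synthetic_queries_1.tsv",
    "validation_set": "nq_labeled_output.tsv",
    "label_column": "Context_Relevance_Label",
}

fixed = normalize_list_params(
    classifier_config,
    {"training_dataset", "validation_set", "label_column"},
)
```

The same helper would apply to the list-typed parameters reported for Step 2, such as synthetic_queries_filenames.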

There might be more errors once I am able to run the code, but I have not been able to generate the synthetic dataset using flan because of the incorrect few_shot_file.

robbym-dev commented 2 months ago

Hi @elsatch,

Thank you for bringing these issues to our attention. We have updated the README.md and made corrections to the dataset URLs, installation instructions, and code snippets you mentioned. All reported errors should now be resolved.

Please pull the latest changes from the repository and try following the instructions again. If you encounter any further issues, feel free to open a new ticket.

Thanks for your feedback on improving ARES!

Best regards, Robby