Cannot figure out how to download the dataset.

shahbuland commented 1 year ago

I've been following the download instructions in the README. I did not download from Zenodo since the ID files are already present for me. When I run the README's command for downloading the Reddit dataset, I get the following output. Nothing downloads and it completes instantly.

python main.py --download_data --dataset_name Reddit
Scraping all data may take a while (several hours)
[]
Downloading comments of 0 submission files
Getting images for:
 []
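The empty list and "0 submission files" in this output suggest that no ID files were found. A minimal sketch of a check for that, assuming the scraper looks for CSV files in a data directory such as data/RPCD/photocritique/ (the path mentioned in this thread). `list_id_files` and `example_ids.csv` are hypothetical names used for illustration, not part of the repository:

```python
from pathlib import Path
import tempfile


def list_id_files(directory):
    """Return sorted CSV file names in `directory`, or [] if it does not exist."""
    path = Path(directory)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.glob("*.csv"))


# Use a temporary directory so the sketch does not touch real data.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data" / "RPCD" / "photocritique"

    # Directory missing: nothing is found, matching the empty list above.
    missing = list_id_files(data_dir)

    # After creating the directory and adding a (hypothetical) ID file.
    data_dir.mkdir(parents=True)
    (data_dir / "example_ids.csv").write_text("id\nabc123\n")
    found = list_id_files(data_dir)

print(missing)
print(found)
```

If the equivalent check on the real data directory returns an empty list, the downloader has nothing to iterate over and will exit immediately.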

I tried creating the directory it seemed to be looking for, data/RPCD/photocritique/, and put the id.csv files in there. This didn't work either and gave me the following error:

AttributeError: 'Series' object has no attribute 'id'

From the code in reddit_scraper.py:

for idx, submission in tqdm(submissions_df.iterrows(), total=submissions_df.shape[0]):
    submission_id = submission.id
    submission_comments_csv_path = submission_id + '-comments.csv'
    submission_comments_path = os.path.join(commentspath, submission_comments_csv_path)
    if os.path.exists(submission_comments_path):
        continue

Am I missing some files or args?
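For context, `submission.id` on a row from `iterrows()` only resolves when the DataFrame has a column named `id`, so one thing worth ruling out is a CSV whose header row lacks that column. A minimal sketch of such a check using only the standard library. `has_id_column` and the sample contents are hypothetical and are not taken from the actual ID files:

```python
import csv
import io


def has_id_column(csv_text, column="id"):
    """Return True if the first row of the CSV text names `column`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return column in [name.strip() for name in header]


# Hypothetical file contents for illustration only.
with_header = "id,title\nabc123,First post\n"
without_header = "abc123,First post\ndef456,Second post\n"

print(has_id_column(with_header))     # True
print(has_id_column(without_header))  # False
```

To run the same check against a real file, read its text first and pass that to the function.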

DhruvBhardwaj commented 1 year ago

Facing the exact same issue.

dveni commented 1 year ago

Hi!

Thanks for bringing this to our attention; there were some errors in the code. The latest commit (bc91c440bf7859b483f03e1cae14f7013576d6a9) should work. I've tested the downloader with a small subset of the dataset and it seems to work properly.

Apologies for the inconvenience! And please let us know if you find any other issues :slightly_smiling_face:

mediatechnologycenter / Aestheval