fmh1art opened this issue 4 days ago
Hi @fmh1art, if you execute the `testdata/download_testdata.sh` script (found here), this will download a set of datasets which we used in our CIDR paper. If you then execute the `register-sources.sh` script (found here), this will register those datasets with your local instance of PZ.
Once you have done that, the `enron-eval` dataset should have 1000 data instances (each instance is an email), whose labels can be found in `testdata/groundtruth/enron-eval.csv`. A label of `1` indicates that the email references fraudulent activity, and a label of `0` indicates that it does not. Please let me know if this dataset is not suitable for your needs, and I'd be happy to suggest other options!
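To check a pipeline's output against these labels, a minimal Python sketch could look like the following. It is not part of PZ: `load_labels` and `score` are illustrative helpers, and the column names `filename` and `label` are hypothetical because the header of the ground-truth CSV is not shown in this thread.

```python
import csv
from typing import Dict, Iterable, Set

# Hypothetical column names: the header of
# testdata/groundtruth/enron-eval.csv is not shown in this thread,
# so adjust these to match the real file.
ID_COLUMN = "filename"
LABEL_COLUMN = "label"


def load_labels(lines: Iterable[str]) -> Dict[str, int]:
    """Map each email id to its label (1 = references fraud, 0 = does not)."""
    reader = csv.DictReader(lines)
    return {row[ID_COLUMN]: int(row[LABEL_COLUMN]) for row in reader}


def score(labels: Dict[str, int], predicted_fraud: Set[str]) -> Dict[str, float]:
    """Precision, recall and F1 for the set of ids a pipeline flagged as fraud."""
    positives = {email_id for email_id, label in labels.items() if label == 1}
    true_pos = len(positives & predicted_fraud)
    precision = true_pos / len(predicted_fraud) if predicted_fraud else 0.0
    recall = true_pos / len(positives) if positives else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Usage would be along the lines of `with open("testdata/groundtruth/enron-eval.csv", newline="") as f: labels = load_labels(f)`, followed by `score(labels, flagged_ids)`, where `flagged_ids` is a (hypothetical) set of email ids your pipeline marked as fraudulent. If the CSV has one row per email, `len(labels)` should be 1000.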
Thanks for your detailed reply! Sorry for not making my needs clear: I have already downloaded all the datasets you provided, but for some complex tasks like `real-estate-eval`, the dataset seems to be incomplete (it has only 30 data instances). If I want to evaluate on a larger dataset with more complicated multimodal data, where can I find such datasets?
Hi @fmh1art, our mistake for not uploading the dataset with all 100 real estate listings. I've just uploaded it to the following location:
https://palimpzest-workloads.s3.us-east-1.amazonaws.com/real-estate-eval-100.tar
You should be able to download the tar file with:
$ wget https://palimpzest-workloads.s3.us-east-1.amazonaws.com/real-estate-eval-100.tar
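For scripted setups, the same download can be done from Python and unpacked in one step. This is a minimal sketch using only the standard library. `fetch_dataset`, `extract`, and the output directory are illustrative choices, not part of PZ, and since the archive's internal layout is not described in this thread, the code simply returns whatever member names the tar contains.

```python
import tarfile
import tempfile
import urllib.request
from pathlib import Path
from typing import List

# The URL posted in this thread for the full 100-listing dataset.
URL = "https://palimpzest-workloads.s3.us-east-1.amazonaws.com/real-estate-eval-100.tar"


def extract(archive: Path, out_dir: Path) -> List[str]:
    """Extract a tar archive into out_dir and return its member names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject absolute
            # paths, ".." escapes and other unsafe members.
            tar.extractall(out_dir, filter="data")
        else:
            tar.extractall(out_dir)
    return names


def fetch_dataset(url: str, out_dir: Path) -> List[str]:
    """Download the tar at url to a temporary file, then extract it into out_dir."""
    filename = url.rsplit("/", 1)[-1]
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / filename
        urllib.request.urlretrieve(url, archive)
        return extract(archive, out_dir)
```

Calling `fetch_dataset(URL, Path("real-estate-eval-100"))` would download and unpack the archive. The destination directory is an assumption: where PZ expects these files, and whether `register-sources.sh` needs to be re-run afterwards, is not covered here.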
Hi, I am very excited about the application scenarios of Palimpzest and want to do more research on it. However, I have not found an appropriate dataset that supports analysis over hundreds or thousands of data instances. Could you please share the URLs of some such datasets, or suggest a specific research direction where I might find relevant datasets?