Open StefanIsSmart opened 2 months ago
I'm having the same issue
Any ideas?
The problem seems to be that the downloaded zinc.tab file is empty (in my case zinc)
Hi, I have the same issue (same message and an empty .tab file). And when I run it in my terminal I get this: Maybe it is a bad request to https://dataverse.harvard.edu/ ?
If you just want to download the data, you can download it directly from here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/21LKWG
Hi,
I am seeing the same (misleading) "TDC is hosted in Harvard Dataverse and it is currently under maintenance" message. As @flogrammer and @mxfly14 said, this appears to be due to empty files being retrieved.
The underlying cause (in my environment at least) is that the GET request returns a 202 response instead of 200.
Here's the code for the `dataverse_download` function (from `tdc.utils.load`):
def dataverse_download(url, path, name, types, id=None):
    """dataverse download helper with progress bar
    Args:
        url (str): the url of the dataset
        path (str): the path to save the dataset
        name (str): the dataset name
        types (dict): a dictionary mapping from the dataset name to the file format
    """
    if id is None:
        save_path = os.path.join(path, name + "." + types[name])
    else:
        save_path = os.path.join(path, name + "-" + str(id) + "." + types[name])
    response = requests.get(url, stream=True)
    total_size_in_bytes = int(response.headers.get("content-length", 0))
    block_size = 1024
    progress_bar = tqdm(total=total_size_in_bytes, unit="iB", unit_scale=True)
    with open(save_path, "wb") as file:
        for data in response.iter_content(block_size):
            progress_bar.update(len(data))
            file.write(data)
    progress_bar.close()
The 202 status means that `response.iter_content()` doesn't generate anything, and the function ends up writing an empty file.
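As a possible workaround, the write step could refuse to create the file unless the server actually returned a body. The following is only a minimal sketch, not the TDC implementation: `save_response` and `DownloadNotReadyError` are hypothetical names introduced here, and the helper assumes it is handed a `requests`-style response object (something with `status_code` and `iter_content`).

```python
import os


class DownloadNotReadyError(RuntimeError):
    """Hypothetical error type: the server did not return the file body."""


def save_response(response, save_path, block_size=1024):
    """Write a streamed response to save_path only if it carries data.

    `response` is assumed to behave like a requests.Response
    (attributes/methods used: status_code, iter_content).
    Returns the number of bytes written.
    """
    # Anything other than 200 (for example the 202 seen in this thread)
    # is treated as "no file available", so nothing is written to disk.
    if response.status_code != 200:
        raise DownloadNotReadyError(
            f"expected HTTP 200, got {response.status_code}; "
            f"not writing {save_path}"
        )

    bytes_written = 0
    with open(save_path, "wb") as file:
        for data in response.iter_content(block_size):
            file.write(data)
            bytes_written += len(data)

    # Guard against a 200 with an empty body: remove the empty file
    # so a later load step does not mistake it for a valid dataset.
    if bytes_written == 0:
        os.remove(save_path)
        raise DownloadNotReadyError(f"empty body; removed {save_path}")

    return bytes_written
```

With `requests` installed, this could be called as `save_response(requests.get(url, stream=True), save_path)`. Raising early would at least replace the misleading "under maintenance" message with the real status code.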
The 202 status can be simply reproduced like this:
import requests
r = requests.get("https://dataverse.harvard.edu/api/access/datafile/4267146")
print(r.status_code)
202
Strangely, the same behaviour is not observed when running in a Google Colab environment (I haven't figured out why that is yet!).
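To narrow down why the two environments differ, it may help to print the same few response fields in each and compare them side by side. This is just a diagnostic sketch: `summarize_response` is a hypothetical helper, and it assumes a `requests`-style response object (with `status_code`, `headers`, and `content`).

```python
def summarize_response(response):
    """Collect the response fields most useful for comparing environments.

    `response` is assumed to behave like a requests.Response.
    """
    return {
        "status_code": response.status_code,
        # Header lookups return None when the server omits the header.
        "content_length_header": response.headers.get("content-length"),
        "content_type": response.headers.get("content-type"),
        # Actual size of the body that was received.
        "body_bytes": len(response.content),
    }
```

For example, running `print(summarize_response(requests.get("https://dataverse.harvard.edu/api/access/datafile/4267146")))` both locally and in Colab would show whether only the status code differs or the headers do as well.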
Kind regards
James
Describe the bug
The bug happened while loading the data.
To Reproduce
Steps to reproduce the behavior:
from tdc.single_pred import Yields
data = Yields(name = 'Buchwald-Hartwig')
split = data.get_split()
Expected behavior
get a dataframe
Screenshots
Environment:
Additional context