Woffee opened this issue 1 year ago
@Woffee I was running into a similar issue, and through some debugging I found the cause of the error: some fields are missing from the downloaded dumpstatus.json, but the Pydantic model assumes that all fields are present. I was able to fix it by updating the following sections of code in `wikidata_dump.py`:
```python
# Replacement sections for wikidata_dump.py. They rely on names the module
# already imports or defines: json, requests, date, datetime, Path, Mapping,
# pydantic's validator, PydanticModel and _LOGGER (plus postponed annotation
# evaluation for the `-> _WikidataDumpStatus` forward reference).


class _WikidataDumpStatusFile(PydanticModel):
    # Defaults let file entries that omit any of these keys still parse.
    size: int = 0
    url: str = ""
    md5: str = ""
    sha1: str = ""


class _WikidataDumpStatusJob(PydanticModel):
    status: str
    updated: datetime
    # The "files" key may be missing; pydantic v1 treats a None default as Optional.
    files: Mapping[str, _WikidataDumpStatusFile] = None

    @validator("updated", pre=True)
    def _parse_datetime(cls, value: str) -> datetime:  # noqa: N805
        value = value.strip()
        if len(value) == 0:
            # An empty timestamp falls back to the earliest representable datetime.
            return datetime.min
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


class _WikidataDumpStatus(PydanticModel):
    jobs: Mapping[str, _WikidataDumpStatusJob]
    version: str

    @classmethod
    def load(cls, dump_dir: Path, version: date, mirror: str) -> _WikidataDumpStatus:
        path = dump_dir / f"wikidatawiki-{version:%4Y%2m%2d}-dumpstatus.json"
        _LOGGER.debug(f"Loading Wikidata dump status from '{path}'.")
        if not path.exists():
            url = f"{mirror}/wikidatawiki/{version:%4Y%2m%2d}/dumpstatus.json"
            _LOGGER.debug(f"Downloading Wikidata dump status from '{url}'.")
            response = requests.get(url)
            response.raise_for_status()
            path.parent.mkdir(exist_ok=True, parents=True)
            with path.open("w", encoding="UTF-8") as fd:
                fd.write(json.dumps(response.json(), indent=2) + "\n")
            _LOGGER.debug("Done downloading Wikidata dump status.")

        dump_status = _WikidataDumpStatus.parse_file(path)
        for job_name, job in dump_status.jobs.items():
            if job.status != "done":
                # Delete the cached file so the next run downloads a fresh status.
                path.unlink()
                raise Exception(f"Job '{job_name}' is not 'done', but '{job.status}'.")
        return dump_status
```
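To illustrate why the defaults fix the error, here is a minimal stdlib-only sketch of the same idea (a dataclass stands in for the Pydantic model, so this is an analogue, not the module's code). `DumpFile` and `parse_dump_file` are hypothetical names, and the sample entry is made-up data that simply lacks the checksum keys.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


# Hypothetical stdlib analogue of _WikidataDumpStatusFile: every field has a
# default, so an entry that omits a key still parses.
@dataclass
class DumpFile:
    size: int = 0
    url: str = ""
    md5: str = ""
    sha1: str = ""


def parse_dump_file(raw: Dict[str, Any]) -> DumpFile:
    # Keep only keys the dataclass knows; missing keys fall back to the defaults.
    known = {f.name for f in fields(DumpFile)}
    return DumpFile(**{k: v for k, v in raw.items() if k in known})


# Illustrative entry without md5/sha1.
entry = parse_dump_file({"size": 1024, "url": "/wikidatawiki/example.xml.bz2"})
print(entry)
# DumpFile(size=1024, url='/wikidatawiki/example.xml.bz2', md5='', sha1='')
```

Without the defaults, constructing `DumpFile` from that entry would fail with a missing-argument error, which mirrors the validation error the Pydantic model raised.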
Note that you might still get an error saying a job's status is not 'done' if Wikidata hasn't finished creating the dump yet. I would therefore recommend using an older dump that has completed.
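One way to find such an older dump is to check each candidate's dumpstatus.json before using it. This is a minimal sketch, assuming the status files have already been loaded into plain dicts keyed by a YYYYMMDD version string; the helper names, job name and "in-progress" status value are hypothetical illustration data, and only the "done" check comes from the code above.

```python
from typing import Any, Dict, List, Optional


def unfinished_jobs(dump_status: Dict[str, Any]) -> List[str]:
    """Return the names of jobs whose status is not 'done'."""
    return [
        name
        for name, job in dump_status.get("jobs", {}).items()
        if job.get("status") != "done"
    ]


def latest_complete_version(statuses: Dict[str, Dict[str, Any]]) -> Optional[str]:
    """Pick the newest version whose jobs are all done, or None if there is none."""
    # YYYYMMDD strings sort chronologically, so reverse order is newest first.
    for version in sorted(statuses, reverse=True):
        if not unfinished_jobs(statuses[version]):
            return version
    return None


statuses = {
    "20240101": {"jobs": {"articlesdump": {"status": "done"}}},
    "20240201": {"jobs": {"articlesdump": {"status": "in-progress"}}},
}
print(unfinished_jobs(statuses["20240201"]))  # ['articlesdump']
print(latest_complete_version(statuses))      # 20240101
```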
Dear author, I tried running the `build_wikidated_v1_0.py` script, but I encountered the following error. Could you help me check what's going wrong? By the way, I made the following two modifications: