Closed ptgolden closed 2 months ago
Actually, I think the issue is here: https://github.com/monarch-initiative/kghub-downloader/blob/2c0f3d2b2d262e986f4c764bc69516fc5825c260/kghub_downloader/download_utils.py#L209
Instead of opening the file at the same time as the request, it should only be opened when it's ready to be written. I'm happy to open a PR if that sounds okay.
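To make the idea concrete, here is a minimal sketch of one way to avoid leaving an empty file at the destination path: stream into a temporary file in the same directory and only rename it into place once every chunk has been written. This is not the actual kghub-downloader code; `save_stream`, its parameters, and the `.part` suffix are hypothetical names chosen for illustration, and `chunks` stands in for whatever the HTTP client yields.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def save_stream(chunks: Iterable[bytes], dest: Path) -> None:
    """Hypothetical helper: write chunks to a temp file, then rename to dest.

    dest only appears on disk after the whole stream has been consumed,
    so an interrupted download cannot be mistaken for a cached file.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to dest so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(dest.parent), prefix=dest.name + ".", suffix=".part"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        # Atomically move the finished file into place.
        os.replace(tmp_name, dest)
    except BaseException:
        # BaseException also covers KeyboardInterrupt (^C).
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

With something like this, the existing "file exists, so it is cached" check could stay as it is, since a file at the destination path would always be complete.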
While running `ingest download --all`, I encountered a couple of errors: one on my end from ending the process prematurely, and one from a network disruption. Running the command again would pick up the ingest, but it would count the file that was being downloaded when the error occurred as cached, instead of attempting to re-download it.

To recreate, run `ingest download --all` and press `^C` to send an interrupt to the program. If the script was in the middle of downloading a file, it will appear in the `data/` directory as an empty file.

An easy fix would be to delete files when an error occurs here: https://github.com/monarch-initiative/monarch-ingest/blob/24f9de3047e9c762d9d2fb3f757858c948c0b162/src/monarch_ingest/main.py#L44-L52
A (much) more complicated fix would involve supporting partial downloads in monarch-initiative/kghub-downloader.
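For the easy fix, a rough sketch of the delete-on-error approach might look like the following. It is not taken from monarch-ingest: `remove_on_failure` is a hypothetical helper, and the caller is assumed to know the path of the file currently being downloaded.

```python
import contextlib
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def remove_on_failure(path: Path) -> Iterator[Path]:
    """Hypothetical helper: delete path if the wrapped block raises."""
    try:
        yield path
    except BaseException:
        # BaseException so that ^C (KeyboardInterrupt) also triggers cleanup.
        path.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
        raise
```

The download call would be wrapped as `with remove_on_failure(target): ...`, so that an interrupt or network error removes the partial file before the exception propagates.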