htcondor / htmap

High-Throughput Computing in Python, powered by HTCondor
https://htmap.readthedocs.io
Apache License 2.0

Detect Python/package version mismatches #231

Open · JoshKarpel opened this issue 4 years ago

JoshKarpel commented 4 years ago

@JoshKarpel - can you open a separate ticket to provide a Loud Warning when there's a version mismatch for Python? Seems like something we could detect easily.

Originally posted by @bbockelm in https://github.com/htcondor/htmap/issues/229#issuecomment-683291640

JoshKarpel commented 4 years ago

Yes, we could detect this, but it isn't easy with the way we currently do IO.

Right now, all IO between the submit side and the job is handled with (cloud)pickles. This was easy to implement, and so far we haven't needed any richer communication, so it's been fine. But the longer HTMap is around, the more incompatibilities we're going to pick up between Python versions, particularly whenever the pickle protocol version is bumped (this is the problem the user hit in #229, going from 3.7 to 3.8). It has also occasionally been a problem when cloudpickle or user packages aren't installed execute-side, e.g. #194.
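Part of the Python mismatch is detectable even before unpickling: a pickle written with protocol 2 or higher starts with the PROTO opcode (byte `0x80`) followed by one byte holding the protocol number. We could compare that byte against `pickle.HIGHEST_PROTOCOL` and raise the Loud Warning instead of an opaque unpickling error. Below is a minimal sketch; `pickle_protocol` and `warn_if_unreadable` are hypothetical helper names, not existing HTMap functions, and this only catches protocol bumps, not package version drift.

```python
import pickle
import warnings
from typing import Optional


def pickle_protocol(data: bytes) -> Optional[int]:
    """Return the protocol number recorded in a pickle's header.

    Protocol >= 2 pickles begin with the PROTO opcode (0x80) followed by
    one byte holding the protocol number. Protocols 0 and 1 have no such
    header (and every Python 3 can read them), so None is returned.
    """
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    return None


def warn_if_unreadable(data: bytes) -> bool:
    """Warn loudly and return False if this interpreter can't load the pickle."""
    protocol = pickle_protocol(data)
    if protocol is not None and protocol > pickle.HIGHEST_PROTOCOL:
        warnings.warn(
            f"pickle uses protocol {protocol}, but this Python only "
            f"supports up to protocol {pickle.HIGHEST_PROTOCOL}",
            RuntimeWarning,
        )
        return False
    return True


# A pickle written with this interpreter's highest protocol is readable here.
payload = pickle.dumps(("arg",), protocol=pickle.HIGHEST_PROTOCOL)
print(pickle_protocol(payload), warn_if_unreadable(payload))
```

The check only reads the header bytes, so it is safe to run on output from an unknown Python before calling `pickle.loads`.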

The solution is to redo the input and output formats so that we can send arbitrary structured data and metadata back and forth. I recommend a JSON document that embeds the (cloud)pickled function inputs and outputs. Python and package versions could then be stored as plain text inside the JSON, along with whatever other metadata we want to add (we could do our own runtime tracking, for example). JSON is readable with the Python standard library and has no protocol versions, so we shouldn't hit compatibility issues when loading it across mismatched Python versions.

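For example, the input JSON might look like this (JSON has no binary type, so the pickled bytes would need to be encoded as text, e.g. base64):

```json
{
  "args": "<base64-encoded pickle of the args tuple>",
  "kwargs": "<base64-encoded pickle of the kwargs dict>",
  "python_version": "3.8.0",
  "package_versions": {"numpy": "1.18.1", "scipy": "1.0.1"}
}
```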
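A submit-side writer for that input format could look roughly like the sketch below. It assumes base64 for the pickled bytes, uses the standard-library `pickle` as a stand-in for `cloudpickle` (whose `dumps` output `pickle.loads` can read), and assumes the caller supplies the list of package names to record. `installed_versions`, `encode_pickle`, and `make_input_json` are hypothetical names; `importlib.metadata` requires Python 3.8+.

```python
import base64
import json
import pickle
import platform
from importlib import metadata
from typing import Any, Dict, Iterable


def installed_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Map each installed package name to its version; skip missing ones."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed locally, so there is nothing to record
    return versions


def encode_pickle(obj: Any) -> str:
    """Pickle obj, then base64-encode it so it fits in a JSON string."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def make_input_json(args: tuple, kwargs: dict, packages: Iterable[str] = ()) -> str:
    """Build one component's input document: pickled inputs plus plain-text versions."""
    return json.dumps({
        "args": encode_pickle(args),
        "kwargs": encode_pickle(kwargs),
        "python_version": platform.python_version(),
        "package_versions": installed_versions(packages),
    })


print(make_input_json((1, 2), {"x": 3}, ["numpy", "scipy"]))
```

Only `args` and `kwargs` need unpickling; the version fields stay readable by any Python, which is the point of the format.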

And the output JSON might look like this:

```json
{
  "output": "<base64-encoded pickle of the return value>",
  "python_version": "3.7.6",
  "package_versions": {"numpy": "1.18.1", "scipy": "1.0.1"}
}
```
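On the execute side, the job would decode its inputs, call the function, and write the output document with its own versions. The sketch below assumes the same base64 encoding and stdlib `pickle` stand-in, looks up versions for the same packages the submit side listed, and takes the function as a plain argument (in HTMap it would come from the separate pickled-function file). `decode_pickle`, `execute_side_versions`, and `run_component` are hypothetical names.

```python
import base64
import json
import pickle
import platform
from importlib import metadata
from typing import Any, Callable, Dict, Iterable


def decode_pickle(text: str) -> Any:
    """Reverse the submit-side encoding: base64 text -> bytes -> object."""
    return pickle.loads(base64.b64decode(text))


def execute_side_versions(names: Iterable[str]) -> Dict[str, str]:
    """Look up execute-side versions of the packages the submit side listed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # missing here; the submit side will see it as absent
    return versions


def run_component(input_json: str, func: Callable[..., Any]) -> str:
    """Unpack the inputs, call func, and pack its output plus metadata."""
    inputs = json.loads(input_json)
    result = func(*decode_pickle(inputs["args"]), **decode_pickle(inputs["kwargs"]))
    return json.dumps({
        "output": base64.b64encode(pickle.dumps(result)).decode("ascii"),
        # Record the execute-side interpreter so the submit side can compare.
        "python_version": platform.python_version(),
        "package_versions": execute_side_versions(inputs.get("package_versions", {})),
    })
```

Note that decoding `args` and `kwargs` can still fail execute-side on a protocol mismatch; the JSON envelope only guarantees the version metadata itself is always readable.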

A version mismatch warning could then be generated when loading output, by comparing the versions recorded in our local copy of the input file with those in the output file we got back from the job.
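The comparison itself only touches the plain-text version fields, so it can run before any unpickling. A minimal sketch, assuming the field names from the example documents; `version_mismatches` and `warn_on_mismatch` are hypothetical names, and treating any difference (including a package missing on one side) as a mismatch is a design choice, not settled behavior.

```python
import json
import warnings
from typing import List


def version_mismatches(input_json: str, output_json: str) -> List[str]:
    """Describe every version that differs between submit and execute side."""
    sent = json.loads(input_json)
    got = json.loads(output_json)
    problems = []
    if sent["python_version"] != got["python_version"]:
        problems.append(
            f"Python {sent['python_version']} (submit) vs "
            f"{got['python_version']} (execute)"
        )
    sent_pkgs = sent.get("package_versions", {})
    got_pkgs = got.get("package_versions", {})
    # Union of names, so a package missing on one side shows up as None.
    for name in sorted(sent_pkgs.keys() | got_pkgs.keys()):
        local, remote = sent_pkgs.get(name), got_pkgs.get(name)
        if local != remote:
            problems.append(f"{name} {local} (submit) vs {remote} (execute)")
    return problems


def warn_on_mismatch(input_json: str, output_json: str) -> None:
    """Emit one loud warning listing all mismatches, if there are any."""
    problems = version_mismatches(input_json, output_json)
    if problems:
        warnings.warn(
            "Version mismatch between submit and execute side: " + "; ".join(problems),
            RuntimeWarning,
            stacklevel=2,
        )


# Using the versions from the example documents above.
sent = json.dumps({"python_version": "3.8.0", "package_versions": {"numpy": "1.18.1"}})
got = json.dumps({"python_version": "3.7.6", "package_versions": {"numpy": "1.18.1"}})
print(version_mismatches(sent, got))
```

Running this before `pickle.loads` on the output means the user sees the mismatch even when unpickling then fails.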

I didn't include the function in the input JSON above, since storing the pickled function on disk only once gives us a small disk space savings. The tradeoff is that we have to transfer two files to the job, one of them very small. We could consider packing the function into the input JSON as well.
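To put a number on that tradeoff: base64 turns every 3 raw bytes into 4 characters (padded), so embedding the function costs roughly 4/3 of its pickled size in every input file instead of its raw size once. The sketch below assumes one input file per map component and uses a pickled builtin as a stand-in for a real cloudpickled function; `b64_len` and `extra_bytes_if_packed` are hypothetical names.

```python
import base64
import pickle


def b64_len(num_bytes: int) -> int:
    """Length of padded base64 text for num_bytes raw bytes (4 chars per 3 bytes)."""
    return 4 * ((num_bytes + 2) // 3)


def extra_bytes_if_packed(func_pickle: bytes, num_components: int) -> int:
    """Extra disk space from embedding the pickled function in every
    component's input JSON instead of storing one shared function file."""
    shared = len(func_pickle)  # stored once, as raw bytes
    packed = num_components * b64_len(len(func_pickle))  # once per input, as base64 text
    return packed - shared


# Check the formula against the real encoder, then size a 1000-component map.
func_pickle = pickle.dumps(sorted)
assert b64_len(len(func_pickle)) == len(base64.b64encode(func_pickle))
print(extra_bytes_if_packed(func_pickle, 1000))
```

For small functions the overhead is negligible; it only matters when the cloudpickled function drags in large closures or globals.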