tuwien-cms / xprec

Full quadruple precision (double-double) data type for numpy
MIT License
12 stars 4 forks source link

Upload binary wheels #5

Closed shinaoka closed 2 years ago

shinaoka commented 2 years ago

This PR includes changes for automatic build and upload of binary wheels. This is triggered by the addition of a new tag. This PR is independent of #4.

Each time a tag is created, binary wheels for the following platforms are built and upload to TestPyPI (for testing):

For a while, binary wheels are NOT uploaded to PyPI safety. This can be changed later by changing the last few lines in wheels.yml

          password: ${{ secrets.TEST_PYPI_API_TOKEN }}
          skip_existing: true
          ########################
          repository_url: https://test.pypi.org/legacy/

as

          password: ${{ secrets.PYPI_API_TOKEN }}
          skip_existing: true

.

The correctness of this change can be verify by bumping version.

shinaoka commented 2 years ago

Oh, I got the point. -march=native is set manually in setup.py! https://github.com/tuwien-cms/xprec/blob/mainline/setup.py#L108

What is the cleanest way to remove this flag? Shall we set an environment variable such as XPREC_NO_MARCH_NAIVE=1?

shinaoka commented 2 years ago

This issue is only about building manylinux wheels. Whenever xprec is built inside a container, the -march=native flag should be off for safety. The built image could be uploaded to DockerHub. How large is the impact on performance if the flag is disabled by default?

Detecting if we are inside a container seems to be a tricky business. https://stackoverflow.com/questions/20010199/how-to-determine-if-a-process-runs-inside-lxc-docker

mwallerb commented 2 years ago

The main problem is the use of the FMA instruction here, which is crucial for good multiplication performance: https://github.com/tuwien-cms/xprec/blob/8311514e737d974e795108391383933cb198080a/csrc/dd_arith.h#L49

I would expect a 10% performance hit or so when this is emulated, depending on workload, but I have not measured it.

It may be possible to detect the FMA instruction at runtime, perhaps by building two different libraries and deciding at runtime which to load based on the CPU capabilities...

For now I would just remove the flag from setup.py unconditionally and then merge. It is more important that people can actually install the library easily than squeezing out the last bit of performance.

shinaoka commented 2 years ago

How about turning off FMA and the flag by default? We could add a new environment variable to turn on them.

mwallerb commented 2 years ago

Yes, precisely, that's what I meant ... so if you could remove the line from setup.py, then we can merge this.

shinaoka commented 2 years ago

These flags have been disabled for now! Good night...