eapearson commented 1 year ago

PR for PTV-1888

The goal of PTV-1888 is to add development tools to the staging service, in preparation for a future PR (which will follow on the heels of this one) which fixes and improves support for file uploads of up to 5GB.

This PR contains the following changes:

development tools, located in ./development
a devcontainer usable with Visual Studio Code located in ./devcontainer
Splitting Python dependencies into requirements.txt for the deployment runtime, and requirements-dev.txt containing additional dependencies for development and testing.
Updates all dependencies - base image, base image dependencies, and Python dependencies
Fixed a couple of unreliable tests
Added markdownlint support, and applied markdownlint fixes to all markdown files, as I was working on documentation and need to to be "clean".
Extracted and fixed API documentation, simplifying the top-level README.md.

Development Tools

The development tools are designed to be run in a docker container, which may be a bespoke container defined ./development/Dockerfile and controlled by a script ./development/scripts/run, or may be the aforementioned devcontainer.

These tools include mypy, black, isort, and the pytest-based tests.

`devcontainer`

Devcontainers are a relatively recent addition to the developer toolkit. Visual Studio Code (VSC) is already supported by this repo, so this can be seen as an addition to that support. The devcontainer has its own Dockerfile. For those unfamiliar with devcontainers, it is essentially a method for running VSC inside a container, and that container is configured to support the programming language and environment of choice. This is a very effective technique, as it replicates the precise system, system dependencies, python version, and python dependencies.

Splitting Requirements

As the number of development support tools expands to cover current practices, the entanglement of runtime and development-time dependencies becomes more serious. Therefore, the dependencies were simply split into "requirements.txt" and "requirements-dev.txt". The production image installs only "requirements.txt", and the devcontainer and development tools images also install the requirements-dev.txt.

Update Dependencies

It is always good to take the opportunity to ensure that dependencies are up-to-date, to reduce the diff to dependencies in subsequence pull requests. In this case it was triggered by two other changes - the need to add new dependencies to support development tools, and the need to update the images due to a problem with the current Dockerfiles.

There is nothing else notable about the dependency updates.

Dockerfile Update

I can briefly describe the Dockerfile issue. It appears that the Python base image dropped support for recent Python versions and the Debian "buster" releases. The effect of this was that the image would no longer build. So the Dockerfile was updated to use the base image python:3.11.5-slim-bookworm. This is a minor Python version bump from 3.11.4, a shift from Debian buster to bookworm, and an update of all the OS-level dependencies installed in the Dockerfile.

Fixed Tests

There were two types of test fixes required.

First, the tests had used an small patch for running hypothesis in asynchronous tests. Some tests seemed to occasionally fail when tearing down the file system utility used to prepare a directory for tests to use. There seemed to be a race condition, and sometimes a file which was assumed to be deleted was not, and the operation to remove a directory used for test data would fail. I had assumed this was in the "custom" async adapter for the hypothesis library. It was during that investigation that I discovered that the custom adapter was no longer necessary, as hypothesis now works fine with asyncio-based tests. However, after this change, the test failures persisted, and I determined that there should not be any race conditions, as the async tests should be sequenced (although there is always a doubt that concurrency will creep in). The next conjecture was that it was due to the interaction between the async client mock and the file utility. It ends up that the async client mock needs should be shut down before the file utility cleanup code, and thus the async client should be wrapped in the file utility . Applying this change to all such tests fixed the problem.

The second test problem was due to a time comparison for the modifiction type of file metadata. The test asserts that the when creates a file, the modification type should be earlier than any time measurement taken afterwards. However, sometimes this assertion failed! The reason? It is possible that the modification time obtained for a file may have a rough precision - in this case it was 1 second, although research showed that in windows it may be 2 seconds. In such a case, the modification time may be rounded to the nearest second, which may end up being the next second. Since the elapsed wall time between the file creation and the test code that generates the time comparison is so short, sub-second for sure, it is possible for the file creation time to be greater than the comparison time. This is not breaking the laws of sequential programming - it is just due to the modification time being rounded to the next second.

The solution was to only look at the absolute difference between the two times, and ensure that it is less than 1 second.

BTW this file modification precision issue was noticed only on my host machine - it did not replicate on GHA for example. This is probably due to the underling virtual machine used on a Rancher Desktop on macOS.

Markdown Work

To support the new development tools, I added additional documentation. Since I now relied on being able to edit markdown, I also added markdownlint support. This in turn made it convenient fix the markdown linting issues in all markdown files. There are not many, so it did not take much time.

Markdown linting is not integrated into the development toolchain, just used environmentally in VSC. Support is in the form of a VSC plugin in the devocontainer configuration, and a .markdownlint.yaml configuration file.

Extracted API Documentation

More time-consuming was extracting the API documention from the top-level README.md. I was editing the top level README in order to reflect the new development tools, and found that the length of the file was onerous to deal with - it was over 1K lines. And the documentation format was difficult to read, and did not follow any kind of standard format. I hope the improvements I made will improve that situation, and I didn't quite complete it as some of the later code examples are somewhat difficult to follow. I did reformat them somewhat, but they are still cumbersome to read.

eapearson commented 1 year ago

Hi @bio-boris, I'm working through the code quality reports, and one big area is markdown linting issues. These are reported by codacy using "remark-lint". A few years ago they switched to "markdownlint" as the preferred markdown linting tool. Markdownlint is a much more popular tool than remark, as well. Would you mind switching to markdownlint in the codacy configuration? I can see the setting, but don't have admin access to be able to change it. Thanks!

eapearson commented 1 year ago

Also, the sonarcloud error report for running a container as root. AFAIK this is a requirement for running a container in GHA (some people have tried to have a non-root user in the container match whatever user is used in GHA, but as far as I can tell this is not officially supported). I wonder if sonarcloud can be configured with an exception for this?

bio-boris commented 1 year ago

I have changed it

eapearson commented 1 year ago

Thanks @bio-boris

bio-boris commented 1 year ago

The SonarCloud is not a strict requirement for passing a PR review. It is for informational purposes to help us see if we did something wrong. I have marked it as Reviewed and now it is passing.

bio-boris commented 1 year ago

I have triggered a re-run of codacy and it now passes. However, I noticed that you are using requirements.txt and requirements-dev Maybe we should instead get rid of requirements alltogether and switch to the PipFile? In the pipfile we can have both regular and dev dependencies