dkliban opened 2 years ago
I have collected some statistics by running the tests against my oci-env SFTP storage profile. The individual runs are numbered, and every run covers 3 test cases (immediate, on_demand, and streamed) unless an earlier case got stuck and the run had to be interrupted with a signal. If only some cases are listed for a run, the others passed. Note that in the nightly CI we usually see only the file corruption error (file1.body != file2.body), whereas here 504 gateway timeout errors are very common. The 504s may be a direct cause of the corruption error, or they may be a separate bug specific to the oci-env profile itself.
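Repeating the runs and catching stuck ones could also be scripted instead of interrupting them by hand with ctrl + c. The following is a minimal sketch, assuming the tests are invoked through pytest. run_once, run_many and PYTEST_CMD are hypothetical names, and the test selector is a placeholder. Each invocation is classified only by its exit code and by whether it hit the timeout, so the sketch does not record which test case failed.

```python
import subprocess
import sys
from collections import Counter

# Hypothetical command; adjust the selector to the real test module.
PYTEST_CMD = [sys.executable, "-m", "pytest", "-k", "test_download_policy"]


def run_once(cmd, timeout_s):
    """Run one test invocation and classify it as "passed", "failed" or "stuck"."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires (instead of a manual ctrl + c).
        return "stuck"
    return "passed" if proc.returncode == 0 else "failed"


def run_many(cmd, runs, timeout_s):
    """Repeat the same invocation `runs` times and count the outcomes."""
    return Counter(run_once(cmd, timeout_s) for _ in range(runs))
```

For example, run_many(PYTEST_CMD, runs=30, timeout_s=900) would return a Counter of outcomes. The timeout value is illustrative and should be well above the normal duration of a passing run.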
1. ALL PASSED
2. on_demand STUCK - ctrl + c
3. on_demand failed - aiohttp exception: Response payload is not completed - SFTPError("Garbage packet received")
4. immediate failed - error 502 at line 138 in download_file - SFTPError("Garbage packet received")
   on_demand failed - error 504 - SFTPError("Garbage packet received")
   streamed failed - assert 504 == 404 (HTTP status) - no traceback
5. on_demand STUCK - ctrl + c
6. on_demand STUCK - ctrl + c
7. ALL PASSED
8. ALL PASSED
9. ALL PASSED
10. on_demand STUCK - ctrl + c
11. ALL PASSED
12. immediate failed - error 504 timeout - no traceback
    on_demand failed - assert body1 == body2 failed - no traceback
13. ALL PASSED
14. immediate failed - error 504 timeout - no traceback
    on_demand failed - assert body1 == body2 failed - no traceback
15. on_demand STUCK - ctrl + c
16. immediate failed - line 153, response payload not completed - no traceback
    on_demand failed - line 138, response payload not completed - no traceback
    streamed failed - line 108, assert 504 == 404 failed - no traceback
17. on_demand STUCK - ctrl + c
18. ALL PASSED
19. ALL PASSED
20. ALL PASSED
21. on_demand failed - assert body1 == body2 failed - unclear: "failed to get journal cursor"
22. on_demand failed - line 173, response payload not completed - no traceback
    streamed failed - assert 404 == 504 failed - no traceback
23. immediate failed - line 128, error 504 timeout - no traceback
    on_demand failed - assert 504 == 404 failed - no traceback
    streamed failed - assert 504 == 404 failed - no traceback
24. on_demand failed - line 153, error 504 timeout - no traceback
25. immediate failed - line 128, error 504 timeout - no traceback
    on_demand failed - assert 504 == 404 failed - no traceback
    streamed failed - line 138, error 504 timeout - no traceback
26. immediate failed - line 128, error 504 timeout - no traceback
    on_demand failed - line 128, error 504 timeout - no traceback
    streamed failed - assert 504 == 404 failed - no traceback
27. immediate failed - assert 504 == 404 failed - no traceback
    on_demand failed - assert 504 == 404 failed - no traceback
    streamed failed - assert 504 == 404 failed - no traceback
28. immediate failed - assert 504 == 404 failed - no traceback
    on_demand failed - assert 504 == 404 failed - no traceback
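A quick tally of the runs above helps gauge how often the 504s show up. This is a sketch over a hand transcription of the notes, so treat the labels as approximate. A run is labeled failed_504 if any of its failing cases mentioned a 504.

```python
from collections import Counter

# One label per run, in the order listed above (hand-transcribed from the notes):
#   "passed"     - all cases passed
#   "stuck"      - a case hung and was interrupted with ctrl + c
#   "failed"     - at least one case failed, no 504 mentioned
#   "failed_504" - at least one case failed and a 504 was mentioned
RUNS = [
    "passed", "stuck", "failed", "failed_504", "stuck", "stuck", "passed",
    "passed", "passed", "stuck", "passed", "failed_504", "passed", "failed_504",
    "stuck", "failed_504", "stuck", "passed", "passed", "passed", "failed",
    "failed_504", "failed_504", "failed_504", "failed_504", "failed_504", "failed_504", "failed_504",
]

tally = Counter(RUNS)
failed_runs = tally["failed"] + tally["failed_504"]
print(f"{failed_runs} failing runs, {tally['failed_504']} of them involving a 504")
# 13 failing runs, 11 of them involving a 504
```

The counts show only that 504s accompany most failures in this setup; they do not show whether the 504s cause the body mismatch.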
So far we've been testing with SFTP storage and seeing failures there, and it is not clear whether the issue is on the Pulp side or in SFTP itself.
ArtifactResponse is also used to stream the data from object storage when REDIRECT_TO_OBJECT_STORAGE is set to False. Could this be tested (only to narrow down the issue) with another storage backend, such as S3 or Azure, to see whether the issue persists there as well?
For example, run the tests in a setup with pulp_file and pulpcore, REDIRECT_TO_OBJECT_STORAGE = False, and S3 storage.
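A minimal settings sketch for that comparison follows. It assumes a Pulp release that still configures storage through Django's DEFAULT_FILE_STORAGE setting (newer Django versions use the STORAGES setting instead) together with the django-storages S3 backend. The bucket name, endpoint, region, and credentials are placeholders.

```python
# Hypothetical Pulp settings fragment (e.g. /etc/pulp/settings.py) for an S3 comparison run.
# Assumes django-storages with boto3 is installed; all values below are placeholders.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_ACCESS_KEY_ID = "<access-key-id>"
AWS_SECRET_ACCESS_KEY = "<secret-access-key>"
AWS_STORAGE_BUCKET_NAME = "pulp3"
AWS_S3_ENDPOINT_URL = "http://minio:9000"  # only needed for an S3-compatible server such as MinIO
AWS_S3_REGION_NAME = "us-east-1"

# Have the content app stream artifacts itself instead of redirecting clients to
# the object storage, so the streaming path is exercised as in the SFTP setup.
REDIRECT_TO_OBJECT_STORAGE = False
```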
The test_download_policy tests fail intermittently in CI for the 'streamed' case. These tests ensure that Pulp works with a storage backend such as an SFTP server. It's possible we just need to add asyncio.shield() around the code here [0].

[0] https://github.com/pulp/pulpcore/blob/main/pulpcore/responses.py#L154
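The following is a minimal sketch of what asyncio.shield() changes. It assumes the code at [0] awaits a storage read or write that currently gets cancelled together with the request handler, since aiohttp can cancel a handler, for example when the client disconnects. copy_stream, run and handler are hypothetical names, and copy_stream only simulates slow storage I/O with asyncio.sleep. The sketch cancels the outer task in both variants and checks whether the inner copy survives.

```python
import asyncio


async def copy_stream(chunks, sink):
    """Hypothetical stand-in for a storage read/write that should not be torn down midway."""
    for chunk in chunks:
        await asyncio.sleep(0.05)  # simulate slow storage I/O (e.g. an SFTP round trip)
        sink.append(chunk)
    return len(sink)


async def run(shielded):
    sink = []
    inner = asyncio.ensure_future(copy_stream([b"a", b"b", b"c"], sink))

    async def handler():
        # With shield(), cancelling handler() cancels only the outer wrapper future and
        # the inner task keeps running; without it, the cancellation propagates to inner.
        if shielded:
            return await asyncio.shield(inner)
        return await inner

    task = asyncio.create_task(handler())
    await asyncio.sleep(0.01)  # let the copy start
    task.cancel()              # simulate the request being cancelled
    try:
        await task
    except asyncio.CancelledError:
        pass
    try:
        await inner            # wait for the copy to finish, or observe its cancellation
    except asyncio.CancelledError:
        pass
    return inner.cancelled(), sink


if __name__ == "__main__":
    print(asyncio.run(run(shielded=True)))   # (False, [b'a', b'b', b'c'])
    print(asyncio.run(run(shielded=False)))  # inner task cancelled, copy incomplete
```

shield() protects only the inner operation: the outer await still raises CancelledError, so the caller must still handle the cancellation. The shielded task also needs to be awaited or otherwise referenced so it is not garbage-collected before it finishes.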