scality / spark

Apache License 2.0

The report-sproxyd-keys:basic does not handle timeouts in requests properly #7

Open ghost opened 2 years ago

ghost commented 2 years ago

In report_sproxyd.py, listbucket() issues its bucketd requests with a 30-second timeout. When that timeout expires because bucketd does not respond, listbucket() does not handle the resulting exception: the retry loop in run() only covers 500 and 404 status codes, so the timeout escapes the while loop, the current bucket's listing is abandoned, and the script appears to move on to the next bucket with no handling to ensure each bucket gets listed completely.

Error:

File "/home/s3/report_sproxyd.py", line 176, in run
    session, key, versionid)
File "/home/s3/report_sproxyd.py", line 94, in listbucket
    r = session.get(url, timeout=30, verify=False)

<SNIP>

urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='127.0.0.1', port=9000): Read timed out. (read timeout=30)

The block for line 176 is:

    def run(self):
        """main function to be passed to the Threading class"""
        total_size = 0                          
        files = 0               
        payload = True
        key = ""      
        versionid = ""
        while payload:                         
            while 1:
                session = requests.Session()
                session.headers.update({"x-scal-request-uids":"utapi-reindex-buckets"})
                error, payload = self.listbucket(
                    session, key, versionid)
                if error == 500:                 
                    time.sleep(15)                                    
                else:
                    break
            if error == 404:
                break
            key, skeys, versionid = self.retkeys(payload)
            self.skeyg += skeys   
        return(self.userid, self.bucket, self.skeyg)
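One way run() could be hardened is to treat a timeout the same way it already treats a 500 response: sleep and retry instead of letting the exception escape. The sketch below is only illustrative (the helper name `list_with_retry` and its signature are not from the repo); it wraps the listbucket() call shown above, and `pause` mirrors the existing `time.sleep(15)`:

```python
import time

import requests


def list_with_retry(listbucket, session, key, versionid, pause=15):
    """Call listbucket(); on a timeout, sleep and retry like the 500 path.

    `listbucket` is the bound method quoted above; a timeout is handled
    exactly like an error == 500 response instead of crashing the thread.
    """
    while True:
        try:
            error, payload = listbucket(session, key, versionid)
        except (requests.exceptions.Timeout,
                requests.exceptions.ConnectionError):
            time.sleep(pause)  # same back-off as the 500 branch
            continue
        if error == 500:
            time.sleep(pause)
            continue
        return error, payload
```

With this in place, run() would keep retrying the same marker after a timeout rather than silently dropping the rest of the bucket's keys.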

The block for line 94 is:

    def listbucket(self, session=None, marker="", versionmarker=""):
        """function to list the contains of the buckets"""
        m = marker.encode('utf8')              
        mark = urllib.parse.quote(m)
        params = "%s?listingType=Basic&maxKeys=1000&gt=%s" % (
            self.bucket, mark)                                                         
        url = "%s/default/bucket/%s" % (self._bucketd, params)
        r = session.get(url, timeout=30, verify=False)
        if r.status_code == 200:                 
            r.encoding = 'utf-8'                                      
            payload = json.loads(r.text)
            return (r.status_code, payload)
        else:               
            return (r.status_code, "")
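Alternatively, the retry could live inside listbucket() itself, around the `session.get` on line 94. A minimal sketch, assuming a bounded retry is acceptable (the name `get_with_retry` and the `retries`/`backoff` values are illustrative, not from the original script):

```python
import time

import requests

# Exceptions worth retrying: the read timeout seen in the traceback,
# plus transient connection failures.
RETRYABLE = (requests.exceptions.Timeout,
             requests.exceptions.ConnectionError)


def get_with_retry(session, url, retries=5, backoff=15, timeout=30):
    """Bounded retry around the bucketd GET.

    Retries the request up to `retries` times on a timeout or connection
    error, sleeping `backoff` seconds between attempts; re-raises on the
    final attempt so the caller can record the bucket as failed.
    """
    for attempt in range(retries):
        try:
            return session.get(url, timeout=timeout, verify=False)
        except RETRYABLE:
            if attempt == retries - 1:
                raise  # give up: surface the failure instead of hiding it
            time.sleep(backoff)
```

Either way, the key point is that a timed-out bucket should be retried or reported, not skipped.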

The issue can be observed by comparing the number of distinct buckets in the keys output file against the total number of buckets reported by s3api:

# awk -F';' '{print $1}' clean_np_keys.txt| sort -u | wc -l
56
# listbuckets=$(aws --endpoint-url=https://obs-are.allstate.com s3api list-buckets --query "Buckets[].Name" | sort -u | wc -l) ; echo $(( ${listbuckets} - 2 ))
83

s3api reports 83 buckets, while the report-sproxyd-keys:basic output covers only 56.

ghost commented 2 years ago

@apcurrier Please provide any details you think are pertinent or corrections if I incorrectly stated anything.

@patrickdos Can you confirm? Or am I missing how report_sproxyd.py would recover if a timeout happens for a bucket that is not empty?