lanl / Pavilion

HPC testing harness
BSD 3-Clause "New" or "Revised" License

Check for bad inherited yaml files before launching tests #20

Open cadejager opened 8 years ago

cadejager commented 8 years ago

Some YAML files will not run even though they pass basic YAML lint checks. We need in-house scripts to verify that a YAML file is valid for Pavilion's purposes. For example, when a master YAML file includes a set of test-suite files, we should verify that the test identifier keys are unique across all of the included files. (David G. currently has a Python script for this.)
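One case a syntax check will not flag is a repeated key inside a single file: PyYAML's `safe_load` accepts it and keeps only the last value. Below is a minimal sketch of a loader that rejects such files. The class name `UniqueKeyLoader` is hypothetical and not part of Pavilion or PyYAML, and only scalar keys are checked.

```python
import yaml

MERGE_TAG = "tag:yaml.org,2002:merge"


class UniqueKeyLoader(yaml.SafeLoader):
    """SafeLoader variant (hypothetical name) that raises on a repeated key."""

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            seen = set()
            for key_node, _ in node.value:
                # Only scalar keys are checked; "<<" merge keys are left
                # to the base class, which expands them.
                if not isinstance(key_node, yaml.ScalarNode):
                    continue
                if key_node.tag == MERGE_TAG:
                    continue
                key = self.construct_object(key_node)
                if key in seen:
                    raise yaml.constructor.ConstructorError(
                        None, None,
                        "duplicate key %r" % (key,),
                        key_node.start_mark)
                seen.add(key)
        # Build the mapping as usual once the keys are known to be unique.
        return super(UniqueKeyLoader, self).construct_mapping(node, deep=deep)


# Illustrative input: the test ID "ior" is defined twice in one file.
text = "ior:\n  nodes: 1\nior:\n  nodes: 2\n"
try:
    yaml.load(text, Loader=UniqueKeyLoader)
except yaml.YAMLError as err:
    print("Rejected: " + str(err.problem))
```

Loading each test file with `yaml.load(f, Loader=UniqueKeyLoader)` instead of `yaml.safe_load(f)` would surface duplicates within a file. Duplicates across files still need a separate check.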

cadejager commented 8 years ago

Here is David's script:

#! /usr/bin/env python
# This should work with Python 2.7.9 and above.

import os
import sys
import yaml
import collections

def printUsage():
   print('\nUsage: unique_check <filename>\n')
   # Only called after an argument error, so exit with a failure status
   sys.exit(1)

# Get the command line parameters (should be the master YAML filename)
if len(sys.argv) != 2:
   print('Error: Exactly one argument should be provided.')
   printUsage()
else:
   masterFile = sys.argv[1]

# test to see that the file exists
if not os.path.isfile( masterFile ):
   print('Error: ' + masterFile + ' does not exist.')
   sys.exit(1)

# Open the master YAML file and get a list of files containing tests
with open( masterFile ) as f:
   fileDict = yaml.safe_load(f)

# Get a list of files included in the master test YAML file
fileList = fileDict['IncludeTestSuite']

# See if a file has been included twice. This is technically not a problem
# since the second pass through a file will replace any previous entries
# from that file, but it could save time to not have duplicates and it will
# lead to less confusion if a filename changes and the master file needs
# to be updated.
duplicateFnames = [item for item, count in collections.Counter(fileList).items() if count > 1]
# If there were duplicates, print a warning but keep processing
if len(duplicateFnames) > 0:
   print('Warning: The following files are listed more than once in ' + masterFile)
   print( ', '.join(duplicateFnames) )
   # Remove the duplicate entries so there are no false positives in 
   # the processing to follow
   # set() returns unique items in a list
   fileList = list(set(fileList))

# uniqueIDs will be a dictionary in which each key will be the name
# of a unique test ID and the corresponding value will be a list of
# YAML filenames containing that ID. If there is more than one filename
# per unique ID we have namespace collision
uniqueIDs = dict()

# open each test file and get a list of test IDs from them
for fname in fileList:
   # Just in case, check that the file in the master file still exists
   if not os.path.isfile( fname ):
      print('Error: ' + fname + ' listed in ' + masterFile + ' does not exist.')
      print('Processing ending.')
      sys.exit(1)
   # Load the entire YAML file as a dictionary
   with open(fname) as f:
      testDict = yaml.safe_load(f)
   # Pick off the main ID strings (first level dictionary keys in Python).
   # An empty file loads as None, so treat it as having no IDs.
   testKeys = (testDict or {}).keys()
   # For each key found in the file, see if we have come across it before and
   # if not, add it; If so, append the filename to its list of locations.
   for iKey in testKeys:
      if iKey in uniqueIDs:
         uniqueIDs[iKey].append( fname )
      else:
         uniqueIDs[iKey] = [ fname ]

# Now go through the dictionary of unique test IDs and look for errors
# (multiple entries)
ErrorsFound = False
for testID, filenames in uniqueIDs.items():
   if len( filenames ) > 1:
      ErrorsFound = True
      print( 'Error: ' + str( testID ) )
      print( '  Found in files: ' + str( filenames ) )

if not ErrorsFound:
   print('Finished parsing master YAML file. No namespace errors found.')
else:
   # Signal failure so a caller can stop before launching tests
   sys.exit(1)
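
To run this as a pre-launch check inside Pavilion rather than as a standalone script, the collision logic could be factored into a function that works on already-loaded data. This is a minimal sketch only: the names `find_id_collisions` and `suites` and the sample filenames are hypothetical, not existing Pavilion code.

```python
from collections import defaultdict


def find_id_collisions(suites):
    """Return {test_id: [filenames]} for IDs defined in more than one file.

    `suites` maps a filename to the dictionary loaded from that file
    (for example, the result of yaml.safe_load). An empty file may load
    as None and is treated as containing no tests.
    """
    locations = defaultdict(list)
    for fname, tests in suites.items():
        for test_id in (tests or {}):
            locations[test_id].append(fname)
    # Keep only the IDs that appear in more than one file.
    return {tid: files for tid, files in locations.items() if len(files) > 1}


# Illustrative data standing in for three loaded test-suite files.
suites = {
    "a.yaml": {"hpl": {}, "ior": {}},
    "b.yaml": {"ior": {}, "stream": {}},
    "c.yaml": None,
}
print(find_id_collisions(suites))  # {'ior': ['a.yaml', 'b.yaml']}
```

Returning the collisions instead of printing them lets the caller decide whether to warn or to abort the launch, and the function can be unit tested without touching the filesystem.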