Insufficient error message for bad keys

ZetaTwo commented 1 year ago

Describe the bug

If one of the keys contains some error, for example the json is invalid or the key is not a valid key, GCP Scanner crashes with a very unhelpful error message.

To Reproduce

Put a few keys in a directory with at least one of them invalid format
Run gcp scanner pointing to that directory

Expected behavior

Either

A warning log telling me exactly which key failed to parse and then continue
An error log telling me exactly which key failed to parse and then stop
An error log telling me exactly which key failed to parse, then continue parsing (for more potential errors) and then stop without scanning (so that you can handle all errors in bulk instead of fixing one and re-running.

Current behavior

GCP Scanner crashes with an error telling me that it failed to parse some key

Bhardwaj-Himanshu commented 1 year ago

Hi @ZetaTwo. that is helpful information, but as I do lack experience, could you elaborate a bit more on which file to start from or how to get started with exact locations or file_names.

Also let me know, if you are working on anything!

ZetaTwo commented 1 year ago

If you follow the reproduction instructions you will see where in the code the error is thrown and you can start looking there.

Bhardwaj-Himanshu commented 1 year ago

Ah okay, I'll give it a shot and update you soon on it! Thanks

csemanish12 commented 1 year ago

@ZetaTwo I noticed that this issue has been idle for some time, and after looking into it, I believe I have identified the root cause and a potential solution. I'm really interested in contributing to this project and would be delighted to take ownership of this issue.

Based on my investigation, it appears that creds = service_account.Credentials.from_service_account_file(file_path) inside the method get_creds_from_file raises exception when the json file is not valid.

A possible solution that i thought of is as follows: In the method get_creds_from_file

first we check if the json file is valid or not and add appropriate error message
if the json file is valid, we can then create validations for all the fields in the json file. We can collect all the keys that are invalid and then add appropriate error message.

From what i could find, we do have some formats that each field in the json file is supposed to follow link: create and delete service account keys


  "type": "service_account",
  "project_id": "PROJECT_ID",
  "private_key_id": "KEY_ID",
  "private_key": "-----BEGIN PRIVATE KEY-----\nPRIVATE_KEY\n-----END PRIVATE KEY-----\n",
  "client_email": "SERVICE_ACCOUNT_EMAIL",
  "client_id": "CLIENT_ID",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/SERVICE_ACCOUNT_EMAIL"
}

Please let me know if i am missing something.

ZetaTwo commented 1 year ago

I think you misunderstand the issue here. We don't need to do more validation of the key. That is already handled by the from_service_account_file function.

The problem is that the scanner stops as soon as it encounters an invalid key and it doesn't tell which key is invalid. It would be better if it tried to parse all keys and then provided one single list of all the keys that could not be parsed.

csemanish12 commented 1 year ago

@ZetaTwo Thanks for the clarification. From what i found, the scanner stops even if the file is not json. We get the exception from the from_service_account_file. I tried a couple of scenarios: Scenario 1: remove double string from key "project_id": "PROJECT_ID" > changed it to "project: "PROJECT_ID" Scenario 2: remove hypen from private key "private_key": "-----BEGIN....." > changed it to ""private_key": "BEGIN"

In both cases, the scanner crashes without proper error message.

This is what i think could work: First we should try to see if the file is json and only then we try to parse the keys to find the problematic ones. For the invalid json file we can inform the user to provde the valid json file and leave it up to the user to provide the correct json file. For key parsing, i thought to use the template to understand, how the key should be. for example: while parsing private_key we make sure the structure of the key is as follows -----BEGIN PRIVATE KEY-----\nPRIVATE_KEY\n-----END PRIVATE KEY-----\n"

We can have a template for how each key should be and based on that we can understand which keys are problematic and can inform the user without having to call the method from_service_account_file

ZetaTwo commented 1 year ago

No there is no need to try to validate the key format. This is completely unnecessary. The problem is how the errors are handled and reported.

csemanish12 commented 1 year ago

@ZetaTwo the method that throws exception is part of google/ouath2 package and not gcp_scanner. I am having trouble understanding the scope of this solution. Where do we want to resolve this issue? Exception raised from from_service_account_file method does provide some insight as to where the problem lies. For example, if the private key is malformed, we get this error raise exceptions.MalformedError("No key could be detected.") do we want to understand from the exception as to which key is related to the exception or is there some other way that i missing?

ZetaTwo commented 1 year ago

No it is enough to know which keyfile has an error. Currently we do something like this (pseudo-Python):

keys = []
for file in directory:
  keys.append(from_service_account_file(file))

but we need to do something like this:

malformed_files = []
keys = []
for file in directory:
  try:
    keys.append(from_service_account_file(file))
  except AppropriateException:
    malformed_files.append(file)

if len(malformed_files) > 0:
  log.error('Failed to parse keyfiles(s):')
  for file in malformed_files:
    log.error('...')
  return False

...

csemanish12 commented 1 year ago

@ZetaTwo I understand now. Thanks for the explanation. This is what i thought when you were talking about parsing all the keys. By "keys", I thought you were talking about parsing individual keys inside inside a single json file. "private_key_id": "KEY_ID", "private_key": "-----BEGIN PRIVATE KEY-----\nPRIVATE_KEY\n-----END PRIVATE KEY-----\n", "client_email": "SERVICE_ACCOUNT_EMAIL", Now i understand that we want to parse all the keyfiles (keyfile1.json, keyfile2.json and so on).
I can work on the fix if you want.

ZetaTwo commented 1 year ago

Yes, it's enough to know which file caused the error at this point. Knowing exactly which key is a separate issue which could potentially be interesting as well but it is not as important.

csemanish12 commented 1 year ago

Okey. I will start working on the fix. Could you please assign me this issue? @ZetaTwo

ZetaTwo commented 1 year ago

Great. We don't really assign issues though so just submit a PR when you have a solution.

csemanish12 commented 1 year ago

@ZetaTwo I have created a pull request for the fix. Looking forward to feedback.

google / gcp_scanner

Insufficient error message for bad keys #141