xDimGG / node-steamapi

An object-oriented Steam API wrapper for Node.js developers.
https://www.npmjs.com/package/steamapi
177 stars 43 forks source link

[RegExps] Regular Expressions permit invalid strings #46

Closed WillsterJohnson closed 1 year ago

WillsterJohnson commented 1 year ago

I'm happy to write up PRs for this and any issue I add here, just let me know and I'll get to work

Specifically, reProfileURL and reProfileID.

I've created a test for the reProfileURL regex, as well as some potential alternatives.

The processed results are as follows;

{
  "reProfileURL": {
    "falsePositives": 27,
    "falseNegatives": 0
  },
  "minimalist_same_behavior": {
    "falsePositives": 27,
    "falseNegatives": 0
  },
  "minimalist_fixed_id_length": {
    "falsePositives": 9,
    "falseNegatives": 0
  },
  "strict_fixed": {
    "falsePositives": 0,
    "falseNegatives": 0
  }
}

Checking for false negatives here ensures that the regular expressions don't over-restrict what they will and will not match (this in fact was crucial in creating the fixed id length and strict RegExps).

The test includes 18 permutations which ought to pass, and 29 which ought to fail.

Expected passes;

Expected fails;

The strict_fixed regex with no false flags is as follows;

/^(?:(?:(?:(?:https?)?:\/\/)?(?:www\.)?steamcommunity\.com)?)?\/?profiles\/(\d{17})(?:(?:\?|\/).*)?$/i;
xDimGG commented 1 year ago

I appreciate the investigation into this, but the regexes were designed to be very lenient. As long as the resolve method (https://github.com/xDimGG/node-steamapi/blob/master/src/SteamAPI.js#L78) is returning the correct ID, the regexes are doing their job correctly. Basically, false positives are OK, and false negatives are not.

My reasoning behind this: suppose a user doesn't copy their entire profile URL and just copies a chunk of it. As long as they have the important part (the community ID or vanity URL) copied, they can still find their profile. This is the behavior I was going for rather than being restrictive.

WillsterJohnson commented 1 year ago

That's a fair enough reason for me. In that case then the minimalist_same_behavior RegExp would be most permissive, though for this use case it ultimately makes no difference.