Closed tomsing1 closed 3 years ago
As a workaround, I am now escaping the strings specified in the filter_sensitive_data
list, e.g.
# lifted from the Hmisc package
# https://www.rdocumentation.org/packages/Hmisc/versions/4.1-1/topics/escapeRegex
escape_regex <- function(x) {
gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}
vcr::vcr_configure(
filter_sensitive_data = list(
"SECRET_PASSWORD" = escape_regex(Sys.getenv('SECRET_PASSWORD'))
),
dir = vcr::vcr_test_path("fixtures")
)
Perhaps that's useful for others, in case you prefer not to add fixed = TRUE
to the gsub
calls....
Thanks for reporting this. I'll have a look
@tomsing1 i've made the fix with fixed=TRUE
- can you reinstall remotes::install_github("ropensci/vcr")
and try again?
Thanks a lot for looking into this so quickly @sckott ! I installed the latest version and can confirm that this fixes the issue I had.
But - and my apologies if this sounds like I am making things unnecessarily complicated - perhaps this change would break other users' setups?
For example, let's say somebody is actually taking advantage of the regex functionality right now, e.g. to remove a random authentication token:
invisible(vcr::vcr_configure(
filter_sensitive_data = list(
"<<TOKEN>>" = '\\{"token":".*"\\}'
),
dir = vcr::vcr_test_path("fixtures")
))
In this case the original default behavior is really helpful. But setting the fixed=TRUE
argument in the gsub
function calls, this use case would now not be supported any more, e.g. it could break existing user code.
Perhaps exposing fixed=FALSE
as an argument to the filter_sensitive_data
function and then passing it through to the two gsub
calls would be safer? (Setting it to FALSE
by default would make this change backward compatible, I believe.)
Good point that it would break in a use case where someone wants to use regex to match a password that is not a fixed string. However, the only behavior I meant to support was the one where we use gsub(..., fixed=TRUE)
- and so the code actually follows intended behavior now.
Having said that I guess I could see how it might be useful to use regex matching - but I'm not sure I can myself imagine using it. Even if I issue myself a new API key it is usually another random alphanumeric key and so a regex pattern wouldn't do much.
Can you picture using fixed=FALSE
yourself?
Thanks a lot for your quick reply. Great to hear that the fixed
use case was what you had in mind.
The "<<TOKEN>>" = '\\{"token":".*"\\}'
example is actually something I am using right now. It strips out any alphanumeric pattern preceded by the token:
key. Not sure if I am the only one who thinks that's useful, though!
Okay, well 1 regex user suggests there's probably more users out there. I'm not sure yet how we'll do it, one of:
vcr_configure
parameter, e.g, filter_sensitive_data_regex
so that we have filter_sensitive_data
and filter_sensitive_data_regex
where the former uses fixed=TRUE
and the latter uses fixed=FALSE
filter_sensitive_data
) but change it to allow the following input: vcr_configure(
filter_sensitive_data = list(
exact = list("<<SOME-TOKEN>>" = Sys.getenv("SOME_KEY")),
regex = list("<<REGEX-TOKEN>>" = '\\{"token":".*"\\}')
)
)
@tomsing1 Reinstall to get changes. I've added new config option filter_sensitive_data_regex
. One final thing left is this:
With regex replacement, we can't put back the sensitive string unless we capture it and store it somewhere on disk if we want to be able to replace the real string between R sessions. This seems like an added security risk, but is it worth doing? Or do we just say if you use regex string replacement your cassette will not be able to have the real string?
bump @tomsing1
Thanks for the bump - will get back to you as soon as I can!
I finally go to testing the updates you made, apologies for the delay.
Initially, I installed the HEAD version (commit) - but that version doesn't record any cassettes on my system (Mac OS X), e.g. it does not create YAML files. More specifically, this commit seems to have broken the recording function. Please let me know if you'd like me to provide more details (or create another issue).
The output of the vcr::vcr_test_path("fixtures")
command has changed between the two commits:
25c2bfc
: tests/fixtures
2e07d00
: /Users/my_username/repositories/my_package/tests/fixtures
Both of them point to the same directory on my system when my current directory is the RStudio project of my R package. So I am not sure what's going on.
sensitive_remove
To focus on this issue, e.g. using regular expressions in the sensitive_remove
function I installed commit 25c2bfc, the latest version that actually records cassettes on my system.
It works great:
filter_sensitive_data
argument now accepts patterns with non alphanumeric patterns without problems, and doesn't interpret them as regular expressions any more. filter_sensitive_data_regex
argument accepts my "token-matching" regular expression and removes it.Thanks for pointing out that patterns matched via regular expressions cannot by put back to avoid storing them on disk. That works for me!
In my use case, I receive tokens after the first round of authentication with a username / password. Afterward, the token is used for subsequent API calls. But because recorded cassettes don't actually make calls, the token is not used (and I set the corresponding environmental variable to foobar
during testing, following your example from the README. Does that sound right to you?
Thanks again for enabling both fixed and regex patterns!
One more thing I just noticed (still working with commit 25c2bfc): I had to manually adjust the string in the YAML file, because the pattern was not saved as JSON - and that seems to break the parser:
I changed the line
string: 'token'
to
string: "{\"token\": \"foobar\"}"
to avoid the following error:
Error: lexical error: invalid string in json text.
foobar
(right here) ------^
Backtrace:
1. vcr::use_cassette(...) test-get_token.R:10:2
3. my_application::get_token(...) test-get_token.R:11:4
4. httr::content(r, encoding = "UTF-8") /Users/tomsing1/repositories/my_application/R/authentication.R:60:2
5. httr:::parse_auto(raw, type, encoding, ...)
6. httr:::parser(...)
7. jsonlite::fromJSON(...)
8. jsonlite:::parse_and_simplify(...)
9. jsonlite:::parseJSON(txt, bigint_as_char)
10. jsonlite:::parse_string(txt, bigint_as_char)
Thanks for testing!
For vcr::vcr_test_path
, I'll have a look - and please do open a new issue
Great to hear that the filter params are working.
Does that sound right to you?
yes, sounds right
For the string not saved as JSON, can you give more detail? What does the use_cassette call look like?
please do open a new issue
no need, see #228
https://github.com/ropensci/vcr/blob/13e4bc1ea8b4cc96aaaf89714a7b30301b0a725e/R/filter_sensitive_data.R#L13
I ran into the following error when I tried to hide a password via the
filter_sensitive_data
argument of thevcr_configure
configure function:It seems that the presence of the
{
character is throwing off R'sgsub
function within vcr'ssensitive_remove
function.This reproduces the error:
Adding the
fixed=TRUE
argument to thegsub
call fixes it:In my case, the passwords I need to remove may contain parentheses (and other characters that might throw off
gsub
), so addingfixed=TRUE
seems like a useful change to prevent it from interpreting the pattern as a regular expression.On the other hand, if you actually intended gsub to scan for and remove regular expressions, then perhaps that needs to be documented?
Thanks a ton for writing & sharing this great package!