"Find in Files" doesn't find all occurrences

fweber144 commented 2 years ago

System details

RStudio Edition : Desktop
RStudio Version : RStudio 2022.07.1+554 "Spotted Wakerobin" Release (7872775ebddc40635780ca1ed238934c3345c5de, 2022-07-22) for Windows; Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.8 Chrome/69.0.3497.128 Safari/537.36
OS Version      : Windows 10 x64 (build 19044)
R Version       : 4.2.1 (2022-06-23 ucrt)

Steps to reproduce the problem

Download a copy of my projpred fork from https://github.com/fweber144/projpred/archive/ddaf3e976c871e083d22d0381f82d3d2414abeef.zip (I set up the link to contain the hash of the most recent commit at the time of writing to keep this reproducible when more commits are pushed).
Extract the downloaded ZIP file and go to the unzipped directory projpred-ddaf3e976c871e083d22d0381f82d3d2414abeef.
Open projpred.Rproj.
Press Ctrl + Shift + F to launch the "Find in Files" dialog.
Type d_test into the input field under "Find:".
Select the "Case sensitive" checkbox.
Select "Common source files [...]" under "Search these files:".
Click on "Find".

Now, on my machine, the results list in the "Find in Files" tab (in the console pane) ends at line 102 of file tests/testthat/test_varsel.R. However, if you open that file and search for d_test there (using the smaller Ctrl + F search bar), you'll quickly see that there are more occurrences, e.g., in line 109.

Describe the problem in detail

The "Find in Files" search doesn't find all occurrences.

Describe the behavior you expected

I would have expected all occurrences to be listed in the "Find in Files" results of the console pane, in particular, the occurrence of d_test in line 109 of file tests/testthat/test_varsel.R. If the results were limited by a maximum number of displayed occurrences (I think 1000 is the maximum), I would have expected a red line at the bottom of the "Find in Files" results saying that there were more occurrences than those that are shown.

[x] I have read the guide for submitting good bug reports.
[x] I have installed the latest version of RStudio, and confirmed that the issue still persists.
[x] If I am reporting an RStudio crash, I have included a diagnostics report.
[x] I have done my best to include a minimal, self-contained set of instructions for consistently reproducing the issue.

fweber144 commented 2 years ago

This issue does not occur on a different machine with:

RStudio Edition : Desktop
RStudio Version : RStudio 2022.07.1+554 "Spotted Wakerobin" Release (7872775ebddc40635780ca1ed238934c3345c5de, 2022-07-22) for Ubuntu Jammy; Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.8 Chrome/69.0.3497.128 Safari/537.36
OS Version      : Ubuntu 22.04.1 LTS
R Version       : 4.2.1 (2022-06-23)

However, on a different Windows machine (different from the one in the initial post), it is reproducible. In contrast to the machine from the initial post, that other machine has Windows 11. So this seems to be a Windows-related bug, irrespective of whether it's Windows 10 or Windows 11. In any case, I think it's crucial because I use the "Find in Files" dialog a lot for navigating around in R packages.

ronblum commented 2 years ago

@fweber144 Thank you for raising this! I'm unable to reproduce the problem on Windows 11, though. I tried a few different versions and combinations, and it works. So, a couple of questions:

1) Do the records show up correctly for other files? For example, are the records in testers.R showing up correctly?
2) Have you used earlier versions of RStudio Desktop, and if so, was it working in a previous version?
3) What happens if you uncheck "Case sensitive"?
4) Similarly, what happens if you search all files instead of only source files?

fweber144 commented 2 years ago

Thank you for investigating this, @ronblum. Concerning your questions:

Do the records show up correctly for other files? For example, are the records in testers.R showing up correctly?

Yes, for files further up in the list, the results seem to be correct. However, the results list seems to be "cut off" at the bottom (see my reply to point 4 below), so results are not shown for files that would appear further down the list. By coincidence, this folder has no alphabetically later file containing d_test, but if I append d_test to line 1 of vignettes/projpred.Rmd, for example, that occurrence is not shown.

Have you used earlier versions of RStudio Desktop, and if so was it working in a previous version?

I think it was working correctly at some point in the past (I use "Find in Files" a lot and never encountered any issue, although I might simply not have noticed). But even if it did work in the past, I can't say when this changed, sorry.

What happens if you uncheck "case sensitive"?

No change; the issue persists. However, this does have an impact when searching all files (not just source files); see point 4 below.

Similarly, what happens if you look at all files instead of only source files?

When having "case sensitive" unchecked and looking at all files, the search results are cut off even earlier, namely at line 1254 of tests/testthat/setup.R. There are no results from tests/testthat/test_varsel.R (or vignettes/projpred.Rmd, if modified as described above) at all anymore. I guess the reason could be that there are more results near the beginning of the list, so that some kind of hidden maximum on the number of displayed results is reached earlier.

Interestingly, when having "case sensitive" checked and looking at all files, then the results list seems to be complete, in particular, also including line 109 from tests/testthat/test_varsel.R and later occurrences (i.e., also line 1 from vignettes/projpred.Rmd, if modified as described above). Perhaps this helps?

ronblum commented 2 years ago

@fweber144 Thank you for looking further into this!

@jgutman I haven't been able to reproduce the issue (so far). Do you have any suggestions of what to look at or what might be going on?

jgutman commented 2 years ago

@fweber144 The project you are searching is a Git repository, correct? In that case, the behavior of Find in Files on Windows may depend on the version of Git installed on your Windows machine. Could you please report which version of Git you have on the Windows machine where this fails?

Do you have the [ ] Exclude files matched by .gitignore option set? (Is your project managed by git?)
What version of Git for Windows do you have installed? If git is on the PATH, then system("git --version") should be the easiest way to check.
If you run the code Sys.setenv(RSTUDIO_GREP_DEBUG = 1) in the console, and then perform the search, what debug output do you see in the console?
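The last two checks can be run together in the R console. This is a minimal sketch: it uses `Sys.which()` so that it does not fail when Git is not on the PATH, and it assumes the effect of `RSTUDIO_GREP_DEBUG` is only what this thread describes (RStudio echoing its grep command and output for the next search).

```r
# Report the installed Git for Windows version, if git is on the PATH.
git_path <- unname(Sys.which("git"))
if (nzchar(git_path)) {
  print(system2(git_path, "--version", stdout = TRUE))
} else {
  message("git was not found on the PATH")
}

# Enable RStudio's grep debug output, then run "Find in Files" again and
# look in the console for the grep command line and its stdout/stderr.
Sys.setenv(RSTUDIO_GREP_DEBUG = 1)
```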

For others who have run into issues on Windows when using Find in Files in a Git repository, we've noticed that they had a very old version of Git for Windows installed on their system, and updating Git often helps.

fweber144 commented 2 years ago

Thank you for helping here, @jgutman.

the project you are searching is a git repository correct?

No, not really. It is taken from a GitHub repo (my fork of the stan-dev projpred repo at commit https://github.com/fweber144/projpred/commit/ddaf3e976c871e083d22d0381f82d3d2414abeef), but the ZIP file I linked above doesn't include a .git folder (not even a Windows-hidden one). It also doesn't have a .Rproj.user folder. The .github folder only contains FUNDING.yml. In RStudio's global options, I have version-control systems deactivated.

Could you please report what version of git you have running on the Windows machine where this fails?

My Git version is 2.37.1.windows.1.

Do you have the [ ] Exclude files matched by .gitignore option set? (Is your project managed by git?)

Sorry, I'm not sure if I understand you correctly here. You mean the "Exclude these files" checkbox in the "Find in Files" dialog? I have not checked it. When checking it, no files (or patterns) are listed there. Concerning your question about Git management: Does my explanation above (the origin is a Git repo, but the ZIP file should not be affected by that) answer it?

What version of Git for Windows do you have installed? If git is on the PATH, then system("git --version") should be the easiest way to check.

See above.

If you run the code Sys.setenv(RSTUDIO_GREP_DEBUG = 1) in the console, and then perform the search, what debug output do you see in the console?

When following the steps from https://github.com/rstudio/rstudio/issues/11736#issue-1334717833, I get:

> Sys.setenv(RSTUDIO_GREP_DEBUG = 1)
"C:/Program Files/RStudio/bin/gnugrep/3.0/grep" "--binary-files=without-match" "-rHn" "--color=always" "-F" "-f" "C:/Users/<user_name>/AppData/Local/Temp/RtmpsxOA8f/rs_grep24d439171ee6.txt" "--include=*.r" "--include=*.R" "--include=*.rnw" "--include=*.Rnw" "--include=*.rmd" "--include=*.Rmd" "--include=*.rmarkdown" "--include=*.Rmarkdown" "--include=*.qmd" "--include=*.Qmd" "--include=*.md" "--include=*.rhtml" "--include=*.Rhtml" "--include=*.h" "--include=*.hpp" "--include=*.c" "--include=*.cpp" "--include=*.js" "--include=*.yml" "--include=*.yaml"
stdout: NEWS.md:17:* Argument `d_test` of `varsel()` is not considered as an internal feature anymore. This was possible after fixing a bug for `d_test` (see below). (GitHub: #341)
NEWS.md:18:* The order of the observations in the subelements of `<vsel_object>$summaries` and `<vsel_object>$d_test` now corresponds to the order of the observations in the original dataset if `<vsel_object>` was created by a call to `cv_varsel([...], cv_method = "kfold")` (formerly, in that case, the observations in those subelements were ordered by fold). Thereby, the order of the observations in those subelements now always corresponds to the order of the observations in the original dataset, except if `<vsel_object>` was created by a call to `varsel([...], d_test = <non-NULL_d_test_object>)`, in which case the order of the observations in those subelements corresponds to the order of the observations in `<non-NULL_d_test_object>`. (GitHub: #341)
NEWS.md:30:* Fix argument `d_test` of `varsel()`: Not only the predictive performance of the *reference model* needs to be evaluated on the test data, but also the predictive performance of the *submodels*. (GitHub: #341)
R/cv_varsel.R:243:              d_test = sel_cv$d_test,
R/cv_varsel.R:527:  d_test <- list(type = "LOO", data = NULL, offset = refmodel$offset,
R/cv_varsel.R:530:  out_list <- nlist(solution_terms_cv = solution_terms_mat, summaries, d_test)
R/cv_varsel.R:549:    d_test <- list(
R/cv_varsel.R:555:    return(nlist(refmodel = fold$refmodel, d_test))
R/cv_varsel.R:615:                       test_points = fold$d_test$omitted)
R/cv_varsel.R:625:    fold$d_test$omitted
R/cv_varsel.R:642:      newdata = refmodel$fetch_data(obs = fold$d_test$omitted)
R/cv_varsel.R:643:    ) + fold$d_test$offset
R/cv_varsel.R:646:      y_test = fold$d_test, family = fold$refmodel$family,
R/cv_varsel.R:657:    list(offset = fold$d_test$offset,
R/cv_varsel.R:658:         weights = fold$d_test$weights,
R/cv_varsel.R:659:         y = fold$d_test$y)
R/cv_varsel.R:667:               d_test = c(list(type = "kfold", data = NULL), d_cv)))
R/methods.R:410:    nobs_test <- nrow(object$d_test$data %||% object$refmodel$fetch_data())
R/methods.R:566:    nobs_test = nrow(object$d_test$data),
R/misc.R:7:nms_d_test <- function() {
R/summary_funs.R:56:      !all(varsel$d_test$weights == 1)) {
R/summary_funs.R:57:    varsel$d_test$y_prop <- varsel$d_test$y / varsel$d_test$weights
R/summary_funs.R:83:    res <- get_stat(summ$mu, summ$lppd, varsel$d_test, stat, mu.bs = mu.bs,
R/summary_funs.R:86:      data = varsel$d_test$type, size = Inf, delta = delta, statistic = stat,
R/summary_funs.R:100:        res_ref <- get_stat(summ_ref$mu, summ_ref$lppd, varsel$d_test,
R/summary_funs.R:103:        res_diff <- get_stat(summ$mu, summ$lppd, varsel$d_test, stat,
R/summary_funs.R:111:          data = varsel$d_test$type, size = k - 1, delta = delta,
R/summary_funs.R:117:        res <- get_stat(summ$mu, summ$lppd, varsel$d_test, stat, mu.bs = mu.bs,
R/summary_funs.R:119:        diff <- get_stat(summ$mu, summ$lppd, varsel$d_test, stat,
R/summary_funs.R:123:          data = varsel$d_test$type, size = k - 1, delta = delta,
R/summary_funs.R:145:## `d_test$weights`. These are already taken into account by
R/summary_funs.R:149:get_stat <- function(mu, lppd, d_test, stat, mu.bs = NULL, lppd.bs = NULL,
R/summary_funs.R:180:    if (is.null(d_test$y_prop)) {
R/summary_funs.R:181:      y <- d_test$y
R/summary_funs.R:183:      y <- d_test$y_prop
R/summary_funs.R:185:    if (!all(d_test$weights == 1)) {
R/summary_funs.R:186:      wcv <- wcv * d_test$weights
R/summary_funs.R:236:    y <- d_test$y
R/summary_funs.R:237:    if (!is.null(d_test$y_prop)) {
R/summary_funs.R:240:      # `d_test$weights` contains the numbers of trials) with more than 1 trial
R/summary_funs.R:242:      stopifnot(all(.is.wholenumber(d_test$weights)))
R/summary_funs.R:244:      stopifnot(all(0 <= y & y <= d_test$weights))
R/summary_funs.R:246:        c(rep(0L, d_test$weights[i_short] - y[i_short]),
R/summary_funs.R:249:      mu <- rep(mu, d_test$weights)
R/summary_funs.R:251:        mu.bs <- rep(mu.bs, d_test$weights)
R/summary_funs.R:253:      n_notna <- sum(d_test$weights)
R/summary_funs.R:254:      wcv <- rep(wcv, d_test$weights)
R/summary_funs.R:257:      stopifnot(all(d_test$weights == 1))
R/varsel.R:13:#' @param d_test A `list` of the structure outlined in section "Argument
R/varsel.R:14:#'   `d_test`" below, providing test data for evaluating the predictive
R/varsel.R:88:#' # Argument `d_test`
R/varsel.R:90:#' If not `NULL`, then `d_test` needs to be a `list` with the following
R/varsel.R:188:varsel.refmodel <- function(object, d_test = NULL, method = NULL,
R/varsel.R:217:  if (is.null(d_test)) {
R/varsel.R:218:    d_test <- list(type = "train", data = NULL, offset = refmodel$offset,
R/varsel.R:221:    d_test$type <- "test"
R/varsel.R:222:    d_test <- d_test[nms_d_test()]
R/varsel.R:247:                            newdata = d_test$data,
R/varsel.R:248:                            offset = d_test$offset,
R/varsel.R:249:                            wobs = d_test$weights,
R/varsel.R:250:                            y = d_test$y)
R/varsel.R:258:    nobs_test <- nrow(d_test$data %||% refmodel$fetch_data())
R/varsel.R:261:    if (d_test$type == "train") {
R/varsel.R:269:      newdata_for_ref <- d_test$data
R/varsel.R:274:               "`d_test$data`, but that column already exists. Please rename ",
R/varsel.R:275:               "this column in `d_test$data` and try again.")
R/varsel.R:277:        newdata_for_ref$projpred_internal_offs_stanreg <- d_test$offset
R/varsel.R:281:          d_test$offset
R/varsel.R:285:      y_test = d_test, family = refmodel$family, wsample = refmodel$wsample,
R/varsel.R:294:    d_test,
tests/testthat/helpers/testers.R:1079:# @param dtest_expected If `vs` was created with a non-`NULL` argument `d_test`
tests/testthat/helpers/testers.R:1081:#   `vs$d_test` object. Otherwise, this needs to be `NULL`.
tests/testthat/helpers/testers.R:1256:  # d_test
tests/testthat/helpers/testers.R:1258:    expect_type(vs$d_test, "list")
tests/testthat/helpers/testers.R:1259:    expect_named(vs$d_test, nms_d_test(), info = info_str)
tests/testthat/helpers/testers.R:1264:    expect_identical(vs$d_test$type, dtest_type, info = info_str)
tests/testthat/helpers/testers.R:1265:    expect_null(vs$d_test$data, info = info_str)
tests/testthat/helpers/testers.R:1266:    expect_identical(vs$d_test$offset, vs$refmodel$offset, info = info_str)
tests/testthat/helpers/testers.R:1267:    expect_identical(vs$d_test$weights, vs$refmodel$wobs, info = info_str)
tests/testthat/helpers/testers.R:1268:    expect_identical(vs$d_test$y, vs$refmodel$y, info = info_str)
tests/testthat/helpers/testers.R:1270:    expect_identical(vs$d_test, dtest_expected, info = info_str)
tests/testthat/helpers/testers.R:1492:  expect_identical(smmry$nobs_test, nrow(vsel_expected$d_test$data),
tests/testthat/setup.R:1254:  "refmodel", "search_path", "d_test", "summaries", "solution_terms", "kl",
tests/testthat/setup.R:1260:  "refmodel", "search_path", "d_test", "summaries", "kl", "solution_terms",
tests/testthat/test_varsel.R:76:## d_test -----------------------------------------------------------------
tests/testthat/test_varsel.R:79:  "`d_test` set to the training data gives the same results as its default"
tests/testthat/test_varsel.R:102:    d_test_crr <- list(
tests/testthat

jmcphers commented 2 years ago

Possibly also the issue reported here? https://twitter.com/LisaDeBruine/status/1572520018797297664

kevinushey commented 2 years ago

One interesting thing to note: it seems like the output is cut off at the end? E.g.

tests/testthat/test_varsel.R:102:    d_test_crr <- list(
tests/testthat

Perhaps we're losing some output from GNU grep for some reason?
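One way to test the "lost output" hypothesis is to run grep directly, outside RStudio, and compare its matches with the rows in the "Find in Files" pane. The sketch below assumes a GNU-compatible `grep` on the PATH; `grep_in_files()` and `count_hits_per_file()` are hypothetical helpers, and the flags are a subset of the command RStudio printed in the debug output (without `--color=always` and the pattern file).

```r
# Hypothetical helper: run grep recursively and return "file:line:text" hits.
# Assumes a GNU-compatible `grep` is on the PATH.
grep_in_files <- function(pattern, dir = ".",
                          include = c("*.R", "*.Rmd", "*.md")) {
  args <- c(
    "-rHn", "--binary-files=without-match", "-F",
    "-e", shQuote(pattern),
    shQuote(paste0("--include=", include)),  # quoted so the shell doesn't glob
    shQuote(dir)
  )
  # grep exits with status 1 when nothing matches; that is not an error here.
  hits <- suppressWarnings(system2("grep", args, stdout = TRUE))
  as.vector(hits)  # drop the "status" attribute, if any
}

# Count hits per file, to compare against the "Find in Files" pane.
count_hits_per_file <- function(hits) {
  table(sub(":[0-9]+:.*$", "", hits))
}
```

Run from the project root of the reproduction, `count_hits_per_file(grep_in_files("d_test", include = "*.R"))` would be expected to list ./tests/testthat/test_varsel.R with all of its occurrences, including line 109, if grep itself is not the culprit.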

kevinushey commented 2 years ago

I was able to reproduce something similar locally, with output stopping with these lines:

tests/testthat/helpers/testers.R:1492:  expect_identical(smmry$nobs_test, nrow(vsel_expected$d_test$data),
tests/testthat/setup.R:1254:  "refmodel", "search_path", "d_test", "summaries", "solution_terms", "kl",
tests/testthat
stderr: /gnugrep/3.0/grep: .Rproj.user/1901417D/sources/session-ADC58894/lock_file: Device or resource busy

Not sure if the stderr output is a red herring or not.

The frustrating part is that the error seems to go away after restarting RStudio :-/

kevinushey commented 2 years ago

Note for QA: I was able to reproduce following the instructions in the OP (https://github.com/rstudio/rstudio/issues/11736#issue-1334717833); however, at least in my case, the issue seems to reproduce on the first time the project is opened; if you close RStudio and re-open the project, the issue might go away.

For that reason, when reproducing, I recommend testing with a "fresh" copy of the folder unpacked from https://github.com/fweber144/projpred/archive/ddaf3e976c871e083d22d0381f82d3d2414abeef.zip.

Also, to the best of my knowledge, this issue should predominantly affect Windows, but in theory other platforms will be affected as well (and I'm planning a separate PR for that).

jonvanausdeln commented 2 years ago

Verified on 2022.11.0-daily+215 Windows 11

Tested with OP example, works as expected. Used a fresh copy of the .zip file contents.

fweber144 commented 2 years ago

Indeed, I can confirm that the issue does not occur anymore with RStudio 2022.11.0-daily+215 (tested on the Windows 11 machine mentioned above). Thanks a lot to all of you!

kevinushey commented 2 years ago

I'm putting this back into testing as I merged a separate PR for POSIX that does the same thing (ensure we read all stdout / stderr on process exit).
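The actual fix lives in RStudio's own process-handling code, which is not shown in this thread. The base-R sketch below only illustrates the principle of draining a child's output: keep reading until end-of-file instead of stopping at some earlier signal such as the process having exited. `read_all_output()` is a hypothetical helper; it reads stdout only, so stderr would have to be redirected (e.g. `2>&1` in a POSIX shell) to be captured as well.

```r
# Hypothetical helper illustrating "read until EOF": the loop never stops
# because the child has exited, only when readLines() reports end-of-file.
read_all_output <- function(command, chunk_size = 100L) {
  con <- pipe(command, open = "r")
  on.exit(close(con))
  lines <- character()
  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0L) break  # EOF: all buffered output consumed
    lines <- c(lines, chunk)
  }
  lines
}
```

Reading in chunks mirrors how a supervisor consumes output incrementally; a loop that instead returned as soon as the process ended could drop the final, still-buffered chunk, which would be consistent with the truncated `tests/testthat` line seen in the output above.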

Given that this affects how we read output from any child process we launch, I think this code should be well-exercised by our existing test suite (e.g. anything that uses Quarto would run through this code) so I think our existing automation in that space would suffice for testing.

@jonvanausdeln, do you have any feelings on whether additional testing is warranted for this PR?

jonvanausdeln commented 2 years ago

Did a quick verification on all other platforms, so I think it's good to go now.
