Pre-filtering through pipe connections does not work on remote files

Yes, you can do this by using curl to stream the data then (unix) pipe that into grep, awk, or (see below) perl and put all of that into (R) pipe().
Here I'm using the techniques shown by @jimhester in this YouTube video: https://youtu.be/RYhwZW6ofbI?si=4uRdHcgsSBckd8n4&t=825
Note the last example, which gives a nicer result than the first two, because it also gets the column names.
library(vroom)
library(glue)

cats_file <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-01-31/cats_uk.csv"

cmd <- glue("curl -sL {cats_file} | grep '^Ares,'")
vroom(pipe(cmd), col_names = FALSE)
#> Rows: 103 Columns: 11
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (2): X1, X11
#> dbl  (5): X2, X5, X6, X7, X8
#> lgl  (3): X3, X9, X10
#> dttm (1): X4
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 103 × 11
#>    X1           X2 X3    X4                     X5    X6    X7    X8 X9    X10  
#>    <chr>     <dbl> <lgl> <dttm>              <dbl> <dbl> <dbl> <dbl> <lgl> <lgl>
#>  1 Ares     3.40e9 TRUE  2017-06-24 01:03:57 -5.11  50.2   684 155.  FALSE FALSE
#>  2 Ares     3.40e9 TRUE  2017-06-24 01:11:20 -5.11  50.2   936 155.  FALSE FALSE
#>  3 Ares     3.40e9 TRUE  2017-06-24 02:58:16 -5.11  50.2  2340  81.4 FALSE FALSE
#>  4 Ares     3.40e9 TRUE  2017-06-24 03:01:26 -5.11  50.2     0  67.8 FALSE FALSE
#>  5 Ares     3.40e9 TRUE  2017-06-24 03:51:58 -5.11  50.2  4896 118.  FALSE FALSE
#>  6 Ares     3.40e9 TRUE  2017-06-24 03:55:03 -5.11  50.2   504 123.  FALSE FALSE
#>  7 Ares     3.40e9 TRUE  2017-06-24 05:24:17 -5.11  50.2   108  27.1 FALSE FALSE
#>  8 Ares     3.40e9 TRUE  2017-06-24 05:27:23 -5.11  50.2   504  36.3 FALSE FALSE
#>  9 Ares     3.40e9 TRUE  2017-06-24 09:14:37 -5.11  50.2   252  67.4 FALSE FALSE
#> 10 Ares     3.40e9 TRUE  2017-06-24 09:17:42 -5.11  50.2  4068  78.9 FALSE FALSE
#> # ℹ 93 more rows
#> # ℹ 1 more variable: X11 <chr>

cmd <- glue("curl -sL {cats_file} | awk -F ',' '$1 ~ /Ares/'")
vroom(pipe(cmd), col_names = FALSE)
#> Rows: 103 Columns: 11
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (2): X1, X11
#> dbl  (5): X2, X5, X6, X7, X8
#> lgl  (3): X3, X9, X10
#> dttm (1): X4
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 103 × 11
#>    X1           X2 X3    X4                     X5    X6    X7    X8 X9    X10  
#>    <chr>     <dbl> <lgl> <dttm>              <dbl> <dbl> <dbl> <dbl> <lgl> <lgl>
#>  1 Ares     3.40e9 TRUE  2017-06-24 01:03:57 -5.11  50.2   684 155.  FALSE FALSE
#>  2 Ares     3.40e9 TRUE  2017-06-24 01:11:20 -5.11  50.2   936 155.  FALSE FALSE
#>  3 Ares     3.40e9 TRUE  2017-06-24 02:58:16 -5.11  50.2  2340  81.4 FALSE FALSE
#>  4 Ares     3.40e9 TRUE  2017-06-24 03:01:26 -5.11  50.2     0  67.8 FALSE FALSE
#>  5 Ares     3.40e9 TRUE  2017-06-24 03:51:58 -5.11  50.2  4896 118.  FALSE FALSE
#>  6 Ares     3.40e9 TRUE  2017-06-24 03:55:03 -5.11  50.2   504 123.  FALSE FALSE
#>  7 Ares     3.40e9 TRUE  2017-06-24 05:24:17 -5.11  50.2   108  27.1 FALSE FALSE
#>  8 Ares     3.40e9 TRUE  2017-06-24 05:27:23 -5.11  50.2   504  36.3 FALSE FALSE
#>  9 Ares     3.40e9 TRUE  2017-06-24 09:14:37 -5.11  50.2   252  67.4 FALSE FALSE
#> 10 Ares     3.40e9 TRUE  2017-06-24 09:17:42 -5.11  50.2  4068  78.9 FALSE FALSE
#> # ℹ 93 more rows
#> # ℹ 1 more variable: X11 <chr>

cmd <- glue("curl -sL {cats_file} | perl -ne 'use English; print if $INPUT_LINE_NUMBER == 1 || /^Ares,/'")
vroom(pipe(cmd))
#> Rows: 103 Columns: 11
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (2): tag_id, study_name
#> dbl  (5): event_id, location_long, location_lat, ground_speed, height_above_...
#> lgl  (3): visible, algorithm_marked_outlier, manually_marked_outlier
#> dttm (1): timestamp
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 103 × 11
#>    tag_id   event_id visible timestamp           location_long location_lat
#>    <chr>       <dbl> <lgl>   <dttm>                      <dbl>        <dbl>
#>  1 Ares   3395610551 TRUE    2017-06-24 01:03:57         -5.11         50.2
#>  2 Ares   3395610552 TRUE    2017-06-24 01:11:20         -5.11         50.2
#>  3 Ares   3395610553 TRUE    2017-06-24 02:58:16         -5.11         50.2
#>  4 Ares   3395610554 TRUE    2017-06-24 03:01:26         -5.11         50.2
#>  5 Ares   3395610555 TRUE    2017-06-24 03:51:58         -5.11         50.2
#>  6 Ares   3395610556 TRUE    2017-06-24 03:55:03         -5.11         50.2
#>  7 Ares   3395610557 TRUE    2017-06-24 05:24:17         -5.11         50.2
#>  8 Ares   3395610558 TRUE    2017-06-24 05:27:23         -5.11         50.2
#>  9 Ares   3395610559 TRUE    2017-06-24 09:14:37         -5.11         50.2
#> 10 Ares   3395610560 TRUE    2017-06-24 09:17:42         -5.11         50.2
#> # ℹ 93 more rows
#> # ℹ 5 more variables: ground_speed <dbl>, height_above_ellipsoid <dbl>,
#> #   algorithm_marked_outlier <lgl>, manually_marked_outlier <lgl>,
#> #   study_name <chr>
^{Created on 2023-09-28 with reprex v2.0.2.9000}
tidyverse / vroom

Pre-filtering through pipe connections does not work on remote files #508