olivroy commented 1 month ago

Please comment on this issue if you tried the *_outline() functions and would like some functionality or encountered something unexpected

cf https://github.com/rstudio/rstudioapi/issues/153#issuecomment-2116139239

Current plans

[ ] Potentially use lightparser for qmd files #29
[ ] Use roxygen2 file parsing (once I figure out how to use roxygen2::parse_package() + md (#29)
[ ] Make it easier to customize what output you want to see. Would be very useful to set as options in Rprofile since these functions should be used interactively.

Known issues

~Falsely recognized commented code as headings in commented code.~ solved by roxygen2.
~multiline titles or fig-caps in qmd files, not recognized~ solved by lightparser
~Only yaml style comments recognized (not in the chunk header)~ solved by lighparser

violetcereza commented 1 month ago

Thanks again for building this! I have been using this for my project, but I wanted to intersperse different levels of headings with my TODO items. I had a lot of trouble figuring out context for my to-do items in the middle of a large notebook.

To accomplish this, I wrote a printing function that takes proj_outline() output and translates the tibble into something we can display with cli::tree(). Here's an example tree with to-do items interspersed:

test.qmd
└─Test
  ├─Quarto
  ├─Running Code
  │ ├─TODO: fix this in the code
  │ └─A sub header
  │   └─TODO: here's a todo in the text
  └─Back to header 1
    ├─Dont skip me
    │ └─header 5
    │   └─TODO: testing section
    └─Another sub header
      └─TODO: section test

I am still playing with the formatting. I started using cli::format_inline() to incorporate the contextual links you designed, but it doesn't show up here.

Below is the code I'm using. First, I simplify the outline tibble to something that makes sense to me. Then, I make the indent variable which is based off of n_leading_hash. Then, I transform a tibble with "indents" into a tree structure for cli::tree(). I can rewrite this as a pull request eventually, but I want to take a shot at using lightparser and spend more time customizing the appearance with colors and stuff before I integrate into your codebase.

  outline_data <- reuseme::file_outline(path = "test.qmd") %>%

  # Convert the many is_ columns into mutually exclusive "outline row types"
  pivot_longer(
    names_to = "type", names_prefix = "is_", c(
      starts_with("is_"), -is_md, -is_second_level_heading_or_more, -is_roxygen_comment, -is_notebook
    )
  ) %>%
  # # Double check that types are mututally exclusive
  # filter(all(value == F), .by = c(file_short, title, line_id))
  # filter(sum(value) > 1, .by = c(file_short, title, line_id))
  filter(value == T) %>%
  # We drop these because they don't serve to add much context to TODOs (they don't affect heirarchy)
  filter(type != "tab_or_plot_title") %>%

  # Some useful definitions!
  mutate(
    title = coalesce(outline_el, title_el),
    file_short = fs::path_file(file),
    n_leading_hash = type %>% case_match(
      c("todo_fixme", "tab_or_plot_title") ~ NA,
      .default = n_leading_hash
    )
  ) %>%

  # For each file, stick a item at the top of the outline
  group_by(file, file_short) %>%
  group_modify(\(data, group) data %>% add_row(
    .before = 0,
    n_leading_hash = -1,
    title = group$file_short,
    type = "file"
  )) %>%

  mutate(
    # Processing how title displays based on type
    print_title = type %>% case_match(
      "todo_fixme" ~ link,
      .default = link_rs_api
    ) %>% coalesce(title) %>% map_chr(cli::format_inline),

    # Assign TODO items (and other items missing n_leading_hash)
    # to be indented under the last seen header level
    indent = coalesce(n_leading_hash, zoo::na.locf0(n_leading_hash+1)),

    # If there are any headers that skip an intermediate level, pick them up
    skip_level = indent > lag(indent)+1,
    skip_level_should_be = ifelse(skip_level, lag(indent)+1, NA),
    skip_level_adjust = case_when(
      # All the items below on the outline should be adjusted backwards
      skip_level ~ skip_level_should_be-indent,
      # Unless we reach a point on the outline where we're back up in
      # the hierachy, so stop adjusting.
      indent <= zoo::na.locf0(skip_level_should_be) ~ 0
    ) %>%
      # Carry the adjustments to later rows
      zoo::na.locf0() %>% coalesce(0),

    indent = indent + skip_level_adjust
  ) %>%
  ungroup() %>%
  select(title, type, n_leading_hash, indent, print_title)

outline_data %>%
  mutate(
    # Give items IDs so titles do not have to be unique
    item_id = as.character(row_number()),
    indent_wider = indent,
    x = TRUE
  ) %>%

  # We need these wide cumsum `header1` type fields to determine which items belong to which parents
  pivot_wider(names_from = indent_wider, values_from = x, values_fill = F, names_prefix = "header") %>%
  mutate(across(starts_with("header"), cumsum)) %>%

  # For each row, pick the IDs of all direct children from the outline
  pmap(function(...) with(list(..., childdata = .), tibble(
    title,
    print_title,
    indent,
    item_id,
    type,
    parent_level_id = get(stringr::str_c("header", indent)),
    children_ids = childdata %>%
      dplyr::rename(childindent = indent) %>%
      dplyr::filter(
        childindent == indent+1,
        cumsum(childindent == indent) == parent_level_id
      ) %>%
      dplyr::pull(item_id) %>% list()
  ))) %>%
  list_rbind() %>%
  # View()
  select(item_id, children_ids, print_title) %>%
  cli::tree()

And here's the qmd text I used:

---
title: "Test"
format: html
editor: visual
---

## Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see <https://quarto.org>.

## Running Code

When you click the **Render** button a document will be generated that includes both content and the output of embedded code. You can embed code like this:

```{r}
1 + 1
# TODO: fix this in the code

You can add options to executable code like this

A sub header

#| echo: false
2 * 2

The echo: false option disables the printing of code (only output is displayed).

TODO: here's a todo in the text

Back to header 1

Dont skip me

header 5

TODO: testing section

Another sub header

TODO: section test

olivroy commented 1 month ago

thank you so much for your insightful comments! I am trying to wrap up parsing roxygen2 comments properly. I will get to this soon!

I really like that you created an example code! I will try to see what I can come up with!

For indentation, it is planned. As you can see, there are already different colors for first level headings vs the rest.

I divided topics between "important" and "not_important", but I can add more categories, colors and more levels of indentation!

violetcereza commented 4 weeks ago

Current iteration of my version! Fork and pull request incoming this week...

Features I plan to look at:

Collapse tree items with one child into Parent -> Child combined nodes for brevity
Somehow adapt to HTML output, so I can embed the TODO list into a notebook (or copy/paste into word processor) to show my boss
If you wanted to use cli_h1() type markup, I could adapt it to use special styles for the top-level nodes in the tree
Use lightparser
Rework how outlines are stored under the hood a bit

After playing with the complete_todo() links a bit, I really like the idea but I don't find it useful enough to keep for me. I update todo annotations relatively rarely, so it isn't a hassle to click the pound sign (to jump to the line) and manually delete the item. Additionally, sometimes TODO items have code on the left side and removing the line breaks the code. I'm not sure supporting this feature is worth all the regex work and potential risk of damaging my scripts.

olivroy commented 4 weeks ago

Good to hear!

For complete_todo(), I understand, I did fix that up though in https://github.com/olivroy/reuseme/commit/14ec8e7146bdb05454f5193507e76ad98d563e89.

Let me know if it still breaks for you.

Also instead of showing Done v?, could also just be green done clickable emoji?

I just disabled it generally in 92c7a66d0a835bea9246deae036931fae49394a1 (only keeping it for active file and files named TODO.R

Apologies in advance for potentially breaking your code

I think I will disable roxygen comments parsing

I really like how you style differently the text!

complete_todo() links should probably just appear if you are requesting the active file only. You are right that you likely won't need to complete todos when requesting outline for another project.

olivroy commented 4 weeks ago

@violetcereza I am looking forward to your PR! Feel free to open smaller chunks PRs first to get changes you'd like to see first. https://code-review.tidyverse.org/author/focused.html

Add as much snapshot tests as you can, so it is easy to visualize output. Therefore, if some things end up changing, we will have a visual reference. https://testthat.r-lib.org/articles/snapshotting.html

Let me know if you have any questions

olivroy commented 3 weeks ago

Features I plan to look at:

Collapse tree items with one child into Parent -> Child combined nodes for brevity

👍 looking forward to seeing examples of this!

Somehow adapt to HTML output, so I can embed the TODO list into a notebook (or copy/paste into word processor) to show my boss

That is a good idea!

If you wanted to use cli_h1() type markup, I could adapt it to use special styles for the top-level nodes in the tree

Not strong opinions there. Recently, I fixed titles h3 to escape markup https://github.com/olivroy/reuseme/commit/b4803a244a484a7b686d05f28ea7be91f4d412c3#diff-708c3b67fc1b91c821cc6c00c9c8108cabcdd44c7edd9366cf62a373b717ca0cR440 in outline.R

Use lightparser

If possible, file this a separate PR?

Rework how outlines are stored under the hood a bit

I will be happy to see this too! Separate PR if possible too :)

I am looking forward. Here, I am seeing 3 steps.

Add lightparser support for parsing md/ Rmd / qmd files.
Tweak how outline is stored to work better with tree
Add support for tree. (Ideally this would affect the print method) I can let you show me, but my initial thoughts are the following. (I really like how the tree looks, and will probably end up deprecating the current way of doing, but for now I'd like to keep both functional unless there are no drawbacks from using trees)
Add an option that tells that you want tree output options("reuseme.outline_tree") for example (that would strip complete_todo().

At the end of file_outline(), the following code.


if (isTRUE(getOption("reuseme.outline_tree") {
  # modify result to be tree in `outline_tree()` helper.
  result <- outline_tree(result)
  class(result) <- c("cli_tree", "outline_report", "tbl_df", "tbl", "data.frame")
}

This way, this could just use the `cli:::print.cli_tree() method?

Alternatively (maybe better) refactor the print.outline_report() to print differently whether the tree option is TRUE.

Let me know what you think!

If tree output

Any (non-trivial) code related to trees should live in R/outline-tree.R :)

Don't let this remove your creativity, these are just the thoughts I have assembled since yesterday.

Apologies also if some of my recent changes broke your code. (I am still experiencing). Hopefully it is not too bad to repair.

I have removed some variables from output:

I have renamed some functions and columns (line_id to line), added some features.

https://github.com/olivroy/reuseme/blob/127fe6c01eafd03453bd7173a925b346af83f5ae/R/outline.R#L715-L730

Feel free to share any code. I can rework it to account for my recent changes. And make adjustments as needed to revert some changes.

Any other comment / feedback welcome! You inspired me to make changes I didn't think of at first (avoid printing complete_todo() all the time is a big one!)

violetcereza commented 3 weeks ago

Thanks! I will be taking a look at it today...

Here's what I'm thinking big picture for the outline interface:

Users want to see the high level structure of a project/dir/file integrated with various code annotations
- Outlines are intrinsically hierarchical, but users vary about how they split things across files / use annotations
- The RStudio outline panel makes navigation easy, and should be a guide for our outline
Different types of files have different structure, with different ideas of what makes an "outline item" and what affects the hierarchy
- Some people use the WORK tag while I don't. Other people might have custom annotations that come with activated cli links
- The outline structure should incorporate directories. If a directory happens to be the root of a project (.Rproj present), the outline item gets a special type = "project" and special custom annotation that activates reuseme::proj_switch
- In notebooks, chunk names are important for some people. Other elements include tabs, plot titles, subtitles
- In R scripts, function names and code sections can be nested
- Parsing for NEWS.md?
Users want something highly customizable for their project/habits, and want to be able to inspect the outline internals for various reasons
- I think the reuseme::outline API is getting complicated, and a lot of the tasks do two things: filter the items and change how they display
- I propose that filtering is done by users with dplyr and the resulting table is passed to the print method.
- Simplify the data structure to make filtering easy. All outlines will be a tibble with class outline_report and the following columns:
- file, title, type, line, indent
- I can't think of any drawbacks to using trees (maybe lack of multiline support?) but adding a theme API could provide total back compatibility.
- I think it's easier to work with indent rather than the parent/children IDs required by cli_tree(). While both hierarchical structures have their strengths, it's easier to read scan down indent in table format. However, rearranging items will break the structure of indents. I propose implementing sorting in the sort expression to print.outline_report to make this easy and flexible.

The new simplified API

`proj_outline(proj = NULL)`

If no proj, proxy for dir_outline(rstudioapi::getActiveProject()) otherwise dir_outline(proj_list(proj)).

`dir_outline(recurse = T)`

Run file_outline() on everything and aggregate into tibble under top level folder item.

`file_outline(path = rstudioapi::documentPath())`

Run file_outline() on everything and aggregate into tibble under top level folder item. Uses my algorithm to turn arbitrary indentation into strict hierarchical levels and make indent variable more meaningful.

`print.outline_report(outline, sort = F, collapse_nodes = T, prune_nodes = T, theme = NULL)`

Prints a multicolored tree. Uses my algorithm to turn outline indentation into parent/child structure. Optionally collapse nodes with one child into Parent -> Child nodes for brevity. Optionally prune nodes with no sub-annotations.

Allows sort with a data mask that operates recursively on sub-tibbles of sibling items (with a nod to data.tree sorting).

theme is a function(outline, indent) that allows customization for output (TBD).

Deprecations

Please let me know if my proposed API simplifications destroy any functionality you were using.

proj_outline(pattern = "x")
# NOW
proj_outline() |>
  filter(file |> str_detect("x"))

proj_outline(print_todo = F)
# NOW
proj_outline() |>
  filter(type != "todo")

proj_outline(work_only = F)
# NOW
proj_outline() |>
  print(prune_nodes = F)

proj_outline(alpha = T)
# NOW
proj_outline() |>
  print(sort = title)

proj_outline(recent_only = T)
# NOW
# Only select 5 most recently modified files?
proj_outline() |>
  mutate(modified = file %>% map_dbl(file.mtime)) |>
  filter(modified >= modified |> unique() |> sort() |> nth(-5))

# Extra tricks: print only files that have TODO items, and only up to level 3 headings
# (include todos under omitted headings)
# (print headings with no todo items as well)
proj_outline() |>
  filter(.by = file, any(type == "todo"), indent <= 3 | type != "heading") |>
  print(prune_nodes = F)

data <- proj_outline()
data$file_path # redundant with file?
data$file_short # use fs::path_file()
data$file_ext # use fs::path_ext()

# These should be cleaned and aggregated into `title` with some choices made by `file_outline()`
c(data$content, 
  data$outline_el,
  data$title_el, etc...)

# These should be internal to `print.outline_report`
c(data$link, 
  data$text_in_link,
  data$style_fun,
  data$link_rs_api,
  data$file_hl
)

olivroy commented 3 weeks ago

I like this! Do you want to open a PR, or do you want me to make these changes first?

violetcereza commented 1 week ago

Hey I know I am procrastinating writing unit tests on my PR...

But WRT to lightparser: I started trying out VSCode this week and I learned about "Language Server Protocols." Basically, there's too many types of files and too many IDEs and LSP provides a common interface and prevents the headache of re-implementing code analysis. I wonder if we could plug into the foldingRangeProvider service from languageserver to generate outlines.

olivroy commented 1 week ago

I understand. On my side, I have been able to successfully move link creation inside the print method. I am thinking about what you are proposing.

Feel free to continue sharing what you find! The end goal is to improve our workflows, not to stick to a suboptimal solution

olivroy / reuseme

`_outline()` functions feedback #28