sopa301 commented 7 months ago

What feature(s) would you like to see in RepoSense

Is the feature request related to a problem?

Currently, the format for configuring the one-stop config file repo-config.csv is difficult for inexperienced users to use. Additionally, it is not friendly for those trying to create a code portfolio.

If possible, describe the solution

Repurpose report-config.json into a one-stop config file in a file format that is more user-friendly. Possible formats include yml and xml. Additionally, include more configuration options (to be decided).

This means that the order of priority would be this new file, then repo-config, then the rest of the csv files.

Steps:

Port the current functionality into yml and update the parser.
Include more functionalities as necessary.
Update documentation.

Additional context

Expanded version of item 2 of #2157

damithc commented 7 months ago

I don't think we should alter the repo-config.csv file. It is optimized for situations where there is a large number of repos. What we need is a one-stop config file that can hold some basic config options of diverse nature, to cater for simpler cases. The specialized config files such as repo-config.csv need to remain for cases that need deeper configurations or to configure heavier datasets.

sopa301 commented 7 months ago

Got it prof, I've updated the solution.

damithc commented 7 months ago

@sopa301 a more suitable config file to repropose for this may be the report-config.json https://reposense.org/ug/configFiles.html#report-config-json

damithc commented 7 months ago

Here's an example that makes sense from the user point of view, without being tied to JSON, YAML etc.:

title: John Doe's Code Dashboard
start: 2020-01-01
end: 2023-01-01
------
# John Doe's Code Dashboard

These are some of the work I've done in the past few years. Mostly for course projects, but some are pet projects.

====================================
repo:https://github.com/GERARDJM018/ip
branch: master
authorNames: johnDoe;John Doe;my home PC
------
# DukePro

This is a solo project I did for a course, using **Java**.

<img src="https://gerardjm018.github.io/ip/Ui.png"/>

[Product Home Page](https://johnDoe.github.io/ip/)

====================================
repo: https://github.com/AY2324S2-CS2103T-W08-1/tp
branch: master
authorNames: johnDoe;John Doe;my home PC
------
# Contacts Fun

This is a team project I did for a course, using **Java, JavaFX**.

<img src="https://gerardjm018.github.io/ip/Ui.png"/>

[Product Home Page](https://johnDoe.github.io/ip/)
Notable PRs: [Adding auto-scroll feature](https://github.com/AY2324S2-CS2103T-W08-1/tp/pulls/34)

damithc commented 7 months ago

Another thing: There is no strict need to force the entire file to be in one format. For example, if you look at the example above, you can split it first by ======... and then by ------ and each chunk you get after splitting is either fully YAML or fully Markdown which can then go into the matching parser. That is, the file can be a combination of different formats. While combining formats is not ideal, it may still be better than trying use one format for the whole file.

gok99 commented 6 months ago

2192 Introduces blurb.md for specifying blurbs for repositories. It seems reasonable to me to include blurb specification to the one-stop config since users are likely to use the config for portfolios. This however would cause complications with different sources of truth for blurbs. There are number of ways we could deal with this:

Don't allow blurbs to be specified in the one-stop config
Move specification of blurbs to the one-stop config only
Have one config be of a higher priority for specification in case there are conflicts
Explicitly override the .md config like we do in the CSVs
Something else

Each have different pros/cons which perhaps we could discuss.

ckcherry23 commented 6 months ago

I think option 3 would work best, considering the rest of the information in the one-stop config file can also be specified in author-config.csv, group-config.csv and repo-config.csv, leading to such conflicts.

gok99 commented 6 months ago

This is true, we have the standalone config, CSVs and now the one-stop config all of which could conflict :persevere:. I think it might make sense even to find a way to deprecate the standalone config in place of the one-stop config. The interaction/priorities between the one-stop config and the CSVs could be the same as the interaction of the standalone config with the CSVs.

ckcherry23 commented 6 months ago

Yes, #2171 is related to this.

gok99 commented 6 months ago

@ckcherry23 Thanks for linking that issue! It makes sense to me to do that together with this major new config, which has a lot of overlap in functionality

damithc commented 5 months ago

Yup, we can expedite removing the standalone config file. I'm of two minds as to doing it in the same PR though. Reasons:

Mainly, to keep the size of this change manageable. Removing the standalone config file may not be trivial (e.g., it is also mentioned in other config files).
It is a breaking change. So, it will require the next release to jump to v4.0. Too soon? This feature itself is not a breaking change.
Technically, it is a different feature from this one. So, the two can be considered as two different changes.

gok99 commented 5 months ago

This feature itself is not a breaking change.

I personally feel introducing a third conflicting config, and ensuring that the logic for handling them (over handling precedence rules for only two) is done in a consistent way will be quite challenging and might cause some trouble when the logic is later removed when we remove the standalone config. Doing a direct swap will let us keep the existing precedence rules and logic between the standalone config and CSV.

That being said, I do agree that removing the standalone config and introducing a fully featured new config in the same PR is a bit too large in scope. Perhaps we can break them down into 2 steps:

A first PR to replace the standalone config with yaml config of exactly the same features.
Second PR to add the additional features not captured in the standalone config, and resolve additional conflicts between the CSVs and yaml config

Step 1 will be breaking change however, and will require a major version bump. If this is undesirable, we could introduce the new config with the highest precedence and ignore all other configs if it's used (even if overriding), to reduce the implementation complexity. Given that the one-stop config is pretty fully featured, this behavior might be ok.

damithc commented 5 months ago

I personally feel introducing a third conflicting config, and ensuring that the logic for handling them (over handling precedence rules for only two) is done in a consistent way will be quite challenging and might cause some trouble when the logic is later removed when we remove the standalone config. Doing a direct swap will let us keep the existing precedence rules and logic between the standalone config and CSV.

That being said, I do agree that removing the standalone config and introducing a fully featured new config in the same PR is a bit too large in scope. Perhaps we can break them down into 2 steps:

A first PR to replace the standalone config with yaml config of exactly the same features.

Second PR to add the additional features not captured in the standalone config, and resolve additional conflicts between the CSVs and yaml config

That's workable as well. If we are starting from scratch, this is probably what we should do. But as we've already done a PR for the one-stop config file, the choice is not that clear. So, I'll leave you all to decide if it is easier to add the 'remove standalone config' to the current PR, or switch to the above 2-step approach (or some other strategy).

Step 1 will be breaking change however, and will require a major version bump. If this is undesirable, we could introduce the new config with the highest precedence and ignore all other configs if it's used (even if overriding), to reduce the implementation complexity. Given that the one-stop config is pretty fully featured, this behavior might be ok.

Bumping the major version is fine, if needed. There is no real cost of doing so.

gok99 commented 5 months ago

If I'm not mistaken, while #2192 introduces the new one-stop config and corresponding fields, not all of them are integrated into the analysis. It seems like currently, it replaces just the report-config.json and the report title is now taken from one-stop config instead. It probably would not waste any work to have it do the same with the standalone config in a similar way (@asdfghjkxd could confirm if this is a correct assessment).

georgetayqy commented 5 months ago

Hi @gok99, you are right in your assessment that the current implementation of the report-config.yaml file merely replaces the old report-config.json file without any logic behind it (apart from the existing logic to extract the report title).

My intentions with the PR were to determine the correct file format and fields to include in the file; I was hoping that the format could be approved by everyone first before we start the process of writing the logic needed to meaningfully use the fields specified in the file to define configurations that will be used in the report.

However, if you wish to combine both steps of converting from JSON to YAML and deprecating the standalone config files into one PR, I can edit the PR to include both steps instead!

gok99 commented 4 months ago

I think it would make sense then to do them in the same PR given the significant overlap between the configs.

We can start by adding to the current one-stop config fields to subsume the standalone config - as far as I can tell, there are only two things that aren't already included

Repo-level fileSizeLimit: limit that overrides the defafult file size limit, but that takes precedence over the limit in the CSVs unless explicity specifed in the CSV.
Author-level ignoreGlobList: we already have this at the repository level and could add another one at the author level, although this could be needlessly confusing. We can discuss if this feature is useful enough to keep.

damithc commented 4 months ago

Repo-level fileSizeLimit: limit that overrides the defafult file size limit, but that takes precedence over the limit in the CSVs unless explicity specifed in the CSV.

Perhaps we ignore the CSV files entirely if the one-stop file is present? There is no good reason for someone to have both formats, right?

2. Author-level ignoreGlobList: we already have this at the repository level and could add another one at the author level, although this could be needlessly confusing. We can discuss if this feature is useful enough to keep.

Yes, this can be removed. repo level ignoreGlobList should be enough.

gok99 commented 4 months ago

Perhaps we ignore the CSV files entirely if the one-stop file is present? There is no good reason for someone to have both formats, right?

Yes, the overriding behavior in CSVs no longer make much sense without the standalone config, and it's probably unlikely that users would need both configs - we can issue a warning if they are both specified. Removing all the overriding logic in #2192 would likely grow out of scope though, so we might just focus on:

Handling logic for co-existing one-stop config and CSVs
Add fileSizeLimit to the one-stop config so that the CSVs won't be needed concurrently to specify it

Then, a second PR to properly remove the standalone config and the overriding columns in the CSVs.

Yes, this can be removed. repo level ignoreGlobList should be enough.

:+1:

On the point on the purpose of the one-stop config

What we need is a one-stop config file that can hold some basic config options of diverse nature, to cater for simpler cases

It seems that we have now a pretty fully featured (almost) no-lose replacement of the CSV configs. This is nice because users can specify as much as they want and incrementally add to the config without needing to switch to the CSV. On the other hand, the specification for the one-stop config is no longer very basic. It could also get hard to explain which config should be used when and why two nearly identical configuration options exist. It might be a good idea at this stage to discuss if we want to keep all this customizability.

damithc commented 4 months ago

It seems that we have now a pretty fully featured (almost) no-lose replacement of the CSV configs. This is nice because users can specify as much as they want and incrementally add to the config without needing to switch to the CSV. On the other hand, the specification for the one-stop config is no longer very basic. It could also get hard to explain which config should be used when and why two nearly identical configuration options exist. It might be a good idea at this stage to discuss if we want to keep all this customizability.

Good point @gok99 We could start with a one-stop-config file that supports just enough attributes for an individual to set up their own code portfolio page using RepoSense (that use case was the motivation for this feature anyway). That is, we start with an MVP version of the feature first. What do you think?

gok99 commented 4 months ago

That sounds good to me. We could discuss what features to include. Here's a suggestion, based on the current one-stop config fields.

title: Title of the generated report, which is also the title of the deployed dashboard. Default: "RepoSense Report".
repos: A list of repositories to include for analysis.
- repo: The URL to your repository of interest.
- groups: A list of the different custom groupings.
  - group-name: Name of the group.
  - globs: The list of file path globs to include for specified group.
- branches: A list of branches to analyse for each repository.
- branch: The name of the branch.
- blurb: Blurb content in markdown format
- ignore-glob-list: Folders/files to ignore, specified using the glob format.
- ignore-authors-list: The list of authors to ignore during analysis. Authors should be specified by their Git Author Name.

Fields removed:

authors: Might be sufficient to specify these with ignore-authors-list
file-formats: Not sure if we might need this
ignore-standalone-config: Will be deprecated soon
ignore-commits-list: Seems like the average user would probably not need this
is-shallow-cloning: Seems like the average user would probably not need this
is-find-previous-authors: Tied to ignore-commits-list and probably not needed by the average user

Added a blurb field at the repo-level. We can enforce that if the one-stop config is present, then blurbs.md and the CSVs are ignored.

georgetayqy commented 4 months ago

@gok99 @damithc I have updated the YAML file format to reflect the changes proposed in the above comment. As for the logic behind the one-stop config file, I will work on it incrementally over the coming month during my free time to get the application to a point such that, if specified, the one-stop config file takes precedence over the other CSV config files.

If I'm not mistaken, I will most likely have to modify the RunConfigurationDecider class and any other related classes to implement the config overriding logic, but do feel free to correct me should I be wrong about this!

damithc commented 4 months ago

Fields removed:

authors: Might be sufficient to specify these with ignore-authors-list

@gok99 I'm not sure about this. In this 'personal code portfolio' use case, the user may have used just a few author names when contributing to the repo while the repo might contain commits from hundreds of other authors. Furthermore, new authors might join the repo continuously. So, feels like we should specify which authors to capture rather than which ones to ignore.

damithc commented 4 months ago

Added a blurb field at the repo-level. We can enforce that if the one-stop config is present, then blurbs.md and the CSVs are ignored.

On this, I'd rather give priority to the blurbs.md. This is because in our use case the user might want to add extensive and formatted prose explaining their contribution to the repo, which might be troublesome to add to a yaml/json file. Alternatively, we can omit blurb from the config file entirely -- which means if the user wants to add descriptions to the report, s/he needs to also add a blurbs.md file.

gok99 commented 4 months ago

So, feels like we should specify which authors to capture rather than which ones to ignore

That makes sense! I think I removed it because authors have a number of different (potentially confusing) configuration options, but it seems reasonable to keep them.

On this, I'd rather give priority to the blurbs.md. This is because in our use case the user might want to add extensive and formatted prose explaining their contribution to the repo, which might be troublesome to add to a yaml/json file. Alternatively, we can omit blurb from the config file entirely -- which means if the user wants to add descriptions to the report, s/he needs to also add a blurbs.md file.

I think it would be nice to have the same level of expressivity in the one-stop config as in the blurbs.md file if we are hoping for the one-stop config to be used for personal portfolios - needing to separate out the blurbs for advanced features would be inconvenient. Parsing (and writing) full markdown embedded in the yaml config would certainly be troublesome though... Maybe we can think of ways to have it not be quite so awkward, or failing that have blurbs.md take priority as you suggest.

damithc commented 4 months ago

I think it would be nice to have the same level of expressivity in the one-stop config as in the blurbs.md file if we are hoping for the one-stop config to be used for personal portfolios - needing to separate out the blurbs for advanced features would be inconvenient. Parsing (and writing) full markdown embedded in the yaml config would certainly be troublesome though... Maybe we can think of ways to have it not be quite so awkward, or failing that have blurbs.md take priority as you suggest.

How about this?

If a blurb for a given repo/branch is present in one of either the config file or the blurbs.md, we take that one.
If present in both, we take the one given in the config file.

That way, the user have the option to use either one but config file takes priority.

gok99 commented 4 months ago

That way, the user have the option to use either one but config file takes priority.

Do you mean blurbs.md takes priority? I think it makes sense to take the blurb from the md file if both are specified, if the md is strictly more expressive.

damithc commented 4 months ago

Do you mean blurbs.md takes priority? I think it makes sense to take the blurb from the md file if both are specified, if the md is strictly more expressive.

@gok99 That's fine too. What I don't want is the presence of the one-stop config file causing the blurbs.md to be ignored entirely. As long as I have the option to use the blurbs.md together with the one-stop config file, I'm OK with either of the two files taking priority over the other.

reposense / RepoSense

One-Stop Config File for Code Portfolio #2161

What feature(s) would you like to see in RepoSense

Is the feature request related to a problem?

If possible, describe the solution

Additional context

:+1: