Open sopa301 opened 7 months ago
I don't think we should alter the repo-config.csv file. It is optimized for situations where there is a large number of repos. What we need is a one-stop config file that can hold some basic config options of diverse nature, to cater for simpler cases. The specialized config files such as repo-config.csv need to remain for cases that need deeper configurations or to configure heavier datasets.
Got it prof, I've updated the solution.
@sopa301 a more suitable config file to repropose for this may be the report-config.json
https://reposense.org/ug/configFiles.html#report-config-json
Here's an example that makes sense from the user point of view, without being tied to JSON, YAML etc.:
title: John Doe's Code Dashboard
start: 2020-01-01
end: 2023-01-01
------
# John Doe's Code Dashboard
These are some of the work I've done in the past few years. Mostly for course projects, but some are pet projects.
====================================
repo:https://github.com/GERARDJM018/ip
branch: master
authorNames: johnDoe;John Doe;my home PC
------
# DukePro
This is a solo project I did for a course, using **Java**.
<img src="https://gerardjm018.github.io/ip/Ui.png"/>
[Product Home Page](https://johnDoe.github.io/ip/)
====================================
repo: https://github.com/AY2324S2-CS2103T-W08-1/tp
branch: master
authorNames: johnDoe;John Doe;my home PC
------
# Contacts Fun
This is a team project I did for a course, using **Java, JavaFX**.
<img src="https://gerardjm018.github.io/ip/Ui.png"/>
[Product Home Page](https://johnDoe.github.io/ip/)
Notable PRs: [Adding auto-scroll feature](https://github.com/AY2324S2-CS2103T-W08-1/tp/pulls/34)
Another thing: There is no strict need to force the entire file to be in one format. For example, if you look at the example above, you can split it first by ======...
and then by ------
and each chunk you get after splitting is either fully YAML or fully Markdown which can then go into the matching parser. That is, the file can be a combination of different formats.
While combining formats is not ideal, it may still be better than trying use one format for the whole file.
Each have different pros/cons which perhaps we could discuss.
I think option 3 would work best, considering the rest of the information in the one-stop config file can also be specified in author-config.csv
, group-config.csv
and repo-config.csv
, leading to such conflicts.
This is true, we have the standalone config, CSVs and now the one-stop config all of which could conflict :persevere:. I think it might make sense even to find a way to deprecate the standalone config in place of the one-stop config. The interaction/priorities between the one-stop config and the CSVs could be the same as the interaction of the standalone config with the CSVs.
Yes, #2171 is related to this.
@ckcherry23 Thanks for linking that issue! It makes sense to me to do that together with this major new config, which has a lot of overlap in functionality
Yup, we can expedite removing the standalone config file. I'm of two minds as to doing it in the same PR though. Reasons:
This feature itself is not a breaking change.
I personally feel introducing a third conflicting config, and ensuring that the logic for handling them (over handling precedence rules for only two) is done in a consistent way will be quite challenging and might cause some trouble when the logic is later removed when we remove the standalone config. Doing a direct swap will let us keep the existing precedence rules and logic between the standalone config and CSV.
That being said, I do agree that removing the standalone config and introducing a fully featured new config in the same PR is a bit too large in scope. Perhaps we can break them down into 2 steps:
Step 1 will be breaking change however, and will require a major version bump. If this is undesirable, we could introduce the new config with the highest precedence and ignore all other configs if it's used (even if overriding), to reduce the implementation complexity. Given that the one-stop config is pretty fully featured, this behavior might be ok.
I personally feel introducing a third conflicting config, and ensuring that the logic for handling them (over handling precedence rules for only two) is done in a consistent way will be quite challenging and might cause some trouble when the logic is later removed when we remove the standalone config. Doing a direct swap will let us keep the existing precedence rules and logic between the standalone config and CSV.
That being said, I do agree that removing the standalone config and introducing a fully featured new config in the same PR is a bit too large in scope. Perhaps we can break them down into 2 steps:
- A first PR to replace the standalone config with yaml config of exactly the same features.
- Second PR to add the additional features not captured in the standalone config, and resolve additional conflicts between the CSVs and yaml config
That's workable as well. If we are starting from scratch, this is probably what we should do. But as we've already done a PR for the one-stop config file, the choice is not that clear. So, I'll leave you all to decide if it is easier to add the 'remove standalone config' to the current PR, or switch to the above 2-step approach (or some other strategy).
Step 1 will be breaking change however, and will require a major version bump. If this is undesirable, we could introduce the new config with the highest precedence and ignore all other configs if it's used (even if overriding), to reduce the implementation complexity. Given that the one-stop config is pretty fully featured, this behavior might be ok.
Bumping the major version is fine, if needed. There is no real cost of doing so.
If I'm not mistaken, while #2192 introduces the new one-stop config and corresponding fields, not all of them are integrated into the analysis. It seems like currently, it replaces just the report-config.json
and the report title is now taken from one-stop config instead. It probably would not waste any work to have it do the same with the standalone config in a similar way (@asdfghjkxd could confirm if this is a correct assessment).
Hi @gok99, you are right in your assessment that the current implementation of the report-config.yaml
file merely replaces the old report-config.json
file without any logic behind it (apart from the existing logic to extract the report title).
My intentions with the PR were to determine the correct file format and fields to include in the file; I was hoping that the format could be approved by everyone first before we start the process of writing the logic needed to meaningfully use the fields specified in the file to define configurations that will be used in the report.
However, if you wish to combine both steps of converting from JSON to YAML and deprecating the standalone config files into one PR, I can edit the PR to include both steps instead!
I think it would make sense then to do them in the same PR given the significant overlap between the configs.
We can start by adding to the current one-stop config fields to subsume the standalone config - as far as I can tell, there are only two things that aren't already included
fileSizeLimit
: limit that overrides the defafult file size limit, but that takes precedence over the limit in the CSVs unless explicity specifed in the CSV.ignoreGlobList
: we already have this at the repository level and could add another one at the author level, although this could be needlessly confusing. We can discuss if this feature is useful enough to keep.
- Repo-level
fileSizeLimit
: limit that overrides the defafult file size limit, but that takes precedence over the limit in the CSVs unless explicity specifed in the CSV.
Perhaps we ignore the CSV files entirely if the one-stop file is present? There is no good reason for someone to have both formats, right?
2. Author-level
ignoreGlobList
: we already have this at the repository level and could add another one at the author level, although this could be needlessly confusing. We can discuss if this feature is useful enough to keep.
Yes, this can be removed. repo level ignoreGlobList should be enough.
Perhaps we ignore the CSV files entirely if the one-stop file is present? There is no good reason for someone to have both formats, right?
Yes, the overriding behavior in CSVs no longer make much sense without the standalone config, and it's probably unlikely that users would need both configs - we can issue a warning if they are both specified. Removing all the overriding logic in #2192 would likely grow out of scope though, so we might just focus on:
fileSizeLimit
to the one-stop config so that the CSVs won't be needed concurrently to specify itThen, a second PR to properly remove the standalone config and the overriding columns in the CSVs.
Yes, this can be removed. repo level ignoreGlobList should be enough.
On the point on the purpose of the one-stop config
What we need is a one-stop config file that can hold some basic config options of diverse nature, to cater for simpler cases
It seems that we have now a pretty fully featured (almost) no-lose replacement of the CSV configs. This is nice because users can specify as much as they want and incrementally add to the config without needing to switch to the CSV. On the other hand, the specification for the one-stop config is no longer very basic. It could also get hard to explain which config should be used when and why two nearly identical configuration options exist. It might be a good idea at this stage to discuss if we want to keep all this customizability.
It seems that we have now a pretty fully featured (almost) no-lose replacement of the CSV configs. This is nice because users can specify as much as they want and incrementally add to the config without needing to switch to the CSV. On the other hand, the specification for the one-stop config is no longer very basic. It could also get hard to explain which config should be used when and why two nearly identical configuration options exist. It might be a good idea at this stage to discuss if we want to keep all this customizability.
Good point @gok99 We could start with a one-stop-config file that supports just enough attributes for an individual to set up their own code portfolio page using RepoSense (that use case was the motivation for this feature anyway). That is, we start with an MVP version of the feature first. What do you think?
That sounds good to me. We could discuss what features to include. Here's a suggestion, based on the current one-stop config fields.
title
: Title of the generated report, which is also the title of the deployed dashboard. Default: "RepoSense Report".repos
: A list of repositories to include for analysis.
repo
: The URL to your repository of interest.groups
: A list of the different custom groupings.
group-name
: Name of the group.globs
: The list of file path globs to include for specified group.branches
: A list of branches to analyse for each repository.branch
: The name of the branch.blurb
: Blurb content in markdown formatignore-glob-list
: Folders/files to ignore, specified using the glob format.ignore-authors-list
: The list of authors to ignore during analysis. Authors should be specified by their Git Author Name.Fields removed:
authors
: Might be sufficient to specify these with ignore-authors-list
file-formats
: Not sure if we might need thisignore-standalone-config
: Will be deprecated soonignore-commits-list
: Seems like the average user would probably not need thisis-shallow-cloning
: Seems like the average user would probably not need thisis-find-previous-authors
: Tied to ignore-commits-list
and probably not needed by the average userAdded a blurb
field at the repo-level. We can enforce that if the one-stop config is present, then blurbs.md
and the CSVs are ignored.
@gok99 @damithc I have updated the YAML file format to reflect the changes proposed in the above comment. As for the logic behind the one-stop config file, I will work on it incrementally over the coming month during my free time to get the application to a point such that, if specified, the one-stop config file takes precedence over the other CSV config files.
If I'm not mistaken, I will most likely have to modify the RunConfigurationDecider
class and any other related classes to implement the config overriding logic, but do feel free to correct me should I be wrong about this!
Fields removed:
authors
: Might be sufficient to specify these withignore-authors-list
@gok99 I'm not sure about this. In this 'personal code portfolio' use case, the user may have used just a few author names when contributing to the repo while the repo might contain commits from hundreds of other authors. Furthermore, new authors might join the repo continuously. So, feels like we should specify which authors to capture rather than which ones to ignore.
Added a
blurb
field at the repo-level. We can enforce that if the one-stop config is present, thenblurbs.md
and the CSVs are ignored.
On this, I'd rather give priority to the blurbs.md
. This is because in our use case the user might want to add extensive and formatted prose explaining their contribution to the repo, which might be troublesome to add to a yaml/json file. Alternatively, we can omit blurb
from the config file entirely -- which means if the user wants to add descriptions to the report, s/he needs to also add a blurbs.md
file.
So, feels like we should specify which authors to capture rather than which ones to ignore
That makes sense! I think I removed it because authors have a number of different (potentially confusing) configuration options, but it seems reasonable to keep them.
On this, I'd rather give priority to the blurbs.md. This is because in our use case the user might want to add extensive and formatted prose explaining their contribution to the repo, which might be troublesome to add to a yaml/json file. Alternatively, we can omit blurb from the config file entirely -- which means if the user wants to add descriptions to the report, s/he needs to also add a blurbs.md file.
I think it would be nice to have the same level of expressivity in the one-stop config as in the blurbs.md
file if we are hoping for the one-stop config to be used for personal portfolios - needing to separate out the blurbs for advanced features would be inconvenient. Parsing (and writing) full markdown embedded in the yaml config would certainly be troublesome though... Maybe we can think of ways to have it not be quite so awkward, or failing that have blurbs.md
take priority as you suggest.
I think it would be nice to have the same level of expressivity in the one-stop config as in the
blurbs.md
file if we are hoping for the one-stop config to be used for personal portfolios - needing to separate out the blurbs for advanced features would be inconvenient. Parsing (and writing) full markdown embedded in the yaml config would certainly be troublesome though... Maybe we can think of ways to have it not be quite so awkward, or failing that haveblurbs.md
take priority as you suggest.
How about this?
That way, the user have the option to use either one but config file takes priority.
That way, the user have the option to use either one but config file takes priority.
Do you mean blurbs.md
takes priority? I think it makes sense to take the blurb from the md file if both are specified, if the md is strictly more expressive.
Do you mean
blurbs.md
takes priority? I think it makes sense to take the blurb from the md file if both are specified, if the md is strictly more expressive.
@gok99 That's fine too. What I don't want is the presence of the one-stop config file causing the blurbs.md to be ignored entirely. As long as I have the option to use the blurbs.md together with the one-stop config file, I'm OK with either of the two files taking priority over the other.
What feature(s) would you like to see in RepoSense
Is the feature request related to a problem?
Currently, the format for configuring the one-stop config file
repo-config.csv
is difficult for inexperienced users to use. Additionally, it is not friendly for those trying to create a code portfolio.If possible, describe the solution
Repurpose
report-config.json
into a one-stop config file in a file format that is more user-friendly. Possible formats includeyml
andxml
. Additionally, include more configuration options (to be decided).This means that the order of priority would be this new file, then
repo-config
, then the rest of thecsv
files.Steps:
yml
and update the parser.Additional context
Expanded version of item 2 of #2157