The GitHub action Creation of CSV (csv.yml) runs the translations_status.R script that creates the CSV files message_status.csv and metadata.csv. It is scheduled to run every day.
Most days there are no changes in the translations. However, if there has been any new commit to the r-devel Subversion repo (and hence a new commit to the r-svn GitHub mirror of this repository), the entire CSV file changes because the commit ID and date are added to the data.
The GitHub action Main Dashboard Refresh (main_dashboard.yml) re-renders the main dashboard index.Rmd every day, regardless of whether the source data (message_status.csv and metadata.csv) are updated or not.
As a result, large commits that make no real change are made daily, bloating the repository and using compute resources unnecessarily.
We should fix this by:
[ ] Updating translations_status.R to only update the CSV files if the translation files have actually changed.
This might be done by comparing the latest data to the saved data. I know the *.po file metadata includes po_revision_date - could that be used to check if any *.po files have changed since the date saved in the CSV files? In this case the script could stop after reviewing the metadata.
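As a rough sketch of the "compare the latest data to the saved data" idea, the check could ignore the commit columns and compare everything else. The column names `commit_id` and `commit_date` and the helper `data_changed()` are assumptions for illustration; the real column names in the CSV files may differ.

```r
# Hypothetical names: the commit columns may be called something else
# in message_status.csv and metadata.csv.
commit_cols <- c("commit_id", "commit_date")

# TRUE if the two data frames differ in anything other than the ignored columns.
data_changed <- function(new, old, ignore = commit_cols) {
  strip <- function(df) {
    df <- df[, setdiff(names(df), ignore), drop = FALSE]
    rownames(df) <- NULL
    df
  }
  !isTRUE(all.equal(strip(new), strip(old)))
}

# Toy data standing in for the saved CSV and a freshly computed result.
old <- data.frame(package = "base", translated = 10L,
                  commit_id = "abc", commit_date = "2024-01-01")
same_translations <- transform(old, commit_id = "def", commit_date = "2024-01-02")
new_translations <- transform(same_translations, translated = 11L)

data_changed(same_translations, old)  # FALSE: only the commit columns differ
data_changed(new_translations, old)   # TRUE: a translation count changed
```

The script would then write the CSV files only when `data_changed()` returns `TRUE`. This still requires computing the new data first, so it saves the commit but not the compute time.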
Or maybe there is a way to review the git history since the commit stored in the CSV files and determine if there has been any change in the *.po files. Then the script could stop even earlier.
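One possible way to do the git history check is `git diff --name-only` between the stored commit and `HEAD`, then looking for `.po` files in the result. This is only a sketch: the helper names are hypothetical, it assumes the script runs against a clone of the r-svn mirror, and it assumes the clone has enough history to contain the stored commit (a shallow clone may not).

```r
# List the files that differ between a stored commit and HEAD.
# `repo` is assumed to be a local clone of the r-svn mirror.
changed_files_since <- function(commit, repo = ".") {
  system2(
    "git",
    c("-C", shQuote(repo), "diff", "--name-only", shQuote(commit), "HEAD"),
    stdout = TRUE
  )
}

# TRUE if any of the changed paths is a .po file.
any_po_changed <- function(files) {
  any(grepl("\\.po$", files))
}

# Example with made-up paths in place of real git output:
any_po_changed(c("src/library/base/po/fr.po", "doc/NEWS.Rd"))  # TRUE
any_po_changed(c("doc/NEWS.Rd"))                               # FALSE
```

In the script this might look like `if (!any_po_changed(changed_files_since(saved_commit))) { ... stop early ... }`, where `saved_commit` is the commit ID read from the saved CSV. The pattern matches only `.po` files, so changes to `.pot` templates would be ignored unless the pattern is widened.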
[ ] Updating index.Rmd so that re-rendering the main dashboard with the same data only changes the date.
I think passing a fixed ID to the group argument of SharedData$new() and to the elementId argument of each reactable() call will avoid random IDs being created in the HTML widgets.
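A minimal sketch of what fixed IDs could look like, assuming the crosstalk and reactable packages are installed. The `mtcars` data and the ID strings are placeholders, not what index.Rmd actually uses, and whether this removes every source of random IDs in the rendered HTML would still need checking.

```r
library(crosstalk)
library(reactable)

# Fixed group name instead of the randomly generated default.
shared <- SharedData$new(mtcars, group = "translation-status")

# Fixed element ID for the widget; each table would need its own unique ID.
tbl <- reactable(shared, elementId = "translation-status-table")
```

Rendering the document twice with unchanged data and diffing the two HTML files would show whether any random IDs remain.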
Alternatively, maybe the HTML can be directly edited to update the date when there is no change in the source data? This would avoid re-rendering the dashboard, but maybe is a little hacky.
Another alternative would be to skip updating the HTML if there is no change to the data, but I think it is helpful to have the "last updated" date updated, even when the data hasn't changed, as it shows the data has been checked for changes. Otherwise, if the date is from several days ago you don't know if it's because there have been no updates or because the dashboard is failing to build.
[ ] Putting the commit ID and date in a separate CSV, so that a new commit doesn't change every row of the message_status.csv and metadata.csv files. This will result in smaller and more helpful diffs.
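A sketch of how the split might work, again assuming the commit columns are called `commit_id` and `commit_date`. The file name `commit.csv` and the helper `split_commit_info()` are hypothetical, and the example writes to a temporary directory rather than the repository.

```r
# Separate the commit columns (one row) from the rest of the data.
split_commit_info <- function(status, commit_cols = c("commit_id", "commit_date")) {
  list(
    commit = unique(status[, commit_cols, drop = FALSE]),
    status = status[, setdiff(names(status), commit_cols), drop = FALSE]
  )
}

# Toy data standing in for the current message_status.csv contents.
status <- data.frame(
  package = c("base", "stats"),
  translated = c(10L, 7L),
  commit_id = "abc123",
  commit_date = "2024-01-01"
)

parts <- split_commit_info(status)

out_dir <- tempdir()
write.csv(parts$commit, file.path(out_dir, "commit.csv"), row.names = FALSE)
write.csv(parts$status, file.path(out_dir, "message_status.csv"), row.names = FALSE)
```

With this layout a new upstream commit that touches no translations changes only the single row in the commit file.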