Closed isedwards closed 2 years ago
Hello, I have just pushed a new branch called "cleanup" to the GitLab repository. We removed some large files from the project, and also removed the .env and .key files. The secret key was moved to the .env file and is now only referenced inside settings.py. Let me know if there is anything else to fix.
Thank you @JSCesar - it looks like you've managed to clean up a lot of the old files.
@fabiosato, the sensitive data is still in the git history, so it's still possible to go back through the history and see the data, e.g.:
git clone -b cleanup --single-branch https://gitlab.com/fabiosato/surface.git
cd surface
# checkout an earlier commit from before the clean up
git checkout 6a54ee2a
cat proxy/cert.key
Instead of cleaning the history using a tool like BFG Repo-Cleaner, should we just start a new git repository and copy the cleaned files into it (and upload to GitHub without any of the previous history)?
Hello, @isedwards. Thanks for your reply. @cismoski has cleaned the repository with me and removed a lot of files. Could you inspect the cleanup repository again?
Hi @JSCesar, it looks like you've modified .gitignore (to ignore those specific files in the future)... but they still exist in the history of changes that were made to the code in the past. If you run git checkout 6a54ee2a to check out a version from before the recent changes, you will see that all of the large and sensitive files are still stored in the repository.
Also, since everything still exists in the git history, anyone downloading the repository still needs 822 MB of disk space - the large files will still cause the errors shown above if we try to upload to GitHub.
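A rough way to confirm what is still stored in the history is sketched below. The helper names `commits_touching` and `largest_blobs` are made up for illustration; they only wrap standard git commands and need to be run from inside a clone of the repository.

```shell
# List every commit, on any branch, that touched the given path.
# An empty result means the path does not appear anywhere in the history.
commits_touching() {
    git log --all --oneline -- "$1"
}

# Print the N largest blobs stored in the history (default 10),
# one per line, as "<size-in-bytes> <path>".
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -rn |
        head -n "${1:-10}"
}
```

For example, `commits_touching proxy/cert.key` would list the commits that added, changed, or deleted that file, and `largest_blobs 20` would show which files account for most of the repository size, even if they have since been deleted from the working tree.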
I think it's okay to upload to GitHub as long as we don't include any of the git history (so we would no longer have git log for historic changes, only for future changes).
@isedwards, if you don't think it's an issue, let's copy the cleaned files into the new repository.
I will keep the GitLab repo around just in case we need to check the history.
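One possible way to do the copy is sketched here. The function name `fresh_copy` and its arguments are hypothetical; it assumes `git` and `tar` are available and that a git user name and email are already configured.

```shell
# Export the tracked files of one branch (no .git directory, no history)
# into a new repository that contains a single initial commit.
#   $1 = path to the existing clone
#   $2 = branch to export (e.g. cleanup)
#   $3 = directory for the new repository
fresh_copy() {
    src=$1
    branch=$2
    dest=$3
    mkdir -p "$dest" &&
        git -C "$src" archive "$branch" | tar -x -C "$dest" &&
        git -C "$dest" init -q &&
        git -C "$dest" add -A &&
        git -C "$dest" commit -q -m "Initial import of cleaned files"
}
```

Because `git archive` only exports files that are tracked on the chosen branch, anything already deleted on "cleanup" is left behind, and `git log` in the new repository shows just the one import commit. The new repository can then be given a GitHub remote and pushed in the usual way.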
Is everyone happy if this first issue (to migrate the code to GitHub) is now closed? The next issue #22 is to make sure NMS of Belize are using the version from GitHub and move all new developments over here.
Let me know if there are any problems, especially relating to how to store the sensitive information that is needed for server deployments - we need to make sure we have a good process in place for when SURFACE is deployed to other organisations and/or the cloud.
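As one small piece of such a process, a check along these lines could be run before pushing. This is only a sketch: `is_safely_ignored` is a hypothetical helper name, and it assumes secrets live in a file such as .env that should never be committed.

```shell
# Succeeds only if the given file is covered by an ignore rule,
# is not currently tracked, and has never been committed on any branch.
# Must be run from inside the repository.
is_safely_ignored() {
    git check-ignore -q -- "$1" &&
        ! git ls-files --error-unmatch -- "$1" >/dev/null 2>&1 &&
        [ -z "$(git log --all --oneline -- "$1")" ]
}
```

For example, `is_safely_ignored .env || echo "WARNING: .env is not safely ignored"` would flag a deployment checkout where the secrets file could end up in a commit.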
The surface development history contains large files that cause the repository size to approach 1 GB.
Before migrating to GitHub, the repo can be pruned using a tool like BFG Repo-Cleaner and decisions can be made on the most appropriate place for large files/binaries.
Several errors are reported when trying to push the entire repo with its full history to GitHub:
Since the original repository also contains private information that is specific to the Belize installation, the version with large files removed can be uploaded to the opencdms/surface-demo private repository.
This public repository (opencdms/surface) will be used to create a subset that is released as open-source (with the Belize version migrating to the fully open-source version over time).