opensourcepledge / opensourcepledge.com

We all depend on Open Source. Pay the maintainers by joining the Open Source Pledge.
https://opensourcepledge.com/

Add report archival #259

Open vladh opened 4 days ago

vladh commented 4 days ago

This addresses but does not close #245.

First, here's a summary of how it works, from CONTRIBUTING.md:


We archive member reports so that, in case they become inaccessible, we can (1) refer to the reports, and (2) have a fallback way to display the reports to visitors.

Currently, these reports must be archived manually.

Before archiving reports, make sure you install monolith from your package manager. monolith is used so that we can compactly archive a report into a single file.
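For a rough idea of what archiving one report with monolith might look like, here is a minimal sketch. It is not the code in archiveMembers.ts: `monolithArgs` and `archiveReport` are hypothetical names, and the only monolith feature assumed is its `-o` output-file option.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: builds the argument list for one monolith run.
// `-o` is monolith's output-file option; the function name is illustrative.
function monolithArgs(reportUrl: string, outputPath: string): string[] {
  return [reportUrl, "-o", outputPath];
}

// Hypothetical helper: runs monolith and reports whether it exited cleanly.
function archiveReport(reportUrl: string, outputPath: string): boolean {
  const result = spawnSync("monolith", monolithArgs(reportUrl, outputPath), {
    stdio: "inherit",
  });
  // `result.error` is set when the binary could not be started at all,
  // for example when monolith is not installed.
  return result.error === undefined && result.status === 0;
}
```

Passing the arguments as an array rather than a shell string means report URLs containing `&` or `?` need no quoting.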

To archive all reports, run this from the repository root:

$ ./src/memberData/bin/archiveMembers.ts

This will archive reports using the following directory structure:

archives/
└── reports
    └── sentry
        ├── 2022
        │   ├── 2024-11-08T18:11:29.792Z.html
        │   └── latest.html -> 2024-11-08T18:11:29.792Z.html
        ├── 2023
        │   ├── 2024-11-08T18:11:27.057Z.html
        │   └── latest.html -> 2024-11-08T18:11:27.057Z.html
        └── 2024
            ├── 2024-11-08T18:11:24.601Z.html
            └── latest.html -> 2024-11-08T18:11:24.601Z.html

When a report is archived, the latest.html symlink is updated to point to the latest archived HTML file.
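The layout and the symlink update described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: `archivePath` and `saveArchive` are hypothetical names, and the `root` parameter stands in for the archives/ directory.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: <root>/reports/<member>/<year>/<ISO timestamp>.html
function archivePath(root: string, member: string, year: number, archivedAt: Date): string {
  return path.join(root, "reports", member, String(year), `${archivedAt.toISOString()}.html`);
}

// Hypothetical helper: writes the archived HTML, then repoints latest.html at it.
function saveArchive(root: string, member: string, year: number, archivedAt: Date, html: string): string {
  const file = archivePath(root, member, year, archivedAt);
  const dir = path.dirname(file);
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(file, html);

  const latest = path.join(dir, "latest.html");
  try {
    // Remove the previous symlink, if any, before creating the new one.
    fs.unlinkSync(latest);
  } catch (err) {
    // A missing link is expected on the first run; rethrow anything else.
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
  }
  // Relative target, so the link survives moving the whole tree.
  fs.symlinkSync(path.basename(file), latest);
  return file;
}
```

The timestamp in the file name comes from `Date.prototype.toISOString()`, which produces the same format as the file names in the tree above.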

Archives are not currently served to users automatically, e.g. when the original URL is inaccessible.


The archiving itself works, but we have two issues.

First, there are three members whose reports mysteriously show an error page when viewed as a local file: GitButler, Keygen and Speakeasy. This seems to have something to do with the JavaScript on those pages.

Secondly, I did not add the archived .html files because some of them are huge. A single round of archiving came to 274 MB in total:

$ du -shc archives/reports/* | sort -h
...
11M archives/reports/keygen
11M archives/reports/vlt
16M archives/reports/chieftools
18M archives/reports/gitbutler
18M archives/reports/sentry
38M archives/reports/rector
93M archives/reports/prefect
274M    total
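If we ever want the same per-member breakdown from a script (say, to warn before committing a large round of archives), something like the sketch below could work. `directorySize` and `memberSizes` are hypothetical names, and the numbers are apparent file sizes in bytes rather than the disk usage that du reports, so they may differ slightly.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Total size in bytes of all regular files under `dir`.
// Symlinks such as latest.html are skipped, so nothing is counted twice.
function directorySize(dir: string): number {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += directorySize(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}

// Hypothetical helper: [member, bytes] pairs, smallest first (like `sort -h`).
function memberSizes(reportsDir: string): Array<[string, number]> {
  return fs
    .readdirSync(reportsDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry): [string, number] => [entry.name, directorySize(path.join(reportsDir, entry.name))])
    .sort((a, b) => a[1] - b[1]);
}
```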

I looked into this a bit. It seems that Prefect's website includes many alternate links for the same images, and that serving a PNG mysteriously causes 10x as much network traffic as the size of the actual PNG file.

This is all to say, let's review/merge this for now and figure out the details of how to run it later. :)