rpi-crisis / scraper

Web scrapers for the RCOS project Correcting Rensselaer's Insufferable SIS (CRISIS)
https://rpicrisis.org

[FEATURE] Script to write the version file #18

Open TrevorBrunette opened 2 years ago

TrevorBrunette commented 2 years ago

The version metadata file should be a JSON file containing the SHA-256 checksum of each JSON data file. That way, when only one file changes, the others do not need to be redownloaded. For example, if there are two files, courses.json and majors.json, then the meta.json file should look like:

{
  "courses": "0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF",
  "majors": "0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF"
}
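To illustrate how a client could use this file to avoid redownloading unchanged data, here is a minimal sketch. The function name `changed_files` and the sample digests are hypothetical and not part of the project. It compares a locally cached meta.json against a freshly fetched one and returns only the entries whose checksum differs. The comparison ignores letter case, since the example above uses uppercase hex while Python's `hexdigest()` returns lowercase.

```python
def changed_files(local_meta, remote_meta):
    """Return names whose checksum is missing locally or differs from the remote one."""
    return sorted(
        name
        for name, digest in remote_meta.items()
        # Compare case-insensitively: hex digests may be upper- or lowercase.
        if local_meta.get(name, "").lower() != digest.lower()
    )


# Made-up digests for illustration only.
local = {"courses": "AA" * 32, "majors": "BB" * 32}
remote = {"courses": "AA" * 32, "majors": "CC" * 32}
print(changed_files(local, remote))  # prints ['majors']
```

Only the names returned would need to be downloaded again.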

We need a script to do this automatically: given a predefined list of file names, compute the SHA-256 hash of each file that exists in the working directory, encode it as hex (64 characters, since a SHA-256 digest is 32 bytes), and write it to the meta.json file as described above.

If you decide to use Python, consider the following code

import hashlib
hashlib.sha256(data).hexdigest()  # data: the file's contents as bytes

for smaller files, or

sha256_hash = hashlib.sha256()
with open(path, "rb") as f:  # path: the file to hash
    for byte_block in iter(lambda: f.read(4096), b""):
        sha256_hash.update(byte_block)

for larger files.
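Putting the pieces together, the requested script might look roughly like the sketch below. This is one possible shape under stated assumptions, not a settled design: the names `FILE_NAMES`, `sha256_hex`, and `write_meta` are hypothetical, the keys are assumed to be the file names without the .json extension (as in the example above), and the digest is uppercased only to match that example.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical predefined list; the issue only names these two files.
FILE_NAMES = ["courses.json", "majors.json"]


def sha256_hex(path, chunk_size=4096):
    """Hash a file in fixed-size chunks so large files are never fully in memory."""
    sha256_hash = hashlib.sha256()
    with open(path, "rb") as f:
        for byte_block in iter(lambda: f.read(chunk_size), b""):
            sha256_hash.update(byte_block)
    # Uppercase is an assumption made to match the example meta.json.
    return sha256_hash.hexdigest().upper()


def write_meta(directory=".", file_names=FILE_NAMES, meta_name="meta.json"):
    """Write checksums of the listed files that exist in `directory` to meta.json."""
    directory = Path(directory)
    meta = {}
    for name in file_names:
        path = directory / name
        if path.is_file():  # skip files that are not present
            meta[Path(name).stem] = sha256_hex(path)
    (directory / meta_name).write_text(json.dumps(meta, indent=2) + "\n")
    return meta
```

Calling `write_meta()` with no arguments would process the current working directory. Files from the list that are missing are simply left out of meta.json.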