harvard-lil / capstone

CAP database scripts.
MIT License

First foray into manipulating current S3 dir structure into static file structure #2139

Closed: kilbergr closed this issue 8 months ago

kilbergr commented 1 year ago

Given the success of last week's spike, we will keep working toward a static file site that pulls from S3. The problem: our current file structure in S3 is not the one we ultimately want.

We've manually experimented with creating the file structure we expect. Now we will experiment with generating it programmatically.

AC:

kilbergr commented 1 year ago

OK, so the pieces done up to this point are:

redacted/
    Reporters.json
    Volumes.json
    ${reporter_id}/ # aka Reporter Folder; e.g. "pa-d-c"; shortcode already in case.law urls
        Metadata.json
        Cases.jsonl
        Volume.pdf
        ${volume_id}/ # aka Volume Folder; e.g. "6"; already in case.law urls
            Metadata.json
            Cases.jsonl
            case/
                1.json # files are named after the page the case starts on; similar to case.law urls
                6.json
                ...
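
To make the layout above concrete, here is a minimal sketch of how target keys could be built from a reporter shortcode, a volume number, and a case's first page. The function names (`volume_keys`, `case_key`) are hypothetical and not taken from the capstone codebase; only the path shapes come from the tree above.

```python
from pathlib import PurePosixPath

# Top-level prefix, as written in the layout above.
ROOT = PurePosixPath("redacted")


def volume_keys(reporter_id: str, volume_id: str) -> dict[str, str]:
    """Return the per-volume keys shown in the tree (hypothetical helper)."""
    base = ROOT / reporter_id / volume_id
    return {
        "metadata": str(base / "Metadata.json"),
        "cases": str(base / "Cases.jsonl"),
    }


def case_key(reporter_id: str, volume_id: str, first_page: int) -> str:
    """Return the key for one case, named after the page the case starts on."""
    return str(ROOT / reporter_id / volume_id / "case" / f"{first_page}.json")


print(case_key("pa-d-c", "6", 1))  # redacted/pa-d-c/6/case/1.json
```

Using `PurePosixPath` keeps the separators as `/` regardless of the machine running the script, which matches how S3 keys are written.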

The pieces that remain are:

redacted/
    ${reporter_id}/ # aka Reporter Folder; e.g. "pa-d-c"; shortcode already in case.law urls
        Volumes.json
        ${volume_id}/ # aka Volume Folder; e.g. "6"; already in case.law urls
            case/
                1.html # files are named after the page the case starts on; similar to case.law urls
                6.html
                ...
            vendor/
                ${volume_id}.tar # compression?
                ${volume_id}.csv
                ${volume_id}.tar.sha256
misc/
    [stuff from https://case.law/download/]
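
For the `${volume_id}.tar.sha256` piece, one possible approach is sketched below. It assumes the sidecar file should hold the hex SHA-256 digest followed by the tar's file name (the format the `sha256sum` tool prints); the issue does not specify the format, so treat that as an assumption. The helper name `write_sha256_sidecar` is hypothetical.

```python
import hashlib
from pathlib import Path


def write_sha256_sidecar(tar_path: Path) -> Path:
    """Write '<name>.sha256' next to tar_path and return the sidecar path.

    Assumed format: '<hex digest>  <file name>' plus a newline,
    which is what `sha256sum` emits. Not confirmed by the issue.
    """
    digest = hashlib.sha256()
    with tar_path.open("rb") as fh:
        # Read in 1 MiB chunks so large volume archives are not loaded whole.
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    sidecar = tar_path.with_name(tar_path.name + ".sha256")
    sidecar.write_text(f"{digest.hexdigest()}  {tar_path.name}\n")
    return sidecar
```

Calling `write_sha256_sidecar(Path("6.tar"))` would produce `6.tar.sha256` in the same directory, matching the `vendor/` entries listed above.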