Closed kilbergr closed 8 months ago
Ok so the pieces done up to this point are:
redacted/
    Reporters.json
    Volumes.json
    ${reporter_id}/  # aka Reporter Folder; e.g. "pa-d-c"; shortcode already in case.law urls
        Metadata.json
        Cases.jsonl
        Volume.pdf
        ${volume_id}/  # aka Volume Folder; e.g. "6"; already in case.law urls
            Metadata.json
            Cases.jsonl
            case/
                1.json  # file names named after page case starts on; similar to case.law urls
                6.json
                ...
The pieces that remain are:
redacted/
    ${reporter_id}/  # aka Reporter Folder; e.g. "pa-d-c"; shortcode already in case.law urls
        Volumes.json
        ${volume_id}/  # aka Volume Folder; e.g. "6"; already in case.law urls
            case/
                1.html  # file names named after page case starts on; similar to case.law urls
                6.html
                ...
vendor/
    ${volume_id}.tar  # compression?
    ${volume_id}.csv
    ${volume_id}.tar.sha256
misc/
    [stuff from https://case.law/download/]
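
The case paths in the two listings above can be expressed as a few key-building helpers. This is only a sketch, not code from the project: the function names are hypothetical, and it assumes that volume folders nest under reporter folders inside `redacted/`, as the case.law URL comments suggest.

```python
from posixpath import join


def reporter_prefix(reporter_id: str) -> str:
    """Prefix for a Reporter Folder, e.g. redacted/pa-d-c (assumed nesting)."""
    return join("redacted", reporter_id)


def volume_prefix(reporter_id: str, volume_id: str) -> str:
    """Prefix for a Volume Folder, assumed to sit inside its Reporter Folder."""
    return join(reporter_prefix(reporter_id), volume_id)


def case_key(reporter_id: str, volume_id: str, first_page: int, ext: str = "json") -> str:
    """Key for one case file, named after the page the case starts on."""
    if ext not in ("json", "html"):
        raise ValueError(f"unsupported extension: {ext}")
    return join(volume_prefix(reporter_id, volume_id), "case", f"{first_page}.{ext}")


if __name__ == "__main__":
    print(case_key("pa-d-c", "6", 1))
    print(case_key("pa-d-c", "6", 6, ext="html"))
```

Keeping the layout in pure functions like these would let the JSON pieces that are done and the HTML pieces that remain share one definition of where files live.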
Given the success of last week's spike, we will continue working toward a static file site that pulls from S3. The problem: our current file structure in S3 is not the one we ultimately want.
We have manually experimented with creating the file structure we expect. Now we will experiment with creating it programmatically.
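
One low-risk way to start the programmatic experiment might be to build the expected structure in a local directory before touching S3. The sketch below is an assumption-laden illustration: `scaffold_volume` and the `first_page` field are hypothetical names, and the case records are placeholders, not real data.

```python
import json
import tempfile
from pathlib import Path


def scaffold_volume(root: Path, reporter_id: str, volume_id: str, cases: list[dict]) -> list[Path]:
    """Write Cases.jsonl and one case/<first_page>.json per case under root."""
    volume_dir = root / "redacted" / reporter_id / volume_id
    case_dir = volume_dir / "case"
    case_dir.mkdir(parents=True, exist_ok=True)

    written = []
    index_path = volume_dir / "Cases.jsonl"
    with index_path.open("w", encoding="utf-8") as index:
        for case in cases:
            # One JSON object per line in the volume-level index.
            index.write(json.dumps(case) + "\n")
            # One file per case, named after the page the case starts on.
            case_path = case_dir / f"{case['first_page']}.json"
            case_path.write_text(json.dumps(case), encoding="utf-8")
            written.append(case_path)
    written.append(index_path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = scaffold_volume(Path(tmp), "pa-d-c", "6", [{"first_page": 1}, {"first_page": 6}])
        for path in sorted(paths):
            print(path.relative_to(tmp).as_posix())
```

Once the local tree looks right, the same relative paths could serve as S3 keys for the upload step.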
AC:
`\n` and escape `\` from HTML.