textcreationpartnership / Texts

the EEBO TCP texts
Creative Commons Zero v1.0 Universal

Rethink repo organization at Phase 2 #9

Open lb42 opened 4 years ago

lb42 commented 4 years ago

I don't know if anyone else has any appetite for it, but might this not be a good opportunity to rethink the organization of the main EEBO-TCP repo on GitHub? GitHub itself has changed since this was first set up, and the concern that individual files might be subjected to endless revisions does not seem to have materialised. Now that the public-facing side of EEBO-TCP is more or less static, might it be reorganized as a single GitHub "collection" comprising repos named e.g. A00, A01 etc., each containing the (up to 1000) files with that prefix? There is no limit to the number of files in a repo, but an individual repo cannot contain more than 100 GB (and an individual file cannot be larger than 100 MB). Alternatively, taking the first 4 characters of each filename as the repo name would result in a max file/repo count of only 100,

These counts don't represent the files actually available, of course, just the identifiers in eebodat
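For anyone who wants to check the arithmetic, here is a rough sketch of the prefix grouping suggested above. It assumes identifiers are six characters long (one letter plus five digits, e.g. A00001), which is what the 1000 and 100 figures imply; the sample identifiers and the function names are made up for illustration and are not taken from the actual repo.

```python
from collections import Counter

# Assumption: an identifier is one letter followed by five digits.
ID_LENGTH = 6


def repo_name(file_id: str, prefix_len: int = 3) -> str:
    """Return the repo a file would live in, i.e. its first prefix_len characters."""
    return file_id[:prefix_len]


def files_per_repo(file_ids, prefix_len: int = 3) -> Counter:
    """Count how many of the given identifiers fall into each prefix-named repo."""
    return Counter(repo_name(file_id, prefix_len) for file_id in file_ids)


def max_ids_per_repo(prefix_len: int = 3, id_len: int = ID_LENGTH) -> int:
    """Upper bound on identifiers per repo: one per combination of the remaining digits."""
    return 10 ** (id_len - prefix_len)


# Hypothetical identifiers, for illustration only.
sample_ids = ["A00001", "A00002", "A00999", "A01234", "B12345"]

by_three = files_per_repo(sample_ids, prefix_len=3)
by_four = files_per_repo(sample_ids, prefix_len=4)

print(dict(by_three))
print(dict(by_four))
print(max_ids_per_repo(3), max_ids_per_repo(4))
```

Running `files_per_repo` over the real list of identifiers, rather than these sample values, would give the actual per-repo counts for either prefix length.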