Open reefdog opened 5 years ago
@metasj asked for a more readable list of the problematic files, so here's a Gist with a table of all the files already linked up, along with their timestamp and byte size. Also here's an XLSX and zipped CSV, for good measure.
(Each file path is implicitly rooted at `s3://assets.priorartarchive.org/uploads`.)
I don't see the directory names in the gist -- is that all files from all 8 directories, combined? The directory seems like the most important piece for debugging.
Most are Cisco files, by inspection; a few are test uploads.
@metasj The directory names are built into the path name. E.g., `096e402e-b66d-460a-a503-8fc5bd9524f6/01549568245194.pdf` is the file `01549568245194.pdf` within the directory `096e402e-b66d-460a-a503-8fc5bd9524f6`. You can also see the eight directories in the table above; they're the last eight rows, the ones with no corresponding `v2.Organization.id` or `v1.Company.id`.
(I'll go ahead and edit the Gist so the directory is its own column though, just for clarity!)
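For anyone who wants to split the paths themselves in the meantime, here is a minimal sketch. It assumes every path is relative to `/uploads` and starts with the directory name; `split_upload_path` is a made-up helper name, not something from the Gist or the codebase.

```python
from pathlib import PurePosixPath


def split_upload_path(path: str) -> tuple[str, str]:
    """Split a path relative to /uploads into (top-level directory, remainder).

    Hypothetical helper: assumes the first path segment is the directory name.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 2:
        raise ValueError(f"expected 'directory/file', got {path!r}")
    # Anything after the first segment is kept together, so nested
    # subdirectories stay part of the remainder.
    return parts[0], "/".join(parts[1:])


directory, filename = split_upload_path(
    "096e402e-b66d-460a-a503-8fc5bd9524f6/01549568245194.pdf"
)
print(directory, filename)
```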
Got it, it was just hard to parse. We should have a username for every account. These are perhaps users who didn't set an organization.
24f6: me
ed1b: Joel?
7674: Travis?
5d0d: cisco file + title tests
6cfd:
3f1d:
0cf8: travis
378b: cisco test?
The `/uploads` directory of the `assets.priorartarchive.org` S3 bucket has directories (19 total) keyed by `Organization.id`. Of these directories, 11 can be linked to organizations, but eight can't. I even checked them against the v1 database, and they don't exist there either.
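The check described above is essentially a set difference between the directory names and the known IDs. A rough sketch follows; it is not the actual query that was run. The function name and the sample values are placeholders, and it assumes the IDs and directory names compare equal as case-insensitive strings.

```python
def unlinked_directories(directory_names, known_ids):
    """Return directory names that match none of the known IDs.

    Hypothetical helper: known_ids would be the combined v2 Organization
    and v1 Company IDs, however those get exported.
    """
    known = {str(known_id).lower() for known_id in known_ids}
    return sorted(d for d in set(directory_names) if d.lower() not in known)


# Placeholder values, not real directory names or IDs.
directories = ["dir-a", "dir-b", "dir-c"]
ids = ["DIR-A", "dir-c"]
print(unlinked_directories(directories, ids))
```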
We should figure out what these are. I've generated a complete recursive list of their contents. (Note that one directory actually contains three more directories, each with only one file.)
We need to sort out what these are.
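To make the recursive list easier to scan, something like the sketch below could group it by top-level directory and flag any nested subdirectories. It assumes the listing is a flat list of keys relative to `/uploads`; `summarize_listing` and the sample keys are illustrative only.

```python
from collections import defaultdict


def summarize_listing(keys):
    """Group object keys by top-level directory.

    Returns {directory: {"files": count, "subdirs": set of nested dir names}}.
    Hypothetical helper; keys with no directory component are skipped.
    """
    summary = defaultdict(lambda: {"files": 0, "subdirs": set()})
    for key in keys:
        parts = key.strip("/").split("/")
        if len(parts) < 2:
            continue
        top = parts[0]
        summary[top]["files"] += 1
        # More than two segments means the file sits in a nested directory.
        if len(parts) > 2:
            summary[top]["subdirs"].add(parts[1])
    return dict(summary)


# Placeholder keys, not real bucket contents.
listing = ["a/1.pdf", "a/x/2.pdf", "a/y/3.pdf", "b/4.pdf"]
for directory, info in summarize_listing(listing).items():
    print(directory, info["files"], sorted(info["subdirs"]))
```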