w3c-ccg / traceability-vocab

A traceability vocabulary for describing relevant Verifiable Credentials and their contents.
https://w3id.org/traceability

Define a max size limit for JSON-LD VCs #379

Open OR13 opened 2 years ago

OR13 commented 2 years ago

There must be some recommendation we would make on this front.

mkhraisha commented 2 years ago

This should go to trace interop, no?

OR13 commented 2 years ago

We should adopt the MongoDB convention, then add some padding and apply this to "Certificate" types and "TraceablePresentations".

OR13 commented 2 years ago

This should happen in the vocabulary; it's a data format issue.

OR13 commented 2 years ago

See also: https://betterprogramming.pub/how-to-store-documents-larger-than-16-mb-in-mongodb-aecc957bbe6c

nissimsan commented 2 years ago

Simple google:

16 MB: As you know, MongoDB stores data in a document. The limit for one document is 16 MB. You can also use GridFS to store large files that can exceed 16 MB. (18 May 2020)

OR13 commented 2 years ago

Suggest we set 16 MB as the max credential and presentation size limit.
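
For illustration, a check against such a limit might look roughly like the sketch below. The function name and the idea of measuring the UTF-8 byte length of the serialized document are assumptions on my part; the 16 MB figure is just the value proposed above, not anything normative.

```typescript
// Sketch only: reject a serialized VC/VP that exceeds a configured byte limit
// before storing it or doing any expensive processing.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024; // the 16 MB figure proposed above; illustrative

function exceedsSizeLimit(document: object, maxBytes: number = MAX_DOCUMENT_BYTES): boolean {
  // Measure the UTF-8 byte length of the JSON serialization, not the string length,
  // since multi-byte characters make the two differ.
  const byteLength = new TextEncoder().encode(JSON.stringify(document)).length;
  return byteLength > maxBytes;
}

// Usage sketch:
// if (exceedsSizeLimit(verifiablePresentation)) {
//   throw new Error("Presentation exceeds the configured size limit");
// }
```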

TallTed commented 2 years ago

TL;DR: We need better justification for taking this action, with clearer presentation of the reasoning behind the limit(s) we're contemplating imposing. Appealing to a debatable "authority" is not sufficient.


I'm wondering why we're imposing one (MongoDB) storage implementation's size limit (which appears not to be absolute, given the comment about GridFS) on VCs and VPs...

This seems especially odd given the likelihood of a CBOR-LD spec coming from the new VCWG. Being a compressed format, CBOR-LD VCs will be able to hold much more data within the same 16MB document size limit than JSON-LD VCs -- and suddenly we've lost the assurance that CBOR-LD VCs can be round-tripped with JSON-LD VCs.

I do not like imposing this arbitrary document size limit, especially because it's based on one implementation's arbitrary (and work-aroundable) limitation. At minimum, I want more justification for imposing this limit on JSON-LD VCs before we do it.

All that said -- This is the Traceability Vocab work item. We are not chartered to impose VC document size limits. Even if we include the Traceability Interop work item, we are still not chartered to impose VC document size limits. Even a recommendation of this sort feels wrong to me, with the current lack of foundational justification.

OR13 commented 2 years ago

See https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html

CBOR-LD is not currently used in this document (neither is CBOR).

I don't think document constraints need to be set in stone, but it's wise to test the limits and add a safety margin in any engineering system.

TallTed commented 2 years ago

[@OR13] There must be some recommendation we would make on [a max size limit for JSON-LD VCs].

Why "must" there be?

This really doesn't seem to me like a limitation that is necessary or even desirable at this stage of the game, if ever, and certainly not in a vocabulary.

It might be relevant for traceability-interop, but I'm not convinced there's a need for this recommendation at all.

[@OR13] See https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html

That's a long page, of which only two bullets within a single small subsection appear to be relevant.

(It would be EXTREMELY helpful if you could provide more specific links in cases like this. Linking just to the whole page says that the time you save by not finding and providing the deeper link is more valuable than the cumulative time all your readers must invest in finding the tiny relevant segment of the linked page.)

Those two bullets:

  • Ensure the uploaded file is not larger than a defined maximum file size.
  • If the website supports ZIP file upload, do validation check before unzip the file. The check includes the target path, level of compress, estimated unzip size.

These are not about imposing limits on the size of files, only about common-sense tests relative to users uploading files to a server of some kind, which can help prevent (though not absolutely eliminate) disk and memory overrun.
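
For what it's worth, the kind of common-sense upload check those bullets describe could be sketched roughly like this; everything here (the handler, the 16 MB constant) is hypothetical, and nothing about it is specific to VCs.

```typescript
// Sketch only: a plain Node.js handler that refuses to buffer an oversized upload,
// the kind of common-sense check the OWASP bullets describe.
import { createServer, IncomingMessage, ServerResponse } from "node:http";

const MAX_UPLOAD_BYTES = 16 * 1024 * 1024; // illustrative cap, not a recommendation

function readBodyWithCap(req: IncomingMessage, maxBytes: number): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    // Fail fast on the declared size, but note Content-Length can be absent or wrong,
    // so the streamed size is checked as well.
    const declared = Number(req.headers["content-length"] ?? 0);
    if (declared > maxBytes) return reject(new Error("Declared body too large"));

    const chunks: Buffer[] = [];
    let received = 0;
    req.on("data", (chunk: Buffer) => {
      received += chunk.length;
      if (received > maxBytes) {
        req.destroy();
        return reject(new Error("Body exceeded the size limit while streaming"));
      }
      chunks.push(chunk);
    });
    req.on("end", () => resolve(Buffer.concat(chunks)));
    req.on("error", reject);
  });
}

createServer(async (req: IncomingMessage, res: ServerResponse) => {
  try {
    const body = await readBodyWithCap(req, MAX_UPLOAD_BYTES);
    res.writeHead(200).end(`accepted ${body.length} bytes`);
  } catch {
    res.writeHead(413).end("Payload Too Large"); // prevents disk/memory overrun up front
  }
}).listen(3000);
```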

Sure, people who are deploying atop MongoDB may want or need to impose a 16MB (decompressed?) filesize limit, or at least know what to do when a submitted file exceeds that size (e.g., fall back to GridFS storage) — but these limits are not relevant if deploying atop Virtuoso or various other datastores, so why should these limits be imposed on those deployers?

OR13 commented 2 years ago

FAT file system: The File Allocation Table (FAT) file system is the original file system used by MS-DOS and other Windows operating systems. It is a data structure Windows creates when a volume is formatted. This structure stores information about each file and directory so that it can be located later. The maximum disk partition size is 4 GB. On floppy disks, this is limited by the capacity of the disk. The maximum supported file size on hard disks is 2 GB.

FAT32 file system: FAT32 stands for File Allocation Table 32, an advanced version of the FAT file system. The FAT32 file system supports smaller cluster sizes and larger volumes than the FAT file system, which results in more efficient space allocation. FAT32 file systems support a maximum partition size of 32 GB for Windows XP and Windows Server 2003. The maximum file size is 4 GB.

NTFS: NTFS, which stands for New Technology File System, is an advanced file system that provides performance, security, reliability, and advanced features not found in FAT and FAT32 file systems. Some of the features of NTFS include guaranteed volume consistency by means of transaction logging and recovery techniques. NTFS uses log file and checkpoint information to restore the consistency of the file system. Other advanced features of NTFS include file and folder permissions, compression, encryption, and disk quotas. You cannot use NTFS on floppy disks due to their limited capacity (Sysinternals has a utility for using NTFS on floppy disks; for more information, see Syngress Publishing's Winternals Defragmentation, Recovery, and Administration Field Guide, ISBN 1-59749-079-2). The maximum supported partition size ranges from 2 TB to 16 TB. The maximum file size can be up to 16 TB minus 16 KB. The minimum and maximum partition sizes vary by the partition style chosen when the operating system was installed.

Is the problem that the limit is too small?

Or that you think interoperability is achievable without setting limits?

TallTed commented 2 years ago

@OR13 -- You're pasting great big chunks of irrelevant material. That doesn't help further your argument.

It especially doesn't help when the size limits discussed in the irrelevant material you choose to quote are a minimum of 2 GB — 128x the 16 MB size limit you initially proposed imposing on JSON-LD VCs.

Even more, you seem not to have considered the reasons for the limits on the file systems the descriptions of which you quoted — which were originally due to the 16bit (FAT) and later 32bit (FAT32) and 64bit (NTFS) numbers used to implement those systems, which were the largest available on the computer systems originally (or theoretically) meant to be supported by those file systems.

Interop may (but does not always!) require setting limits, on document sizes among other things. However, plucking a document size from all-but-thin-air, based only on one data store implementation's limitation, with no further justification or basis you can apparently state, is not a good way of setting such limits. (That limitation doesn't appear to limit the size of the user's stored document, only the size of each "document" used by that implementation to store it at the back end — somewhat like a gzip may be broken up into 100 gz.## files, each ~1/100 of the original gzip file size, in order to store that gzip across a number of floppies when you don't have a suitable HDD or similar.)

OR13 commented 2 years ago

When was the last time you tried signing a 16 TB document using RDF dataset normalization?

TallTed commented 2 years ago

I don't see the point of your question.

TallTed commented 1 year ago

Guidance is better than restriction, here.

"Keep your Verifiable Credentials as small as possible, and only as large as necessary."

brownoxford commented 1 year ago

@OR13 @mprorock Discussed on the call; suggest guidance, especially in light of possibly moving away from RDF canonicalization.

OR13 commented 1 year ago

mkhraisha commented 1 year ago

@TallTed says we should have guidance instead of restriction here.

@BenjaminMoe There is a practical size limit due to RDF canonicalization.

I think we should add a section that says to keep credentials as small as possible because of canonicalization times, and that outlines best practices around them.
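
A hedged sketch of what such a best-practices section might suggest in code terms: gate canonicalization behind a much smaller threshold and measure how long it actually takes, so each deployment can tune its own limit. The `canonize` parameter below stands in for whatever URDNA2015 implementation is in use, and the 1 MB threshold is purely illustrative.

```typescript
// Sketch only: refuse to canonicalize anything over a (much smaller) threshold,
// and record how long canonicalization takes so the limit can be tuned per deployment.
const MAX_CANONICALIZE_BYTES = 1 * 1024 * 1024; // illustrative 1 MB, not a recommended value

async function guardedCanonize(
  document: object,
  canonize: (doc: object) => Promise<string> // placeholder for a URDNA2015 implementation
): Promise<string> {
  const size = new TextEncoder().encode(JSON.stringify(document)).length;
  if (size > MAX_CANONICALIZE_BYTES) {
    throw new Error(`Refusing to canonicalize ${size} bytes (limit ${MAX_CANONICALIZE_BYTES})`);
  }
  const start = Date.now();
  const nquads = await canonize(document);
  console.log(`canonicalized ${size} bytes in ${Date.now() - start} ms`);
  return nquads;
}
```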

TallTed commented 1 year ago

@msporny — You wanted to comment on this.

mprorock commented 1 year ago

@brownoxford On interop, a hard max is a good idea - it's very common to do so at the API side.

I agree with the 16 MB suggested above as safe - that is likely too large for LD + RDF canonicalization, so we will want a much smaller max size there to avoid potential denial of service around verification, etc.

mprorock commented 1 year ago

@brownoxford I personally think that we should ban RDF processing prior to signature verification (e.g., no LD proofs) in the future for security reasons, but I would like to see where standardization in the VC 2.0 working group lands before we give any guidance in this regard.

OR13 commented 1 year ago

I also agree the profile should not endorse RDF processing prior to sign or verify.

I think it's fine to do RDF or schema processing after you check the signature, or before you issue a credential, as long as the processing is not "part of the proofing algorithm".
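
As a sketch of that ordering (every function here is a placeholder for whatever signature, schema, and RDF tooling a deployment actually uses; the point is only that RDF processing happens outside the proofing step):

```typescript
// Sketch only: verify the proof over the raw document first, then run the more
// expensive schema/RDF processing. None of these functions name a real library API.
interface VerifierDeps {
  verifySignature: (rawCredential: string) => Promise<boolean>; // e.g. JWS verification over the raw bytes
  validateSchema: (credential: object) => Promise<void>;        // e.g. JSON Schema validation
  processRdf?: (credential: object) => Promise<void>;           // optional, and only after verification
}

async function verifyThenProcess(rawCredential: string, deps: VerifierDeps): Promise<object> {
  // 1. Check the proof against the raw document, with no RDF processing involved.
  if (!(await deps.verifySignature(rawCredential))) {
    throw new Error("Signature verification failed");
  }
  // 2. Only now parse and run the heavier, more attack-prone processing.
  const credential = JSON.parse(rawCredential) as object;
  await deps.validateSchema(credential);
  if (deps.processRdf) {
    await deps.processRdf(credential);
  }
  return credential;
}
```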

TallTed commented 1 year ago

I also wonder about setting "a max size limit for JSON-LD VCs" in the Traceability Vocab, rather than in the VCDM or VC Data Integrity spec. This just seems the wrong place for it.

OR13 commented 1 year ago

@TallTed I think it would be wise to set a max here: https://github.com/w3c/json-ld-syntax

and then let profiles (like this repo) further restrict the allowed size of conforming documents.