microbiomedata / submission-schema

https://microbiomedata.github.io/submission-schema/
MIT License

jgi-mt data examples #113

Closed: brynnz22 closed this issue 1 year ago

brynnz22 commented 1 year ago

@turbomam @mslarae13 I added the last example files for the jgi-mt class (issue #102): minimum and exhaustive files in the valid folder, plus a handful of unexpected_pass examples. Most of those are for slots whose values should be prefilled by NMDC. I was not sure whether these should count as unexpected_pass, but I included them just in case. The other two files are for the rna_volume slot: validation does not fail when a too-small or too-large value is entered.
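To make the "unexpected_pass" idea concrete, here is a minimal sketch that classifies one example file's outcome, assuming each file is tagged as expected-valid or expected-invalid and a validator reports whether it actually passed. The function name and the labels other than `unexpected_pass` are hypothetical, not the repository's actual test harness.

```python
def classify(expected_valid: bool, actually_valid: bool) -> str:
    """Classify one example file's validation outcome (hypothetical labels)."""
    if expected_valid and actually_valid:
        return "pass"
    if not expected_valid and not actually_valid:
        return "expected_fail"
    if not expected_valid and actually_valid:
        # An invalid example that the schema accepted: the schema is too permissive,
        # e.g. an out-of-range rna_volume that still validates.
        return "unexpected_pass"
    # A valid example that the schema rejected: the schema is too strict.
    return "unexpected_fail"
```

Under this framing, the rna_volume files described above would be classified as `unexpected_pass`.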

There are some other things I noted (I assume a lot has to do with the docs needing to be updated):

  1. source_mat_id has an example in the docs that will not actually validate (https://microbiomedata.github.io/submission-schema/source_mat_id/): MPI012345 fails validation because it is missing a colon. Should the docs be updated?
  2. Many slots that are listed as recommended are in fact required.
  3. Most slots (if not all), including rna_absorb1 and rna_absorb2, are listed twice in the JgiMtInterface class's slot list in the documentation: https://microbiomedata.github.io/submission-schema/JgiMtInterface/
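Point 1 above comes down to whether an identifier has a prefix and a local part separated by a colon. The following is a minimal sketch of such a check, assuming a simplified CURIE shape; the regular expression is an illustration only and is not the pattern the submission schema actually uses.

```python
import re

# Assumed, simplified CURIE shape: a prefix, a colon, then a non-empty local part.
# This is NOT the pattern actually defined in the submission schema.
CURIE_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*:\S+")


def looks_like_curie(value: str) -> bool:
    """Return True if the whole value matches the simplified prefix:local shape."""
    return CURIE_PATTERN.fullmatch(value) is not None
```

With this check, the documented example `MPI012345` is rejected, while a colon-separated form such as `MPI:012345` (a hypothetical value) would be accepted.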
turbomam commented 1 year ago

That's great feedback. Hopefully we won't generally have any unexpected pass data files.

Are you saying that the schema doesn't honor rna_volume's comment "Units must be in uL. Enter the numerical part only. Value must 0-1000. Values <25 by special permission only."? For example, you instantiated one with a volume < 25 uL and there were no consequences? A good way to check this sort of thing is to generate local/usage_template.tsv with the project.Makefile. Doing that, I see that no slots have a minimum_value set higher than 0. @mslarae13 what behavior do you want here? We should make sure that the comment and the validation rules are consistent. Maybe we also need to say more about what "special permission" is and who to get it from.
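A scan like the one described above, looking for slots with a minimum_value above 0, could be sketched as follows. This assumes the generated TSV has one row per slot with columns named `slot` and `minimum_value`; those column names are guesses, not the documented layout of usage_template.tsv, so adjust them to match the real file.

```python
import csv
import io
from typing import TextIO


def slots_with_positive_minimum(tsv: TextIO, slot_col: str = "slot",
                                min_col: str = "minimum_value") -> list[str]:
    """Return slot names whose minimum_value is a number greater than 0.

    The default column names are assumptions about the TSV layout.
    """
    reader = csv.DictReader(tsv, delimiter="\t")
    found = []
    for row in reader:
        raw = (row.get(min_col) or "").strip()
        if not raw:
            continue  # no minimum_value set for this slot
        try:
            if float(raw) > 0:
                found.append(row[slot_col])
        except ValueError:
            continue  # non-numeric entry; ignore it
    return found


# Hypothetical two-row sample: a minimum of 0 and an unset minimum both yield nothing.
sample = io.StringIO("slot\tminimum_value\nrna_volume\t0\ndna_volume\t\n")
print(slots_with_positive_minimum(sample))  # prints []
```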

Typo? "Value must 0-1000" in the rna_volume comment should probably read "Value must be 0-1000".

I do remember putting some numeric range constraints on rna_volume and probably dna_volume too. I'll check them now.
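For reference, the rules in the rna_volume comment can be written out as an explicit check. This is a minimal sketch, not the schema's actual validation: the function name and the `special_permission` flag are hypothetical. In a LinkML schema, the hard 0-1000 bounds would correspond to `minimum_value` and `maximum_value` on the slot, but the conditional "special permission" rule for values below 25 cannot be expressed as a plain range.

```python
def check_rna_volume(value_ul: float, special_permission: bool = False) -> list[str]:
    """Return problems with an rna_volume value (in uL), following the slot comment:
    value must be 0-1000, and values < 25 are allowed by special permission only.
    The special_permission flag is a hypothetical stand-in for that approval."""
    problems = []
    if not 0 <= value_ul <= 1000:
        problems.append("rna_volume must be between 0 and 1000 uL")
    elif value_ul < 25 and not special_permission:
        problems.append("rna_volume < 25 uL requires special permission")
    return problems
```

An empty list means the value satisfies every rule in the comment; a value such as 5000 fails the hard range, while 10 fails only the permission rule.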

  1. I would just say that MIxS doesn't expect identifiers to be CURIEs, so it doesn't require a colon separating the prefix from the local portion. For NMDC, that is an absolute requirement. Having said that, I believe we are deprioritizing user-submitted source_mat_id in favor of a unique samp_name and an NMDC-provided id, which isn't even collected in the DataHarmonizer interface. We should overwrite MIxS examples that don't meet our requirements; @sujaypatil96 and I did some of that this morning.
  2. Can you please show where the slot requirements are contradictory? Please make a new issue for that.
  3. Yes, I have noticed double-association of several slots to classes in both the nmdc-schema and the submission schema. I don't know whether it causes any problems, but I do intend to address it. https://github.com/microbiomedata/issues/issues/249
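Finding the double-associated slots mentioned in point 3 is a simple duplicate count once a class's slot list is available. This sketch assumes the slot names have already been extracted into a plain list (for example, from the class definition in the schema YAML); the sample list is hypothetical.

```python
from collections import Counter


def duplicated_slots(slot_names: list[str]) -> list[str]:
    """Return slot names that appear more than once, in first-seen order."""
    counts = Counter(slot_names)  # Counter keeps first-insertion order
    return [name for name in counts if counts[name] > 1]


# Hypothetical slot list for a class, with two slots listed twice.
slots = ["rna_absorb1", "rna_absorb2", "rna_absorb1", "rna_absorb2", "rna_volume"]
print(duplicated_slots(slots))  # prints ['rna_absorb1', 'rna_absorb2']
```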
