microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

nmdc-schema needs a more global mechanism for capturing ownership-like credit #2037

Open turbomam opened 4 weeks ago

turbomam commented 4 weeks ago

This is especially relevant to the export NCBI XML task, with contributors @sujaypatil96 @aclum @chienchi etc.

If we are submitting Biosample records to NCBI, we want to credit the various people and organizations that played a role in the final result, namely that the sequence and the metadata about the sample are available in NCBI.

We might say that A obtained the sample, B did the extraction and prepared the library, C operated the sequencer and D uploaded all of that to NCBI.

Challenges:

aclum commented 3 weeks ago

Adding this to this sprint as we need to determine custom attributes for NCBI submission testing.

sujaypatil96 commented 3 weeks ago

Comprehensive example from NCBI (Submission, BioProject, BioSample, SRA): https://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/submit/public-docs/sra/samples/sra.submission.bs.bp.run.xml?view=co

sujaypatil96 commented 3 weeks ago

Mechanism by which we say who owns the Submission:

<Submission>
    <Description>
        <Comment>BP(1.0)+BS(1.0)+SRA</Comment>
        <Organization role="owner" type="institute">
            <Name>Institute of Biology</Name>
            <Contact email="jane.doe@domain.com">
                <Name>
                    <First>jane</First>
                    <Last>doe</Last>
                </Name>
            </Contact>
        </Organization>
        <Hold release_date="2015-10-21"/>
    </Description>

turbomam commented 3 weeks ago

The NMDC schema has has_credit_association for Study and processing_institution for PlannedProcess.

sujaypatil96 commented 3 weeks ago

From the Submission XSD:

<xs:element name="Organization" minOccurs="1" maxOccurs="unbounded" type="typeOrganization">
    <xs:annotation>
        <xs:documentation>
            Organization(s) involved into the submission.
            One organization is the owner of submission data which is identified by attribute "role"
        </xs:documentation>
    </xs:annotation>
</xs:element>
<xs:attribute name="role" use="optional">
    <xs:annotation>
        <xs:documentation>
          Role of the ogranization in submission - owner of the data or just a participant.
          It is expected that there is one owner of the submission data.
        </xs:documentation>
    </xs:annotation>
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="owner"/>
            <xs:enumeration value="participant"/>
        </xs:restriction>
    </xs:simpleType>
</xs:attribute>
<xs:complexType name="typeOrganization">
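
The role rules quoted above (at least one Organization, role optional and limited to "owner" or "participant", one owner expected) could be checked before submitting with something like the following. This is only a minimal sketch using Python's standard library: the function name `check_organization_roles` is hypothetical, the input is assumed to be a bare `<Description>` element without namespaces, and treating "one owner expected" as a hard requirement is an assumption, since the XSD only states it as an expectation in its documentation.

```python
import xml.etree.ElementTree as ET

# Enumeration values taken from the XSD excerpt above.
ALLOWED_ROLES = {"owner", "participant"}


def check_organization_roles(description_xml: str) -> list:
    """Return a list of problems found in the Organization roles of a <Description>.

    Hypothetical helper: it only mirrors the constraints quoted in this thread,
    not the full NCBI submission schema.
    """
    description = ET.fromstring(description_xml)
    organizations = description.findall("Organization")
    problems = []

    if not organizations:
        problems.append("at least one Organization is required (minOccurs=1)")

    roles = [org.get("role") for org in organizations]
    for role in roles:
        # The role attribute is optional in the XSD, so a missing role is allowed.
        if role is not None and role not in ALLOWED_ROLES:
            problems.append(f"unknown role: {role!r}")

    # Assumption: treat "it is expected that there is one owner" as a hard check.
    owner_count = roles.count("owner")
    if owner_count != 1:
        problems.append(f"expected exactly one owner, found {owner_count}")

    return problems
```

An empty list means none of these checks failed; it does not replace validating the full document against the XSD.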

turbomam commented 2 weeks ago

@turbomam to contact John at NCBI and check assumptions about the Organization and Submitter nodes in the NCBI submission schema

ssarrafan commented 4 days ago

Appears to be active. Will move to the next sprint. @turbomam please let me know if you're not working on this issue currently.

turbomam commented 1 day ago

Hi Mark, I've moved this conversation over to our standard helpdesk software so we can keep better track of it on our end. You can certainly enter multiple organization nodes in the xml package, but only the ‘owner’ is displayed on public pages currently. It may be possible to display other organization roles on the public pages in specified fields for them, but that would take some development work and our product teams would need to comment on whether and when it can be done.

The Submitter node is auto-populated by submission portal to match the submitting account when the submission is created. The submitting account can make updates to the submission in the future, as well as any users who are a member of the associated shared submission group.

It might help if we had a little more information about how you want NMDC and the data provider to be associated with the submission. Is your goal to maintain editorial ownership of these submissions and simply provide credit for the data provider on the public pages? Or do you want to transfer ownership to the data provider, making them the submitting account? We have an option that allows you to transfer editing privileges to another user after you create the submission. If you are interested in that, let us know and we can discuss it.

Best,

The SRA Team

National Center for Biotechnology Information | National Library of Medicine National Institutes of Health, Bethesda, Maryland 20892 Email: SRA@ncbi.nlm.nih.gov


Case Information:
Case #: CAS-1333031-N0Y0X1
Customer Name: Alicia Clum
Customer Email: aclum@lbl.gov
Case Created: 6/27/2024, 2:49:27 PM

Summary: NMDC API submission account

Details:
From: Mark Miller [MAM@lbl.gov](mailto:MAM@lbl.gov)
Sent: Tuesday, June 25, 2024 1:50 PM
To: Alicia Clum [aclum@lbl.gov](mailto:aclum@lbl.gov); Trow, Jonathan (NIH/NLM/NCBI) [C] [jonathan.trow@nih.gov](mailto:jonathan.trow@nih.gov); Sujay Patil [spatil@lbl.gov](mailto:spatil@lbl.gov)
Subject: Re: [EXTERNAL] Re: NCBI UIless submission procedures

Hi John. Thanks for helping NMDC think about how to optimally credit people and organizations for the work in NMDC records that we intend to submit to NCBI. Thanks Alicia for starting this conversation.

I've been one of the MIxS developers for a few years now and I would really prefer that NMDC not propagate the use of non-standardized attributes.

Sujay Patil and I have been studying the NCBI submission XSD files to understand how to map NMDC's MIxS-like terminology to NCBI's expectations.

I suspect that we can express all of the necessary credit in all of the data structures (Biosamples, Bioprojects, SRA) by including multiple <Organization> nodes. NMDC would take the role of 'participant', not 'owner'. If I could raise an issue, I personally would ask for a richer vocabulary of roles, not a custom attribute approach.

How does that multiple participant organization solution sound?
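
To make the proposal concrete, here is a rough sketch of how an export script might emit one 'owner' Organization plus any number of 'participant' Organizations. It uses only Python's standard library. The helper names `add_organization` and `build_description` are hypothetical, the organization names are illustrative, and reusing type="institute" for every organization is an assumption copied from the NCBI sample rather than a checked mapping.

```python
import xml.etree.ElementTree as ET


def add_organization(description, name, role, org_type="institute"):
    """Append an <Organization> with the given role to a <Description> element.

    Assumption: org_type defaults to "institute" only because that is the
    value shown in the NCBI sample; other values are not covered here.
    """
    org = ET.SubElement(description, "Organization", {"role": role, "type": org_type})
    ET.SubElement(org, "Name").text = name
    return org


def build_description(comment, owner_name, participant_names):
    """Build a <Description> with exactly one owner and zero or more participants."""
    description = ET.Element("Description")
    ET.SubElement(description, "Comment").text = comment
    add_organization(description, owner_name, "owner")
    for name in participant_names:
        add_organization(description, name, "participant")
    return description


# Illustrative values: the data provider is the owner, NMDC is a participant.
description = build_description(
    comment="BP(1.0)+BS(1.0)+SRA",
    owner_name="Institute of Biology",
    participant_names=["National Microbiome Data Collaborative"],
)
print(ET.tostring(description, encoding="unicode"))
```

Contact and Hold elements are left out to keep the sketch short; a real export would still need them and should be validated against the submission XSD.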

I am also curious about a tag that I remember as <Submitter>. I got the impression that it is auto-populated with the name or email of the human logged in when the submission is actually submitted. I also got the sense that the <Submitter> node/path never makes it into the biosample_set.xml etc. Can you say more about that? Does either the 'owner' designation or the contents of the <Submitter> node determine who can edit the submission in the future?

Thanks a lot,
Mark