microbiomedata / nmdc-server

Data portal client and server for NMDC.
https://data.microbiomedata.org

what are the permissible values of the submission portal's status field? what do they mean? #758

Open turbomam opened 1 year ago

turbomam commented 1 year ago

There are submissions in the dev portal with a status value of "complete" that would not pass basic validation. For example, some are missing a sample name or a globally unique ID.

What does complete really mean?

Is there some way to tag the submissions with any rows that don't pass the built-in DH validation?

I was only planning on preparing "complete" submissions for insertion into MongoDB, but that doesn't even seem like a helpful filter at this point. Maybe one can't trust complete submissions before some point in time?

turbomam commented 1 year ago

Of the 50 submissions in the dev environment, I exclude the following before even getting to the invalid-row filter described above:

02144204-4461-4fd3-b961-456d0bba99df has no sample data rows
02144204-4461-4fd3-b961-456d0bba99df has status value: in-progress
0239a9cf-4e89-4264-a039-19d6d628c72b has no sample data rows
0239a9cf-4e89-4264-a039-19d6d628c72b has status value: in-progress
086424aa-922e-48c5-a1f7-00e87e48a89c has no sample data rows
0ec2d1c2-66b3-4f0d-b644-dc9965102281 has no sample data rows
0ec2d1c2-66b3-4f0d-b644-dc9965102281 has status value: in-progress
2195af4c-b1f0-4867-97b9-4345b53b0709 has no sample data rows
2195af4c-b1f0-4867-97b9-4345b53b0709 has status value: in-progress
2b6c8d2d-241b-4ff2-adaf-5276f31a5320 has no sample data rows
2b6c8d2d-241b-4ff2-adaf-5276f31a5320 has status value: in-progress
2dc1f1ed-6de3-453e-a933-1789ba9c9933 has no sample data rows
2dc1f1ed-6de3-453e-a933-1789ba9c9933 has status value: in-progress
2ef5a378-0221-4ff8-8ac0-6349e5ff1772 has no sample data rows
2ef5a378-0221-4ff8-8ac0-6349e5ff1772 has status value: in-progress
33d31996-171a-4fdf-b2ea-d3936b649529 has no sample data rows
33d31996-171a-4fdf-b2ea-d3936b649529 has status value: in-progress
39f0767b-102a-4286-a274-b91c4f5dd862 has no sample data rows
39f0767b-102a-4286-a274-b91c4f5dd862 has status value: in-progress
43c007e0-3ab0-4e85-ab04-cea35c5ef604 has no sample data rows
43c007e0-3ab0-4e85-ab04-cea35c5ef604 has status value: in-progress
49e40955-31c7-44a7-9e31-8499335019e6 has no sample data rows
49e40955-31c7-44a7-9e31-8499335019e6 has status value: in-progress
4a068be6-0ede-48a8-b6e7-ae651800bf1e has no sample data rows
4a068be6-0ede-48a8-b6e7-ae651800bf1e has status value: in-progress
4affdd74-591a-4577-ae47-ae5c7aa9507c has no sample data rows
4affdd74-591a-4577-ae47-ae5c7aa9507c has status value: in-progress
4f188ad8-2731-4635-b401-75e079025f47 has no sample data rows
50901421-0ba8-4258-86c4-2c314820af47 has no sample data rows
50901421-0ba8-4258-86c4-2c314820af47 has status value: in-progress
5afd3eb9-25c1-4d95-af4a-debb25aba91a has status value: in-progress
5f2cc536-dcac-48d8-9caf-a1f66c9970b8 has no sample data rows
5f2cc536-dcac-48d8-9caf-a1f66c9970b8 has status value: in-progress
69d5b511-bc00-47bf-9067-81f92900342a has no sample data rows
69d5b511-bc00-47bf-9067-81f92900342a has status value: in-progress
6e0086dd-5038-4d84-a09f-f275e70598fc has no sample data rows
6e0086dd-5038-4d84-a09f-f275e70598fc has status value: in-progress
71e3bbf8-8f4d-427b-9df4-1d3147993aa4 has no sample data rows
71e3bbf8-8f4d-427b-9df4-1d3147993aa4 has status value: in-progress
729b2640-981a-4313-95f2-8e4df97926d2 has no sample data rows
729b2640-981a-4313-95f2-8e4df97926d2 has status value: in-progress
763a273d-73a9-4b3b-b676-37703dfe953b has no sample data rows
763a273d-73a9-4b3b-b676-37703dfe953b has status value: in-progress
781a897a-eb2c-4ab5-aca9-77c8d532092d has no sample data rows
781a897a-eb2c-4ab5-aca9-77c8d532092d has status value: in-progress
822e290d-6837-4956-abb9-996dd5f6d8b9 has no sample data rows
822e290d-6837-4956-abb9-996dd5f6d8b9 has status value: in-progress
82d11d10-ae8c-4dc1-ae70-07dbbe4f76c1 has status value: in-progress
95f509f2-7226-41dc-b205-0f84f154abc0 has no sample data rows
95f509f2-7226-41dc-b205-0f84f154abc0 has status value: in-progress
9746df64-bf2a-4ee3-b004-8ec0f46c0357 has no sample data rows
9746df64-bf2a-4ee3-b004-8ec0f46c0357 has status value: in-progress
9cfa6b9c-c199-4292-bb9c-165ac1a2bc13 has status value: in-progress
af5a9690-3e4d-4023-9d19-571280c0403c has status value: in-progress
afb4a3f7-c005-41bc-af02-0b21743c5948 has no sample data rows
afb4a3f7-c005-41bc-af02-0b21743c5948 has status value: in-progress
b6244868-becd-4308-9d9c-ff68b1fb61cd has no sample data rows
b6244868-becd-4308-9d9c-ff68b1fb61cd has status value: in-progress
c3870c75-5f0b-47da-a9f3-b4e799c79647 has no sample data rows
c3870c75-5f0b-47da-a9f3-b4e799c79647 has status value: in-progress
cc498964-d1da-416d-b353-aecf5f6c749d has no sample data rows
cc498964-d1da-416d-b353-aecf5f6c749d has status value: in-progress
d1fd2285-45d6-48d5-be12-391a6a65af84 has no sample data rows
d1fd2285-45d6-48d5-be12-391a6a65af84 has status value: in-progress
d5f506a2-aa68-4a70-b01a-b5e3e72339d2 has no sample data rows
d5f506a2-aa68-4a70-b01a-b5e3e72339d2 has status value: in-progress
e063c9e9-b1fc-4955-b379-5728c610e140 has status value: in-progress
e73cd45b-e966-4606-b9f4-77fccd36a7b3 has status value: in-progress
e8be7fbc-2c28-4646-bf66-b910e66c96b7 has no sample data rows
e8be7fbc-2c28-4646-bf66-b910e66c96b7 has status value: in-progress
ec48a4fe-7158-476a-a73b-1b076ad0fa73 has no sample data rows
ec48a4fe-7158-476a-a73b-1b076ad0fa73 has status value: in-progress
f731608c-77d7-4ffb-ac6d-2989a7c44e3f has status value: in-progress
fc0a7b70-b813-4c70-99ee-46dedb0f8bfb has no sample data rows
fc0a7b70-b813-4c70-99ee-46dedb0f8bfb has status value: in-progress
ffd7c221-bac6-4401-ad61-864276743c61 has no sample data rows
ffd7c221-bac6-4401-ad61-864276743c61 has status value: in-progress
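
The exclusion pass behind the list above can be sketched roughly as follows. This is a minimal sketch rather than the script that was actually run: the `Submission` shape, its `sample_rows` field, and the `exclusion_reasons` function are hypothetical names, and it assumes each submission exposes an ID, a status string, and a list of sample data rows.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    # Hypothetical shape; the real portal record carries more fields.
    id: str
    status: str
    sample_rows: list = field(default_factory=list)


def exclusion_reasons(sub: Submission) -> list[str]:
    """Return the reasons a submission is skipped (an empty list means keep it)."""
    reasons = []
    if not sub.sample_rows:
        reasons.append("has no sample data rows")
    if sub.status != "complete":
        reasons.append(f"has status value: {sub.status}")
    return reasons


# Made-up submissions, only to show the report format.
submissions = [
    Submission("example-1", "in-progress"),
    Submission("example-2", "complete", [["some", "row"]]),
    Submission("example-3", "complete"),
]
for sub in submissions:
    for reason in exclusion_reasons(sub):
        print(sub.id, reason)
```

A submission can be reported twice, once per reason, which matches the shape of the list above.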

turbomam commented 1 year ago

There are currently 68 submissions in the prod environment. In addition to "no sample data rows" and "has status value: in-progress", this environment also has submissions that appear to be lacking the header rows:

000150d8-15ea-473f-ac11-cbef1f66a532 has status value: in-progress
085bd65e-f58e-4573-b412-561784cf06fd doesn't seem to have the expected header rows: [['NMDC:SAMEA7723901', 'qiita_sid_13114:13114.king.27.s002', 'metagenomics;metaproteomics', 'soil', None, 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'other', 'frozen', '2014-04-24', 'USA: State of Hawaii, Kilauea Volcano', '19.388694 -155.249333', '1500 m', '-80 C', None, None, '0.02', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None], ['NMDC:SAMEA7723902', 'qiita_sid_13114:13114.king.27.s005', 'metagenomics;metaproteomics', 'soil', None, 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'other', 'frozen', '2014-04-24', 'USA: State of Hawaii, Kilauea Volcano', '19.388694 -155.249333', '1500 m', '-80 C', None, None, '0.02', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None], ['NMDC:SAMEA7723903', 
'qiita_sid_13114:13114.king.27.s010', 'metagenomics;metaproteomics', 'soil', None, 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'other', 'frozen', '2014-04-24', 'USA: State of Hawaii, Kilauea Volcano', '19.389305 -155.2485', '1500 m', '-80 C', None, None, '0.02', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]]
085bd65e-f58e-4573-b412-561784cf06fd has status value: in-progress
09e9ba17-9cfc-4162-b8d3-988092385c81 has status value: in-progress
0bff12db-8710-4b32-8fdc-f87be4000258 doesn't seem to have the expected header rows: [['NMDC:SAMEA7723901', 'qiita_sid_13114:13114.king.27.s002', 'metagenomics;metaproteomics', 'soil', None, 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'other', 'frozen', '2014-04-24', 'USA: State of Hawaii, Kilauea Volcano', '19.388694 -155.249333', '1500 m', '-80 C', None, None, '0.02', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None], ['NMDC:SAMEA7723902', 'qiita_sid_13114:13114.king.27.s005', 'metagenomics;metaproteomics', 'soil', None, 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'other', 'frozen', '2014-04-24', 'USA: State of Hawaii, Kilauea Volcano', '19.388694 -155.249333', '1500 m', '-80 C', None, None, '0.02', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None], ['NMDC:SAMEA7723903', 
'qiita_sid_13114:13114.king.27.s010', 'metagenomics;metaproteomics', 'soil', None, 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'other', 'frozen', '2014-04-24', 'USA: State of Hawaii, Kilauea Volcano', '19.389305 -155.2485', '1500 m', '-80 C', None, None, '0.02', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]]
0bff12db-8710-4b32-8fdc-f87be4000258 has status value: in-progress
1116bc33-416b-44ee-a7b4-e6ece79165d7 has no sample data rows
1116bc33-416b-44ee-a7b4-e6ece79165d7 has status value: in-progress
1258a6ee-39ee-4677-9b56-2e75b2fb5ac7 has no sample data rows
1258a6ee-39ee-4677-9b56-2e75b2fb5ac7 has status value: in-progress
1cb883d3-21d5-4e1a-9a67-25968c3dfd94 has no sample data rows
1cb883d3-21d5-4e1a-9a67-25968c3dfd94 has status value: in-progress
22950b94-1317-4681-b4ef-d3cde33ed10d has status value: in-progress
23eec707-52b1-4865-b7d5-2fb251a5a4e5 has no sample data rows
23eec707-52b1-4865-b7d5-2fb251a5a4e5 has status value: in-progress
25dc2107-af6f-4755-9a1c-d97fd647db15 has status value: in-progress
2f8e149a-042d-411e-ae66-8edda3e81f68 has no sample data rows
2f8e149a-042d-411e-ae66-8edda3e81f68 has status value: in-progress
3051e96e-1806-42f1-95b7-3be2bf0ee8c5 has status value: in-progress
3102bac7-7c69-4ba3-868a-e84dd15bc552 has no sample data rows
3102bac7-7c69-4ba3-868a-e84dd15bc552 has status value: in-progress
341561af-0d24-4447-8ed3-a40ec0ac8ed4 has no sample data rows
341561af-0d24-4447-8ed3-a40ec0ac8ed4 has status value: in-progress
3533cfa1-ddde-4d08-a349-2e666b0518b1 has status value: in-progress
38c996b1-6281-4c1d-8ebe-996cecdbf7a6 has no sample data rows
38c996b1-6281-4c1d-8ebe-996cecdbf7a6 has status value: in-progress
3a8d507d-8466-45fd-8871-8fda96a48a02 has no sample data rows
3a8d507d-8466-45fd-8871-8fda96a48a02 has status value: in-progress
3ed2dad5-e12b-413f-b172-8374f6515a7f has status value: in-progress
402e24af-2223-4b2f-a372-cccc980c5ed2 has no sample data rows
402e24af-2223-4b2f-a372-cccc980c5ed2 has status value: in-progress
4082fa29-9312-4c1c-af5a-072aec49ab0e has status value: in-progress
47fc50ae-3f27-4bfb-8aa9-58601bd7db02 has no sample data rows
47fc50ae-3f27-4bfb-8aa9-58601bd7db02 has status value: in-progress
4c0da155-ba21-4e46-970b-e85f03b7a728 has status value: in-progress
4d07e024-32fa-45b5-8591-94f03bb1ec5a has status value: in-progress
53da7915-3809-48bd-92b9-42a793b92aa7 has status value: in-progress
53da7915-3809-48bd-92b9-42a793b92aaa has no sample data rows
53da7915-3809-48bd-92b9-42a793b92aaa has status value: in-progress
596c30f7-7af0-47e9-a42b-869eeff21562 has status value: in-progress
59fc6d2f-72a1-446c-b422-70e594c520be has status value: in-progress
5bdb0ca3-d30a-43dd-8a44-2e9e62b75248 doesn't seem to have the expected header rows: [['NMDC:SAMEA7723901', 'qiita_sid_13114:13114.king.27.s002', 'metagenomics;metaproteomics', 'soil', None, 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', '728472', 'soil', '5 g', '-80', None, None, '1193011', 'JGI NMDC Test Project', 'SAMEA7723901DNA', 'qiita_sid_13114:13114.king.27.s002DNA', '100', '30', '1', None, 'NMDC:SAMEA7723901', 'plate', 'B2', 'DNAStable', 'yes', None, 'forest biome', 'phenol/chloroform extraction', 'Mary Lou', 'John Smith', '1193011', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'other', 'frozen', '2014-04-15', 'USA: Hawaii, Kilauea Volcano', '19.388694 -155.249333', '1500 meter', '-80 C', None, None, '0.02', None, None, None, None, None, None, None, None, None, None, None, None, None, None], ['NMDC:SAMEA7723902', 'qiita_sid_13114:13114.king.27.s005', 'metagenomics;metaproteomics', 'soil', None, 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', '728472', 'soil', '7 g', '-80', None, None, '1193011', 'JGI NMDC Test Project', 'SAMEA7723902DNA', 'qiita_sid_13114:13114.king.27.s005DNA', '100', '30', '1', None, 'NMDC:SAMEA7723902', 'plate', 'B3', 'DNAStable', 'yes', None, 'forest biome', 'phenol/chloroform extraction', 'Mary Lou', 'John Smith', '1193011', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 
None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'other', 'frozen', '2014-04-15', 'USA: Hawaii, Kilauea Volcano', '19.388694 -155.249333', '1500 meter', '-80 C', None, None, '0.02', None, None, None, None, None, None, None, None, None, None, None, None, None, None], ['NMDC:SAMEA7723903', 'qiita_sid_13114:13114.king.27.s010', 'metagenomics;metaproteomics', 'soil', None, 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', '728472', 'soil', '2 g', '-80', None, None, '1193011', 'JGI NMDC Test Project', 'SAMEA7723903DNA', 'qiita_sid_13114:13114.king.27.s010DNA', '100', '30', '1', None, 'NMDC:SAMEA7723903', 'plate', 'B4', 'DNAStable', 'yes', None, 'forest biome', 'phenol/chloroform extraction', 'Mary Lou', 'John Smith', '1193011', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'other', 'frozen', '2014-04-15', 'USA: Hawaii, Kilauea Volcano', '19.388694 -155.249333', '1500 meter', '-80 C', None, None, '0.02', None, None, None, None, None, None, None, None, None, None, None, None, None, None]]
5bdb0ca3-d30a-43dd-8a44-2e9e62b75248 has status value: in-progress
602925ac-8fde-4a8c-ac8a-cf9ac6cd2108 has status value: in-progress
62158c29-d75c-417e-a71e-6f910e5445eb has status value: in-progress
64edf9df-a876-4ed4-860e-8e9754db645e has no sample data rows
64edf9df-a876-4ed4-860e-8e9754db645e has status value: in-progress
6b7ce685-51c2-4dab-9ecb-62e450f4d05c has no sample data rows
6b7ce685-51c2-4dab-9ecb-62e450f4d05c has status value: in-progress
6df22157-c7ac-42c1-b2f0-5ed4d2743a1f has no sample data rows
6df22157-c7ac-42c1-b2f0-5ed4d2743a1f has status value: in-progress
6f97a2ca-6a49-41f4-97b6-0ed570cb4857 has no sample data rows
6f97a2ca-6a49-41f4-97b6-0ed570cb4857 has status value: in-progress
79eef611-8036-407d-9b87-c836f19f5879 has status value: in-progress
7a48c118-0459-4b1c-9faa-d1f817cd00e8 has status value: in-progress
7b236d0d-e967-4289-ac50-a5384219ff0e has status value: in-progress
7bd61ddf-a45b-48be-a5a9-bb21b0e474ac has no sample data rows
7bd61ddf-a45b-48be-a5a9-bb21b0e474ac has status value: in-progress
8435dfee-dbad-4085-b149-9025f174d4f8 has status value: in-progress
8bdc7168-32c2-4d75-b789-ed88d51ea21c has no sample data rows
8bdc7168-32c2-4d75-b789-ed88d51ea21c has status value: in-progress
8dc437cc-1d19-4af9-8238-3b24907bc03d has no sample data rows
8dc437cc-1d19-4af9-8238-3b24907bc03d has status value: in-progress
94b6fe9c-8cb7-47e3-a83b-423ec311edc8 has status value: in-progress
95550192-87a5-4215-b373-ee6c8f9a70c0 has status value: in-progress
9615f72f-221c-4622-98cf-8dc5d74db60b has status value: in-progress
9b5d160d-bd08-4d9a-b3aa-5a3f9fa00399 has status value: in-progress
9eccb2b2-ed40-4af0-b3c2-e24ddabaae03 has no sample data rows
9eccb2b2-ed40-4af0-b3c2-e24ddabaae03 has status value: in-progress
a3a8d05a-4f54-41bd-85be-f1c1c014025a has status value: in-progress
a5ef4406-11bb-4299-9efc-dc23eb76704b has no sample data rows
a5ef4406-11bb-4299-9efc-dc23eb76704b has status value: in-progress
a8bf6c5a-47fa-4363-b76f-ab9995319ff1 has no sample data rows
a8bf6c5a-47fa-4363-b76f-ab9995319ff1 has status value: in-progress
ae69b0d7-0b11-4a2b-94a2-8aa3308ea29c has no sample data rows
ae69b0d7-0b11-4a2b-94a2-8aa3308ea29c has status value: in-progress
af0dedf2-39ee-4b38-8ba3-491731be30e8 has status value: in-progress
b0149249-5a99-4d53-8a78-034552036fc6 has status value: in-progress
b76c395d-69bf-4660-85c6-04c44ff8f8c7 has status value: in-progress
bb6bb33f-e3fa-4fc8-a27d-339203257cf7 has status value: in-progress
bcc73d37-380b-4263-841e-5fc13afba65d has no sample data rows
bcc73d37-380b-4263-841e-5fc13afba65d has status value: in-progress
cb3b8798-0a4a-43b2-864f-6e2c2d8fcc0f doesn't seem to have the expected header rows: [['NMDC:SAMEA7723901', 'qiita_sid_13114:13114.king.27.s002', 'metagenomics;metaproteomics', 'soil', None, 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', '728472', 'soil', '5 g', '-80', None, None, '1193011', 'JGI NMDC Test Project', 'SAMEA7723901DNA', 'qiita_sid_13114:13114.king.27.s002DNA', '100', '30', '1', None, 'NMDC:SAMEA7723901', 'plate', 'B2', 'DNAStable', 'yes', None, 'forest biome', 'phenol/chloroform extraction', 'Mary Lou', 'John Smith', '1193011', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'other', 'frozen', '2021-04-24', 'USA: Hawaii, Kilauea Volcano', '19.388694 -155.249333', '1500 meter', '-80 C', None, None, '0.02', None, None, None, None, None, None, None, None, None, None, None, None, None, None], ['NMDC:SAMEA7723902', 'qiita_sid_13114:13114.king.27.s005', 'metagenomics;metaproteomics', 'soil', None, 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', '728472', 'soil', '7 g', '-80', None, None, '1193011', 'JGI NMDC Test Project', 'SAMEA7723902DNA', 'qiita_sid_13114:13114.king.27.s005DNA', '100', '30', '1', None, 'NMDC:SAMEA7723902', 'plate', 'B3', 'DNAStable', 'yes', None, 'forest biome', 'phenol/chloroform extraction', 'Mary Lou', 'John Smith', '1193011', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 
None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'other', 'frozen', '2021-04-24', 'USA: Hawaii, Kilauea Volcano', '19.388694 -155.249333', '1500 meter', '-80 C', None, None, '0.02', None, None, None, None, None, None, None, None, None, None, None, None, None, None], ['NMDC:SAMEA7723903', 'qiita_sid_13114:13114.king.27.s010', 'metagenomics;metaproteomics', 'soil', None, 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', '728472', 'soil', '2 g', '-80', None, None, '1193011', 'JGI NMDC Test Project', 'SAMEA7723903DNA', 'qiita_sid_13114:13114.king.27.s010DNA', '100', '30', '1', None, 'NMDC:SAMEA7723903', 'plate', 'B4', 'DNAStable', 'yes', None, 'forest biome', 'phenol/chloroform extraction', 'Mary Lou', 'John Smith', '1193011', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'other', 'frozen', '2021-04-24', 'USA: Hawaii, Kilauea Volcano', '19.389305 -155.2485', '1500 meter', '-80 C', None, None, '0.02', None, None, None, None, None, None, None, None, None, None, None, None, None, None]]
cb3b8798-0a4a-43b2-864f-6e2c2d8fcc0f has status value: in-progress
d26608a6-e28c-4ca6-830a-7be07b48f282 has status value: in-progress
d4fe7b05-7078-4018-b2d6-ee4c9a885ca7 has status value: in-progress
e14b7afb-4b1d-469f-8f94-559c2d4b8f18 has status value: in-progress
ec663de5-1073-4edc-be0a-43284c939ab8 has status value: in-progress
ef79285a-37d1-4af9-9b0d-d045c60f0ec5 has no sample data rows
ef79285a-37d1-4af9-9b0d-d045c60f0ec5 has status value: in-progress
efc5356b-28c9-4450-b710-a91c43ec45e6 has no sample data rows
efc5356b-28c9-4450-b710-a91c43ec45e6 has status value: in-progress
f4ee1751-6c65-448e-b6b0-b3cad93b8473 doesn't seem to have the expected header rows: [['NMDC:SAMEA7723901', 'qiita_sid_13114:13114.king.27.s002', 'metagenomics;metaproteomics', 'soil', '', 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', '728472', 'soil', '5 g', '-80', '', '', '1193011', 'JGI NMDC Test Project', 'SAMEA7723901DNA', 'qiita_sid_13114:13114.king.27.s002DNA', '100', '30', '1', '', 'NMDC:SAMEA7723901', 'plate', 'B2', 'DNAStable', 'yes', '', 'forest biome', 'phenol/chloroform extraction', 'Mary Lou', 'John Smith', '1193011', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'other', 'frozen', '2014-04-24', 'USA: State of Hawaii, Kilauea Volcano', '19.388694 -155.249333', '1500 m', '-80 C', '', '', '0.02', '', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['NMDC:SAMEA7723902', 'qiita_sid_13114:13114.king.27.s005', 'metagenomics;metaproteomics', 'soil', '', 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', '728472', 'soil', '7 g', '-80', '', '', '1193011', 'JGI NMDC Test Project', 'SAMEA7723902DNA', 'qiita_sid_13114:13114.king.27.s005DNA', '100', '30', '1', '', 'NMDC:SAMEA7723902', 'plate', 'B3', 'DNAStable', 'yes', '', 'forest biome', 'phenol/chloroform extraction', 'Mary Lou', 'John Smith', '1193011', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'other', 'frozen', '2014-04-24', 'USA: State of Hawaii, Kilauea Volcano', '19.388694 -155.249333', '1500 m', '-80 C', '', '', '0.02', '', '', 
'', '', '', '', '', '', '', '', '', '', '', ''], ['NMDC:SAMEA7723903', 'qiita_sid_13114:13114.king.27.s010', 'metagenomics;metaproteomics', 'soil', '', 'woodland biome [ENVO:01000175]', 'volcano [ENVO:00000247]', '__volcanic soil [ENVO:01001841]', 'Environmental', 'Terrestrial', 'Soil', 'Biocrust', 'Unclassified', '728472', 'soil', '2 g', '-80', '', '', '1193011', 'JGI NMDC Test Project', 'SAMEA7723903DNA', 'qiita_sid_13114:13114.king.27.s010DNA', '100', '30', '1', '', 'NMDC:SAMEA7723903', 'plate', 'B4', 'DNAStable', 'yes', '', 'forest biome', 'phenol/chloroform extraction', 'Mary Lou', 'John Smith', '1193011', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'other', 'frozen', '2014-04-24', 'USA: State of Hawaii, Kilauea Volcano', '19.389305 -155.2485', '1500 m', '-80 C', '', '', '0.02', '', '', '', '', '', '', '', '', '', '', '', '', '', '']]
f4ee1751-6c65-448e-b6b0-b3cad93b8473 has status value: in-progress
f55b50cd-3366-41e7-a5a5-16f24d2bd95b has status value: in-progress
f6c811bb-4055-4901-840e-cad80d733921 has status value: in-progress
f77af6ca-d790-4dec-8497-146e84debce0 has status value: in-progress
f9882a44-8aa0-40bf-9084-0fdcfe41243e has status value: in-progress
fe9faf49-89f0-436d-958d-c6acb113b19e has status value: in-progress
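
One way to detect the missing header rows reported above is to check whether any of the leading rows contains the expected column names. The sketch below is an assumption about how such a check could look, not the check that produced this list: `has_expected_header`, the `header_row_count` default of 2, and the column names are all placeholders.

```python
def has_expected_header(rows, expected_columns, header_row_count=2):
    """Return True if one of the leading rows contains every expected column name."""
    expected = set(expected_columns)
    for row in rows[:header_row_count]:
        # Ignore None and other non-string cells when comparing.
        cells = {cell for cell in row if isinstance(cell, str)}
        if expected <= cells:
            return True
    return False


# Placeholder column names; substitute whatever the template actually uses.
expected = ["sample id", "sample name"]

with_header = [
    ["identifiers", None],
    ["sample id", "sample name"],
    ["NMDC:SAMEA7723901", "qiita_sid_13114:13114.king.27.s002"],
]
data_only = with_header[2:]

print(has_expected_header(with_header, expected))  # True
print(has_expected_header(data_only, expected))    # False
```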

mcovalt commented 1 year ago

what are the permissible values of the submission portal's status field?

Currently, status can only be one of "in-progress" or "complete".

what do they mean?

A value of "complete" should mean that the submission data is finalized per the user and valid per DataHarmonizer. Our current implementation does not allow the user to indicate "complete" if DataHarmonizer reports invalid data. The "in-progress" status is used for any other state of the data.
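
As an illustration of that rule, here is a minimal sketch of a status guard. It is not the portal's actual code: `SubmissionStatus`, `next_status`, and the `validation_errors` argument are hypothetical names, and it assumes the DataHarmonizer validation result is available as a list of errors.

```python
from enum import Enum


class SubmissionStatus(str, Enum):
    IN_PROGRESS = "in-progress"
    COMPLETE = "complete"


def next_status(requested: str, validation_errors: list) -> SubmissionStatus:
    """Resolve the status to store, refusing "complete" for invalid data."""
    # Raises ValueError for anything other than the two permitted values.
    status = SubmissionStatus(requested)
    if status is SubmissionStatus.COMPLETE and validation_errors:
        raise ValueError(
            "cannot mark a submission complete while validation errors remain"
        )
    return status


print(next_status("complete", []).value)                              # complete
print(next_status("in-progress", ["missing sample name"]).value)      # in-progress
```

Enforcing this only in the client would leave older or directly written records unchecked, which is one possible explanation for "complete" submissions that fail validation.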

Maybe one can't trust complete submissions before some point in time?

I think this is likely the case. I'll look through the Git history to identify the point in time before which "complete" didn't require DataHarmonizer validation.

Is there some way to tag the submissions with any rows that don't pass the built-in DH validation?

We could add a boolean is_valid to the submission metadata table, but I'd hope that is_valid = (status == "complete"). Perhaps recording which version of the schema the data was validated against is a good idea?
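
A rough sketch of what recording the schema version alongside the validity flag might look like. The `ValidationRecord` type, its fields, and `is_trustworthy` are hypothetical and not part of the current submission metadata table; the version strings are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValidationRecord:
    is_valid: bool
    # Schema version the data was validated against; None means never validated.
    schema_version: Optional[str]


def is_trustworthy(status: str, record: ValidationRecord, current_schema_version: str) -> bool:
    """A "complete" submission is trusted only if it passed validation
    against the schema version currently in use."""
    return (
        status == "complete"
        and record.is_valid
        and record.schema_version == current_schema_version
    )


old = ValidationRecord(is_valid=True, schema_version="1.0.0")
print(is_trustworthy("complete", old, "1.0.0"))  # True
print(is_trustworthy("complete", old, "2.0.0"))  # False
```

With the version stored, a schema change makes stale "complete" submissions detectable without re-running validation on every record up front.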