Closed ThomasThelen closed 5 years ago
Consensus on #165 was that we will no longer support raw identifiers that are not unique. However, using them with something that indicates a provider should still work. Try:
dataId = ["https://search.dataone.org/view/urn:uuid:05d9dab8-5783-498d-a257-2b94da4dbe14"]
This means that anyone trying to register a dataset that doesn't include the full URL won't be able to bring their data in, which would be a big regression of previous behavior. Your example above 100% works, but I think a number of people will be trying to register their data by using an identifier.
Example:
Someone comes into wholetale with an identifier, doi:10.5063/F12805V3. Their data won't be found unless they use https://search.dataone.org/view/doi:10.5063/F12805V3 instead.
Edit: After chatting with Kacper it looks like my example works in the dashboard. Continuing to investigate....
Just for the sake of tracking examples (maybe edge cases) where the full URI works but the DataONE identifier doesn't.
Yes - https://search.dataone.org/view/ess-dive-77b46fa58849483-20181114T175016467
No - ess-dive-77b46fa58849483-20181114T175016467
No - doi:10.15485/1464233 (doi of the above)
Yes - https://search.dataone.org/view/https://pasta.lternet.edu/package/metadata/eml/knb-lter-bnz/363/19
No - https://pasta.lternet.edu/package/metadata/eml/knb-lter-bnz/363/19
Yes - https://search.dataone.org/view/https://pasta.lternet.edu/package/metadata/eml/edi/192/3
No - https://pasta.lternet.edu/package/metadata/eml/edi/192/3
If this is the case, @craig-willis, let me know if you want me to update the user documentation.
dataId = ["doi:10.5063/F12805V3"]
should also work.
What about 10.5065/D6862DM8 (without the doi: prefix)? This worked previously and is in the quickstart documentation, but it no longer appears to. Note that https://citation.crosscite.org/ allows this, but of course it assumes a DOI at all times.
It seems that we should 1) do some validation at the UI end that the provided ID is valid and/or 2) propagate errors from the backend.
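As a rough illustration of option 1, here is a minimal sketch of normalizing the identifier forms discussed in this thread before they are sent to the lookup. The function name `normalize_identifier`, the `SEARCH_PREFIX` constant, and the loose DOI pattern are all assumptions made for this example; none of them come from the Whole Tale codebase.

```python
import re
from urllib.parse import unquote

# Assumption: the search UI prefix seen in the examples in this thread.
SEARCH_PREFIX = "https://search.dataone.org/view/"
# Loose DOI shape: "10.", a registrant code, a slash, then a non-empty suffix.
BARE_DOI = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_identifier(raw):
    """Return a candidate identifier, or raise ValueError if nothing is left."""
    ident = raw.strip()
    if ident.startswith(SEARCH_PREFIX):
        # Keep whatever follows /view/; it may itself be a URL (LTER/EDI packages).
        ident = unquote(ident[len(SEARCH_PREFIX):])
    if not ident:
        raise ValueError("identifier is empty")
    if BARE_DOI.match(ident):
        # Bare DOIs such as 10.5065/D6862DM8 get the doi: prefix added.
        ident = "doi:" + ident
    return ident
```

This only cleans up the input. It cannot tell whether the result is actually known to DataONE, so a rejected lookup would still need to be reported back from the backend (option 2).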
doi:10.15485/1464233
This one is tricky: it resolves to https://www.osti.gov/servlets/purl/1464233/, which in turn returns a 302 to https://data.ess-dive.lbl.gov/#view/doi:10.15485/1464233
It's not listed as a member node (MN) by the CN, so how do we know it's DataONE?
This can be located with the DataONE CN resolve endpoint, which is the preferable way of locating resources as opposed to using the member node api.
https://cn.dataone.org/cn/v2/resolve/doi:10.15485/1464233
Resolves to
https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-77b46fa58849483-20181114T175016467
We list out the nodes here and have ess-dive listed second to last. I think it's unreliable to try to match the Base URL parameter with the dataset location. It works in the case of ess-dive, but not with other services (https://doi.pangaea.de/10.1594/PANGAEA.895994 does not share the DataONE Base URL of https://pangaea-orc-1.dataone.org/mn).
Yeah, I see data.ess-dive.lbl.gov; it's pasta.lternet.edu that's not listed.
Just noting that none of these work in current prod env (using old lookup framework):
Others I'll address in PR shortly.
I don't think that pasta.lternet.edu is a DataONE member node since we list the LTER MN as https://gmn.lternet.edu/mn.
In the case of https://search.dataone.org/view/https://pasta.lternet.edu/package/metadata/eml/knb-lter-bnz/363/19 we're using a URL as the identifier of the resource (we can't assume that this is where the package actually lives), and using resolve we can see that the location of this package is on the LTER MN listed on the node endpoint. This is super confusing because we can also get to the resource by visiting https://pasta.lternet.edu/package/metadata/eml/knb-lter-bnz/363/19, but AFAIK that page is not using the DataONE API.
The issue (which we've confirmed in the PR) is that there isn't a common identifier format on the DataONE side; we have no way of telling that a URL is actually an identifier belonging to DataONE. The only resolution I can see is to send a query to the resolve endpoint and see if we get a hit.
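A minimal sketch of that resolve-based check, using only the Python standard library. The helper names (`resolve_url`, `http_status`, `is_dataone_identifier`) are hypothetical, and the sketch assumes that the CN answers a known identifier with a success or redirect status and an unknown one with an error status such as 404. The identifier is percent-encoded as a single path segment, which differs from the unencoded URLs pasted earlier in this thread; that choice is also an assumption.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import HTTPRedirectHandler, build_opener

CN_BASE = "https://cn.dataone.org/cn/v2"  # base_url used elsewhere in this thread


def resolve_url(identifier, base_url=CN_BASE):
    # Encode ":" and "/" too, so URL-style identifiers stay one path segment.
    return f"{base_url}/resolve/{quote(identifier, safe='')}"


class _NoRedirect(HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers):
        # Returning None stops urllib from following the redirect to the
        # member node, so no object data is downloaded.
        return None


def http_status(url):
    """Hypothetical helper: status code of a GET, without following redirects."""
    opener = build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        # Unfollowed redirects and 4xx/5xx responses both land here.
        return err.code


def is_dataone_identifier(identifier, fetch_status=http_status):
    # Assumption: any 2xx/3xx answer from the resolve endpoint counts as a hit.
    status = fetch_status(resolve_url(identifier))
    return 200 <= status < 400
```

Passing `fetch_status` as a parameter lets the check be exercised without network access, for example `is_dataone_identifier("doi:10.15485/1464233", fetch_status=lambda url: 404)`.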
It looks like after the Globus integration merge the /repository/lookup endpoint is misbehaving.
To Reproduce:
dataId:
{"dataId": "urn:uuid:05d9dab8-5783-498d-a257-2b94da4dbe14" }
base_url:https://cn.dataone.org/cn/v2
Error:
Because we can't locate the datasets, data registration is failing. Note that I tested this on dev2 which doesn't have the update and got it to work.