microbiomedata / nmdc-server

Data portal client and server for NMDC.
https://data.microbiomedata.org

Portal Link to IMG Data #369

Closed: ssarrafan closed this issue 3 years ago

ssarrafan commented 3 years ago

Provide a hyperlink to the same metagenome record in IMG/M.

Priority: High. Urgency: High.

jbeezley commented 3 years ago

We likely need this information added to the schema and the workflow pipelines. Once that is done, it's a quick task to add it to the UI.

ssarrafan commented 3 years ago

@pvangay and @emileyfadrosh, since this is high priority, let me know whether this should be reassigned to Bill and team or whether it's on hold until June.

jeffbaumes commented 3 years ago

We do have the GOLD ID for records that came from GOLD and could add a link for those, but for records that did not come from GOLD we wouldn't know what to link to.

Eventually we want an explicit external URL field in the schema, but we can do the GOLD-based quick fix for now if desired.
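A minimal sketch of the GOLD-only quick fix, for illustration. It assumes portal IDs take the `gold:<GOLD ID>` form seen in the portal URL quoted later in this thread, and that every such ID is a biosample whose page follows the `https://gold.jgi.doe.gov/biosample?id=` pattern also quoted there. `gold_link` is a hypothetical helper, not nmdc-server code.

```python
from typing import Optional

# Assumed GOLD biosample page pattern (taken from the example link in this thread).
GOLD_BIOSAMPLE_URL = "https://gold.jgi.doe.gov/biosample?id={}"


def gold_link(portal_id: str) -> Optional[str]:
    """Return a GOLD biosample URL for IDs like 'gold:Gb0119269', else None."""
    prefix, sep, local_id = portal_id.partition(":")
    if not sep or prefix.lower() != "gold" or not local_id:
        # The record did not come from GOLD, so there is no known page to link to.
        return None
    return GOLD_BIOSAMPLE_URL.format(local_id)


print(gold_link("gold:Gb0119269"))  # https://gold.jgi.doe.gov/biosample?id=Gb0119269
print(gold_link("example:123"))     # None
```

Returning `None` instead of guessing a URL reflects the point above: for records without a GOLD ID, there is nothing to link to until the schema carries an explicit external URL.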

kfagnan commented 3 years ago

Any of the data from IMG would have links from GOLD.

I suggest adding the external URL field to the schema before populating the IMG links, unless there's a strong use case driving the quick fix.

(Replying by email to Jeff Baumes, Mon, May 10, 2021 at 11:52 AM.)

pvangay commented 3 years ago

Agree with @kfagnan about pushing for an external URL field (or multiple). I think the urgency is probably more like medium, although @emileyfadrosh may disagree :)

subdavis commented 3 years ago

Realized today that I don't know how to construct this URL:

Take this record, for example: https://data.microbiomedata.org/details/sample/gold:Gb0119269

The GOLD ID is Gb0119269

The JGI link is https://gold.jgi.doe.gov/biosample?id=Gb0115840

What should I link to in IMG? Is this still on me, or is it a schema issue (the link needs to be added to Mongo and ingested first)?

CC @jeffbaumes

ssarrafan commented 3 years ago

Sounds like this won't be done today. I will move this to August.

ssarrafan commented 3 years ago

@subdavis, let me know if you need help with this from anyone on the NMDC team.

emileyfadrosh commented 3 years ago

@subdavis is waiting on me, sorry! I will post the list of links here next week. Very sorry for the delay!

emileyfadrosh commented 3 years ago

Quick update: I added a spreadsheet to this other ticket: https://github.com/microbiomedata/NMDC_Planning/issues/83

But this is just a quick fix; ideally the IMG taxon_oid would be pulled from the GOLD API. However, we would likely need to put the full URL in the schema, since GOLD does not maintain the URL, only the IMG ID (e.g., 3300042813 and not https://img.jgi.doe.gov/cgi-bin/mer/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=3300042813).
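For comparison, if the IMG URL template were kept somewhere (in the schema or in portal code), the full link could be rebuilt from the IMG ID alone. This is a sketch under that assumption: it copies the base URL and query parameters from the example link above and assumes IMG keeps that format stable. `img_taxon_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Base URL and query parameters copied from the example IMG/M link in this comment.
IMG_BASE = "https://img.jgi.doe.gov/cgi-bin/mer/main.cgi"


def img_taxon_url(taxon_oid: str) -> str:
    """Build the IMG/M taxon detail URL for a numeric IMG taxon_oid."""
    if not taxon_oid.isdigit():
        raise ValueError(f"expected a numeric taxon_oid, got {taxon_oid!r}")
    # urlencode keeps the dict's insertion order, so the query string
    # matches the example link parameter-for-parameter.
    params = {"section": "TaxonDetail", "page": "taxonDetail", "taxon_oid": taxon_oid}
    return f"{IMG_BASE}?{urlencode(params)}"


print(img_taxon_url("3300042813"))
```

The trade-off is the one raised above: storing only the ID keeps records small, but any change to IMG's URL scheme then has to be tracked wherever the template lives.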

@turbomam @wdduncan @cmungall @dehays @ssarrafan @kfagnan, I am not sure whether there is a specific ticket for schema updates for these links. Is there one?

wdduncan commented 3 years ago

@emileyfadrosh, as we discussed on the metadata call, we have an alternative identifiers slot that can hold the IDs even if they are not web-resolvable. Does that suffice for now?

wdduncan commented 3 years ago

If more slots are needed, please be sure to post an issue on the nmdc-schema issue tracker :)

emileyfadrosh commented 3 years ago

OK, great, thank you! And sorry, just to clarify: can the alternative identifiers hold multiple records (e.g., proposal IDs, IMG IDs, ESS-DIVE IDs)? Thanks @wdduncan

wdduncan commented 3 years ago

@emileyfadrosh Yes, a single record/entity may have multiple alternative identifiers assigned to it. Does that answer your question?
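To show what a multi-valued slot means in practice, here is a sketch that picks out the identifiers from one repository by CURIE prefix. The record shape is hypothetical, and only the `img.taxon:` value follows the form dehays describes later in this thread. The `proposal:` and `ess.dive:` values are placeholder prefixes, not real registered ones.

```python
from typing import Dict, List

# Hypothetical record: one entity carrying several alternative identifiers.
record: Dict[str, object] = {
    "id": "example:sample-1",
    "alternative_identifiers": [
        "img.taxon:3300009759",
        "proposal:EXAMPLE-1",  # placeholder prefix
        "ess.dive:EXAMPLE-2",  # placeholder prefix
    ],
}


def ids_with_prefix(rec: Dict[str, object], prefix: str) -> List[str]:
    """Return every alternative identifier whose CURIE prefix equals `prefix`."""
    alt_ids = rec.get("alternative_identifiers", [])
    return [curie for curie in alt_ids if curie.partition(":")[0] == prefix]


print(ids_with_prefix(record, "img.taxon"))  # ['img.taxon:3300009759']
```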

subdavis commented 3 years ago

Update: Because Mongo has IGSN identifiers and the spreadsheet has GOLD identifiers, I can't match most of the records. I can match enough of them to make the schema changes on our side, so I'm not blocked, but I won't be able to populate the portal with all the correct data until I have IDs to join on.

Since changes are now required in Mongo anyway, would it be best to go ahead and load the data from https://github.com/microbiomedata/NMDC_Planning/issues/83 into Mongo directly? Loading it from a spreadsheet during ingest isn't ideal.
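The matching problem described in this update can be illustrated with a simple key-based join. This sketch assumes a hypothetical `gold_id` field on both sides, and every ID in it is a placeholder. Records that carry only an IGSN have no value to look up, so they end up in the unmatched list. That is why the portal can't be fully populated until both sources share an ID.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]


def join_on_gold_id(
    mongo_records: List[Record], sheet_rows: List[Record]
) -> Tuple[List[Tuple[Record, Record]], List[Record]]:
    """Pair Mongo records with spreadsheet rows that share a GOLD ID.

    Returns (matched pairs, unmatched Mongo records).
    """
    # Index the spreadsheet by its join key (the hypothetical gold_id column).
    by_gold_id = {row["gold_id"]: row for row in sheet_rows}
    matched: List[Tuple[Record, Record]] = []
    unmatched: List[Record] = []
    for rec in mongo_records:
        gold_id: Optional[str] = rec.get("gold_id")
        row = by_gold_id.get(gold_id) if gold_id else None
        if row is None:
            # No GOLD ID on this record (e.g. it only has an IGSN),
            # or the spreadsheet has no row for that GOLD ID.
            unmatched.append(rec)
        else:
            matched.append((rec, row))
    return matched, unmatched


# Placeholder data: every ID below is hypothetical.
mongo = [
    {"id": "igsn:EXAMPLE-1"},
    {"id": "gold:GbEXAMPLE2", "gold_id": "GbEXAMPLE2"},
]
sheet = [{"gold_id": "GbEXAMPLE2", "img_taxon_oid": "EXAMPLE-OID"}]

matched, unmatched = join_on_gold_id(mongo, sheet)
print(len(matched), len(unmatched))  # 1 1
```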

dehays commented 3 years ago

@subdavis I'm going to take care of populating study and biosample records in Mongo with the information from Emiley's spreadsheet in microbiomedata/NMDC_Planning#83. I'll let you know when this has happened, and you can then retrieve the values during Mongo ingest. You do not need to parse additional spreadsheets outside of your Mongo metadata ingest process.

Most of these will appear in the alternative_identifiers list. I can work with you on how to use the identifiers to link to the other repositories (IMG, INSDC/NCBI, ESS-DIVE, etc.). This relates to #370 and #371 as well.

ssarrafan commented 3 years ago

@dehays and @subdavis, I will move this to the September sprint and assign it to @dehays. David, once you're done with your part, can you let Brandon know and reassign this to him, please? If you'd prefer a separate GH issue, let me know.

ssarrafan commented 3 years ago

@dehays, any update on this? Is the Kitware team still waiting for something from you to proceed?

dehays commented 3 years ago

We have the values for this, so it's blocked on microbiomedata/nmdc-runtime#23, which I'm told will be available this week.

@subdavis The IMG identifiers that I will add to the alternative_identifiers attribute on study will look like this: img.taxon:3300009759. The CURIE prefix img.taxon is registered, so you can link with this ID through one of the identifier resolution services. For example, you can construct the URL as http://identifiers.org/img.taxon:3300009759 to link to the right page at IMG.
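A sketch of the resolver approach described here: the portal stores no IMG URL at all and simply prepends the identifiers.org base to the CURIE. The regular expression is a simplified, assumed CURIE check, not identifiers.org's own validation rules. `identifiers_org_url` is a hypothetical helper.

```python
import re

# Simplified CURIE shape (an assumption): a prefix of letters, digits, dots,
# underscores or hyphens, then a colon, then a non-empty local ID without whitespace.
CURIE_RE = re.compile(r"^([A-Za-z][A-Za-z0-9._-]*):(\S+)$")


def identifiers_org_url(curie: str) -> str:
    """Turn a CURIE such as 'img.taxon:3300009759' into an identifiers.org link."""
    match = CURIE_RE.match(curie)
    if match is None:
        raise ValueError(f"not a CURIE: {curie!r}")
    prefix, local_id = match.groups()
    return f"http://identifiers.org/{prefix}:{local_id}"


print(identifiers_org_url("img.taxon:3300009759"))  # http://identifiers.org/img.taxon:3300009759
```

Because the resolver maps registered prefixes to their provider pages, the same helper would work for other registered prefixes in alternative_identifiers without adding any per-repository URL templates to the portal.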

subdavis commented 3 years ago

Excellent, thanks for the update. If so, I think the new IDs will just appear after the next data ingest. If they need tweaking, I can do that.

ssarrafan commented 3 years ago

@dehays, this is part of the change sheets, correct? If so, can we close this issue?

ssarrafan commented 3 years ago

I checked with David, and he said to leave this open until @subdavis has implemented the links to IMG from study pages. @subdavis, can you let me know when it's done so I can close this, or close it yourself? Thank you!

Moving to October sprint for now.

ssarrafan commented 3 years ago

This appears to be done. @dehays @subdavis, let me know if it should be reopened.