microbiomedata / pilot

Make datasets indexable via schema.org to google dataset search #52

Open cmungall opened 3 years ago

cmungall commented 3 years ago

This is for the FAIR aim (in aim 1, but requires aim 3 help)

for more on Google dataset search: https://support.google.com/webmasters/thread/1960710

Example of dataset search:

https://datasetsearch.research.google.com/search?query=saline%20lake%20biome&docid=QO%2FFv5GrKn0IrD%2FyAAAAAA%3D%3D

You can see that omics dataset indexes like omicsDI and datamed are indexed, as well as GBIF, datadryad etc - but no specialized metagenome registries

If we mark up HTML served with RDFa/schema.org this will increase the indexability
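As a rough illustration of the markup idea, here is a minimal Python sketch that builds a schema.org `Dataset` record and wraps it in a `<script>` element for embedding in a served page. It uses JSON-LD rather than RDFa because it is simpler to generate, so this is a swapped-in serialization, not the one named above. The function names, the example dataset, and the `example.org` URL are all hypothetical, and the property set is only a small subset of what a real record would probably carry.

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Build a schema.org Dataset description as a plain dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
    }


def jsonld_script_tag(record):
    """Serialize the record into a <script> element suitable for the page head."""
    # Escape "</" so the JSON payload cannot close the script element early;
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    payload = json.dumps(record, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Illustrative values only; not a real dataset.
example = dataset_jsonld(
    name="Example saline lake metagenome",
    description="Hypothetical metagenome dataset sampled from a saline lake biome.",
    url="https://example.org/datasets/1",
    keywords=["metagenome", "saline lake biome"],
)

print(jsonld_script_tag(example))
```

The resulting element would be emitted alongside the normal HTML for each dataset landing page; whether a given search engine picks it up is outside the scope of this sketch.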

There is a proposed extension to schema.org for bioschemas.

This would include classes such as study and dataset:

https://github.com/BioSchemas/specifications/issues/472

And a biosample class:

https://github.com/BioSchemas/specifications/issues/323

The biosample proposal was a little human-centric for our purposes; what we have is more of an intersection of a biosample and an environmental sample. I am not sure if there are

elishawc commented 3 years ago

Thanks - Let me know if we need to talk to the group working on schema.org via ESIP. Happy to connect.

-- Elisha M Wood-Charlson, PhD (she/her) KBase https://kbase.us/ User Engagement Lead; @DOEKBase https://twitter.com/doekbase NMDC http://microbiomedata.org/ Leadership Team; @microbiomedata https://twitter.com/MicrobiomeData Lawrence Berkeley National Laboratory LinkedIn http://www.linkedin.com/in/elishawc, Twitter https://twitter.com/ElishaMariePhD (personal)

cmungall commented 3 years ago

Yes, definitely! We can use this ticket for initial discussions and then see if we need to set up a call

I'm aware of bio-efforts to extend schema.org (bioschemas) but not sure if there is an analogous envoschemas

elishawc commented 3 years ago

https://github.com/ESIPFed/science-on-schema.org

elishawc commented 3 years ago

and we dumped some resources here: https://docs.google.com/document/d/1rvEoEw8FU5ewmaiXnLen_Jc1x7nJ2VTAtgyzQfbwyjs/edit

wdduncan commented 3 years ago

Not meaning to muddy the waters, but I just ran across Facebook's Open Graph:

https://ogp.me/

It looks like there is a lot of development to do on it. But my first impression is that it can be easily extended.

cmungall commented 3 years ago

Thanks for the science-on-schema link @elishawc! Very useful. This is how Pangaea are doing it:

https://github.com/ESIPFed/science-on-schema.org/issues/27#issuecomment-747414365

kfagnan commented 3 years ago

What's the relative priority of this development vs. getting actual metadata into the system?

elishawc commented 3 years ago

Notes from ESIP 2020 winter meeting

Software - JSON-LD, RDF JS

g.co/datasetsearch: takes schema.org tables and “cleans” them by normalizing (fields?), places them into the knowledge graph (people, places, things, geolocation), and into Google Scholar if it can find mentions
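Since the notes mention JSON-LD, one way to sanity-check a page before relying on any indexer is to pull the embedded records back out and see what a crawler might read. The sketch below does this with only the Python standard library. It is an assumption-laden illustration, not a description of how Google Dataset Search actually processes pages; the class and function names and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several pieces, so buffer it.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.records.append(json.loads("".join(self._chunks)))


def extract_datasets(html_text):
    """Return the embedded JSON-LD objects whose @type is Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return [
        r for r in parser.records
        if isinstance(r, dict) and r.get("@type") == "Dataset"
    ]


# Hypothetical landing page with one embedded record.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example saline lake metagenome"}
</script>
</head><body></body></html>"""

for record in extract_datasets(page):
    print(record["name"])
```

This only handles a single top-level object per script element; pages that embed arrays or `@graph` containers would need extra handling.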