phenopackets / phenopacket-schema

Repository for the GA4GH phenopacket schema
https://phenopacket-schema.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

What library should R developers use for phenopackets? #359

Open cmungall opened 2 years ago

cmungall commented 2 years ago

From:

What strategy should R developers and other non-Python/Java developers use?

I know there is an R protobuf library, but I don't think it's an official Protobuf/Google product:

https://cran.r-project.org/web/packages/RProtoBuf/

pnrobinson commented 2 years ago

AFAIK there is no easy way to work with phenopackets in R. For now we are emphasizing Java and Python, but an R library would be useful. I would think we could explore using the automatically generated C++ library and then possibly Rcpp or a similar approach. But first I think we should figure out what we want to do in R.

julesjacobsen commented 2 years ago

What's wrong with using the library Chris suggested? Sure, it's going to build you a bare-bones model, but that's already a good start for parsing and using the data.

pnrobinson commented 2 years ago

Well, because without support for validation such as we have for Java in https://github.com/phenopackets/phenopacket-tools and will soon have in Python, it is hard to write correct phenopackets. It depends on what one wants to do, but we should try to develop good libraries in every language in which people will work with phenopackets a lot!

cmungall commented 2 years ago

I think it helps to separate use cases here. Broadly these fall into two categories: export (writing phenopackets) and import (reading them).

For export I agree that we need good library support for validation, but there are a variety of strategies, including services, calling the Java or Python libraries, or encoding in LinkML, etc.

However, the helpdesk request that prompted this was for import, so there is no need for a full R validation suite here.
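
For the import case, a bare-bones reader may be enough to get started. The following is a minimal sketch in Python using only the standard library. It assumes the phenopacket is stored as JSON with the schema's camelCase field names (`phenotypicFeatures`, `type`, `excluded`); the function name `phenotypic_feature_rows` and the example record are hypothetical. It flattens the phenotypic features into rows that could be handed to a data frame, and it does no validation at all.

```python
import json

# Hypothetical example record; only the fields read below are included.
EXAMPLE = """
{
  "id": "example-1",
  "phenotypicFeatures": [
    {"type": {"id": "HP:0001250", "label": "Seizure"}},
    {"type": {"id": "HP:0001263", "label": "Global developmental delay"},
     "excluded": true}
  ]
}
"""


def phenotypic_feature_rows(phenopacket_json: str) -> list[dict]:
    """Flatten the phenotypic features of one phenopacket into table rows.

    Assumes JSON with camelCase field names. Missing fields are tolerated,
    so a malformed phenopacket will not raise here.
    """
    packet = json.loads(phenopacket_json)
    rows = []
    for feature in packet.get("phenotypicFeatures", []):
        term = feature.get("type", {})
        rows.append(
            {
                "phenopacket_id": packet.get("id"),
                "term_id": term.get("id"),
                "label": term.get("label"),
                # An absent 'excluded' field is treated as False (feature observed).
                "excluded": feature.get("excluded", False),
            }
        )
    return rows


if __name__ == "__main__":
    for row in phenotypic_feature_rows(EXAMPLE):
        print(row)
```

The same flattening could presumably be done in R with any JSON reader; the point is only that import needs far less machinery than export.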

pnrobinson commented 2 years ago

What if an invalid phenopacket is imported? This could lead to spurious results, and the analysis of the data might not be correct, especially for more complicated phenopackets, unless dedicated software is used rather than generic JSON parsing. I think we should always validate, and if we want to support R we should figure out how to write a library similar to phenopacket-tools.
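
To make the concern concrete, here is a minimal sketch in Python (standard library only) of a structural check run before import. It is not what phenopacket-tools does. The required-field lists are an assumption based on a reading of the version 2 schema documentation, and the names `structural_problems` and `load_checked` are hypothetical. A check like this catches missing fields only; it says nothing about ontology terms or semantic consistency.

```python
import json

# Assumed subset of required fields (version 2 schema, JSON camelCase names).
REQUIRED_TOP_LEVEL = ("id", "metaData")
REQUIRED_META = ("created", "createdBy", "resources", "phenopacketSchemaVersion")


def _absent(obj: dict, key: str) -> bool:
    """True if the key is missing or holds an empty value."""
    return obj.get(key) in (None, "", [], {})


def structural_problems(packet: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = [f"missing '{key}'" for key in REQUIRED_TOP_LEVEL if _absent(packet, key)]
    meta = packet.get("metaData")
    # Only inspect metaData fields when metaData itself is present.
    if isinstance(meta, dict) and meta:
        problems += [
            f"missing 'metaData.{key}'" for key in REQUIRED_META if _absent(meta, key)
        ]
    return problems


def load_checked(phenopacket_json: str) -> dict:
    """Parse a phenopacket and refuse to return it if the check fails."""
    packet = json.loads(phenopacket_json)
    problems = structural_problems(packet)
    if problems:
        raise ValueError("; ".join(problems))
    return packet


if __name__ == "__main__":
    try:
        load_checked('{"id": "example-1"}')
    except ValueError as err:
        print(f"rejected: {err}")
```

Refusing the record outright, rather than importing it with a warning, keeps an incomplete phenopacket from silently entering a downstream analysis.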