matsengrp / sumrep

Summary statistics for repertoires

Looser coupling to annotators and simulators? #12

Closed: williamdlees closed this issue 5 years ago

williamdlees commented 6 years ago

As we discussed on the recent SW WG call, it might be an idea to provide looser coupling to the underlying annotation and simulation tools.

The main motivation for this is that many users may wish to use sumrep methods to analyse data that they have annotated themselves or have obtained pre-annotated. Annotation of live (as opposed to simulated) records from scratch is complex because of the need to take account of the underlying methods, so when handling live records it's probably a good idea for sumrep to start with a pre-annotated set.

I wrote some lightweight support for the Change-O format that might save a small amount of time: changeo_functions.zip
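
As a rough illustration of the decoupling suggested above, here is a minimal Python sketch (sumrep itself is written in R, and this is not the content of the attached zip). The function names `read_annotations` and `junction_length_distribution`, and the assumption that the table has a `JUNCTION` column, are hypothetical. The point is only that the summary code consumes a pre-annotated, tab-delimited table and never invokes an annotator itself.

```python
import csv
import io
from collections import Counter


def read_annotations(handle):
    """Read a tab-delimited, pre-annotated table into a list of row dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def junction_length_distribution(rows, junction_column="JUNCTION"):
    """Count junction lengths, skipping rows whose junction is missing or empty.

    The column name is a parameter (hypothetical default) so the summary
    does not depend on which tool produced the annotations.
    """
    return Counter(
        len(row[junction_column]) for row in rows if row.get(junction_column)
    )


# Stand-in for a file the user annotated themselves or obtained pre-annotated.
example = io.StringIO(
    "SEQUENCE_ID\tV_CALL\tJUNCTION\n"
    "seq1\tIGHV1-2*01\tTGTGCGAGA\n"
    "seq2\tIGHV3-23*01\tTGTGCGAAAGAT\n"
    "seq3\tIGHV1-2*01\t\n"
)

rows = read_annotations(example)
lengths = junction_length_distribution(rows)
```

Because the summary function only sees rows and a column name, swapping the annotator means changing how the table is produced, not how it is summarised.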

BrandenOlson commented 6 years ago

Thanks for the input @williamdlees! I'll try to get this integrated as much as possible by the next SW-WG call.

Could you elaborate more on the functions you sent? It looks like they were derived from code that is already in sumrep: https://github.com/matsengrp/sumrep/blob/master/R/IgBlastFunctions.R. Are there particular changes in them that you would like me to incorporate?

williamdlees commented 6 years ago

Thanks Branden. Yes, the files are heavily derived from that code. I just included them as an example of how decoupling could be achieved: it's pretty trivial really, but I had the code anyway for my own purposes, so I thought it might be helpful.

BrandenOlson commented 5 years ago

@williamdlees -- I believe I have decoupled sumrep from the annotation tools sufficiently at this point, e.g. by renaming annotateSequences to getPartisAnnotations, updating the README, and converting column names to AIRR by default while allowing custom fields from the user. Let me know at your leisure if there is anything I missed or could further clarify/modularize.
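
The "AIRR by default, custom fields allowed" idea can be sketched as follows. This is a minimal Python illustration, not sumrep's actual R implementation. The mapping table is a small assumed subset of legacy Change-O to AIRR column names, and `to_airr_columns` and its `custom_fields` parameter are hypothetical names.

```python
# Illustrative subset of legacy Change-O -> AIRR column names.
# This table is an assumption for the example, not sumrep's own mapping.
DEFAULT_TO_AIRR = {
    "SEQUENCE_ID": "sequence_id",
    "SEQUENCE_INPUT": "sequence",
    "V_CALL": "v_call",
    "D_CALL": "d_call",
    "J_CALL": "j_call",
    "JUNCTION": "junction",
}


def to_airr_columns(rows, custom_fields=None):
    """Rename row keys to AIRR names; user-supplied mappings override defaults.

    Columns with no known mapping are passed through unchanged.
    """
    mapping = dict(DEFAULT_TO_AIRR)
    mapping.update(custom_fields or {})
    return [
        {mapping.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


rows = [{"SEQUENCE_ID": "seq1", "V_CALL": "IGHV1-2*01", "CLONE": "7"}]
renamed = to_airr_columns(rows, custom_fields={"CLONE": "clone_id"})
```

Letting the user-supplied mapping take precedence over the defaults keeps the common case zero-configuration while still accommodating tables with non-standard field names.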