sigven / cacao

Callable Cancer Loci - assessment of sequencing coverage for actionable and pathogenic loci in cancer
MIT License

Option to include/switch reference data and a global preview table #1

Open · skanwal opened this issue 5 years ago

skanwal commented 5 years ago

Hi Sigve

Thanks for another awesome framework. We (@umccr) are very much interested in incorporating this into our reporting.

I have looked at the GitHub repo/code and tested it locally, and it works great. We have a few questions/comments:

  1. Would it be possible to feed in our own reference (BED) files?

We are interested in using some of the reference data from Hartwig. Looking at the codebase, this shouldn't be a problem, as the data directory is passed as an argument and contains the reference data used for the analysis. However, might this affect the annotations that the framework reads from the .tsv file(s) in `cacao_utils.R` for specific clinical genomic tracks?

There is also an optional flag, `--target`, which as I understand it refers to the targeted region of the input sample. Is that right?
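For illustration only, here is a minimal Python sketch of how a user-supplied loci BED file could be read and, if a target BED were given, restricted to loci overlapping the target. It assumes tab-separated BED with 0-based, half-open coordinates, requires non-empty intervals, and treats any columns after the fourth as optional annotations. `Locus`, `read_loci_bed` and `restrict_to_target` are hypothetical names, not part of cacao.

```python
# Hypothetical sketch: names and behaviour are assumptions, not cacao's API.
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Locus:
    """One BED record; coordinates follow the BED convention (0-based, half-open)."""
    chrom: str
    start: int
    end: int
    name: str = ""
    annotations: List[str] = field(default_factory=list)  # columns 5+ (optional)


def read_loci_bed(lines: Iterable[str]) -> List[Locus]:
    """Parse tab-separated BED lines into Locus records (hypothetical helper)."""
    loci = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blank lines, comments and UCSC 'track'/'browser' header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        cols = line.split("\t")
        if len(cols) < 3:
            raise ValueError(f"BED line needs at least 3 columns: {line!r}")
        start, end = int(cols[1]), int(cols[2])
        # This sketch only accepts non-empty intervals.
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval {start}-{end}: {line!r}")
        name = cols[3] if len(cols) > 3 else ""
        loci.append(Locus(cols[0], start, end, name, cols[4:]))
    return loci


def restrict_to_target(loci: List[Locus], target: Optional[List[Locus]]) -> List[Locus]:
    """Keep loci overlapping at least one target interval; no target means keep all."""
    if target is None:
        return list(loci)
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for t in target:
        by_chrom.setdefault(t.chrom, []).append((t.start, t.end))
    return [
        locus
        for locus in loci
        # Half-open overlap test: non-empty [a, b) and [c, d) overlap iff a < d and c < b.
        if any(locus.start < t_end and t_start < locus.end
               for t_start, t_end in by_chrom.get(locus.chrom, []))
    ]


if __name__ == "__main__":
    # Made-up coordinates, for demonstration only.
    loci = read_loci_bed([
        "track name=custom_loci",
        "chr1\t100\t101\tlocus_1\tCIViC",
        "chr2\t500\t501\tlocus_2\tOncoKB",
    ])
    target = read_loci_bed(["chr1\t50\t150\tpanel_region"])
    print([locus.name for locus in restrict_to_target(loci, target)])  # ['locus_1']
```

Keeping the annotation columns as a free-form list means a merged BED from several sources can carry its own labels without the parser needing to know them in advance.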

  2. Would it make sense to have one global table (checking coverage for specific genes), stratified by callability, instead of having to go through multiple tracks?

This probably links back to point 1, i.e. feeding in one specific BED track that could be a joint set of loci from various sources such as CIViC, CGI and OncoKB, and then reading in the optional annotations for this data (as in the codebase). Does this idea align well with your original idea of the framework?
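A rough sketch of the single global table idea, assuming per-locus mean depths are already available: loci from several sources are merged on their coordinates (remembering which sources list each one), each locus is assigned a callability class from its mean depth, and the classes are counted per gene. The four class names, the 10x/150x thresholds, the input shapes and all function names are assumptions made for this example.

```python
# Hypothetical sketch: class names, thresholds and function names are assumptions.
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Coord = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open

# Assumed cut-offs on mean read depth; real values would be configurable.
CALLABLE_MIN = 10
HIGH_COVERAGE_MIN = 150


def callability(mean_depth: float,
                callable_min: float = CALLABLE_MIN,
                high_min: float = HIGH_COVERAGE_MIN) -> str:
    """Map a locus's mean depth to one of four (assumed) callability classes."""
    if mean_depth <= 0:
        return "NO_COVERAGE"
    if mean_depth < callable_min:
        return "LOW_COVERAGE"
    if mean_depth < high_min:
        return "CALLABLE"
    return "HIGH_COVERAGE"


def merge_sources(sources: Dict[str, Iterable[Tuple[str, int, int, str]]]) -> Dict[Coord, dict]:
    """Union loci from several sources, keyed by coordinates, keeping the gene and contributing sources."""
    merged: Dict[Coord, dict] = {}
    for source, records in sources.items():
        for chrom, start, end, gene in records:
            # The first source's gene symbol is kept if sources disagree.
            entry = merged.setdefault((chrom, start, end), {"gene": gene, "sources": set()})
            entry["sources"].add(source)
    return merged


def global_table(merged: Dict[Coord, dict], depth: Dict[Coord, float]) -> Dict[str, Counter]:
    """Count merged loci per gene and callability class; loci absent from `depth` count as NO_COVERAGE."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for coord, entry in merged.items():
        table[entry["gene"]][callability(depth.get(coord, 0.0))] += 1
    return dict(table)


if __name__ == "__main__":
    # Toy input: one locus is listed by two sources and is counted once.
    merged = merge_sources({
        "CIViC": [("chr1", 100, 101, "GENE_A"), ("chr1", 200, 201, "GENE_A")],
        "OncoKB": [("chr1", 100, 101, "GENE_A"), ("chr2", 50, 51, "GENE_B")],
    })
    depth = {("chr1", 100, 101): 35.0, ("chr1", 200, 201): 4.0}
    for gene, counts in sorted(global_table(merged, depth).items()):
        print(gene, dict(counts))
```

Deduplicating on coordinates keeps a locus reported by both CIViC and OncoKB from being counted twice, while the `sources` set preserves where it came from; a fuller version might flag loci whose sources disagree on the gene.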

  3. It would be useful to have an option to limit the "hereditary cancer - pathogenic loci" table to the cancer predisposition genes that are also used/referenced in https://github.com/sigven/cpsr. Would that be possible?
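Restricting the hereditary table to a predisposition gene panel could be as simple as a set-membership filter. This sketch assumes each table row is a dict with a gene symbol under a hypothetical `symbol` key and that the panel is any iterable of symbols (an open one-symbol-per-line file also works, since whitespace is stripped). It also reports panel genes that have no loci in the table. The function name and columns are illustrative, and CPSR's actual gene list is not reproduced here.

```python
# Hypothetical sketch: `restrict_to_panel` and the row keys are assumptions.
from typing import Dict, Iterable, List, Set, Tuple


def restrict_to_panel(rows: Iterable[Dict[str, str]],
                      panel: Iterable[str],
                      gene_key: str = "symbol") -> Tuple[List[Dict[str, str]], Set[str]]:
    """Keep rows whose gene symbol is in the panel; also return panel genes with no rows."""
    # Strip whitespace so an open one-symbol-per-line file can be passed as `panel`.
    panel_set = {g.strip() for g in panel if g.strip()}
    kept = [row for row in rows if row.get(gene_key, "").strip() in panel_set]
    covered = {row[gene_key].strip() for row in kept}
    return kept, panel_set - covered


if __name__ == "__main__":
    # Illustrative rows; `symbol` and `variant_id` are hypothetical column names.
    rows = [
        {"symbol": "BRCA1", "variant_id": "v1"},
        {"symbol": "KRAS", "variant_id": "v2"},
        {"symbol": "TP53", "variant_id": "v3"},
    ]
    kept, missing = restrict_to_panel(rows, ["BRCA1", "BRCA2", "TP53"])
    print([r["variant_id"] for r in kept], sorted(missing))  # ['v1', 'v3'] ['BRCA2']
```

Returning the panel genes with no loci makes it visible when a predisposition gene is absent from the table altogether, rather than silently dropping it.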

Sorry about the long commentary and thanks for your time.

Cheers, Sehrish

sigven commented 5 years ago

Dear Sehrish,

Thanks a lot for your input, it is highly valuable! Generally, what you suggest makes perfect sense as a further development of the workflow, and some of your ideas have also been raised by colleagues here. I will get back to you shortly with my thoughts on what is realistic in the short term; I am very busy here today.

PS. You are correct about `--target`: it should refer to the targeted region of the input sample. However, I have not implemented it yet, so it is currently only a placeholder. I will update that shortly.

Regards, Sigve

skanwal commented 5 years ago

Hi Sigve,

Thanks for the response; I look forward to hearing back from you. Always happy to coordinate/contribute.

Regards, Sehrish

sigven commented 5 years ago

Hi Sehrish,

Coming back to this:

I would appreciate your input on this.

Regards, Sigve

skanwal commented 5 years ago

Hi Sigve,

Thanks for getting back to this.

• Reference data from Hartwig is:

Also, I appreciate the point that we need to decide which data we are going to present, as it is hard to process reference input on the fly.

• Global preview: we hope to begin by focusing on coding regions.

It would definitely be very useful to be able to switch to the whole gene (including introns), but we can expand on this later.
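The difference between assessing coding regions only and the whole gene (including introns) can be sketched per transcript: whole-gene mode returns the transcript span, while coding mode clips each exon to the CDS boundaries and drops exons that lie entirely in UTRs. The `Transcript` fields and the `regions` function are hypothetical, and coordinates are assumed to be 0-based, half-open.

```python
# Hypothetical sketch: `Transcript` and `regions` are assumptions, not cacao's API.
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


@dataclass
class Transcript:
    """Minimal transcript model; cds_start == cds_end marks a non-coding transcript."""
    chrom: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Interval]


def regions(tx: Transcript, mode: str = "coding") -> List[Interval]:
    """Intervals to assess: CDS-clipped exons ('coding') or the full transcript span ('whole_gene')."""
    if mode == "whole_gene":
        return [(tx.tx_start, tx.tx_end)]  # includes introns and UTRs
    if mode != "coding":
        raise ValueError(f"unknown mode: {mode!r}")
    coding = []
    for ex_start, ex_end in sorted(tx.exons):
        # Clip each exon to the CDS; exons lying entirely in a UTR become empty and are dropped.
        start, end = max(ex_start, tx.cds_start), min(ex_end, tx.cds_end)
        if start < end:
            coding.append((start, end))
    return coding


if __name__ == "__main__":
    tx = Transcript("chr1", 100, 1000, 150, 900, [(100, 200), (400, 500), (850, 1000)])
    print(regions(tx))                # [(150, 200), (400, 500), (850, 900)]
    print(regions(tx, "whole_gene"))  # [(100, 1000)]
```

This handles a single transcript; for a gene with several transcripts, whole-gene mode would presumably take the union of their spans, which is a design choice left open here.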

Happy to get your feedback on this and to start looking into the implementation as well, if this sounds feasible/useful to you.

Regards, Sehrish