mskilab-org / fishHook

R package for applying Gamma-Poisson regression to identify statistical enrichment or depletion of somatic mutations in regions after correcting for genomic covariates.
27 stars 9 forks source link

Help to plot 'QQplot' with command 'fish$qqp()' and Coverage problem #8

Open a00101 opened 4 years ago

a00101 commented 4 years ago

1) I ran test data (luad.maf and others). Results are the same in the manual. but 'fish$qqp()' not works.

how should I do that?

2) I have paired sequencing data. how to make coverage file? one way is extracting common and another is extracting all. which is right?

Thanks you for your reply in advance. Thanks.

a00101 commented 4 years ago

my warning message

fish = Fish(hypotheses = genes[, 'gene_name'], events = events, eligible = eligible, idcol = 'Tumor_Sample_Barcode') FishHook 2019-12-06 23:15:20: Tallying events over eligible subsets of hypotheses FishHook 2019-12-06 23:15:20: Overlapping with covered intervals FishHook 2019-12-06 23:15:20: Finished overlapping with covered intervals FishHook 2019-12-06 23:15:21: Applying idcap of 1 on idcol Tumor_Sample_Barcode. FishHook 2019-12-06 23:15:21: Computing event counts FishHook 2019-12-06 23:15:21: Finished counting events FishHook 2019-12-06 23:15:21: Marking eligible vs. non-eligible events FishHook 2019-12-06 23:15:21: Aggregating covariates over eligible subset of hypotheses FishHook 2019-12-06 23:15:21: Overlapping with covered intervals FishHook 2019-12-06 23:15:22: Finished overlapping with covered intervals Warning messages: 1: In gr.findoverlaps(hypotheses, covered, verbose = verbose > 1, max.chunk = max.chunk, : findOverlaps applied to ranges with non-identical seqlengths 2: In gr.findoverlaps(query, subject, ...) : findOverlaps applied to ranges with non-identical seqlengths 3: In gr.findoverlaps(query, subject, ...) : findOverlaps applied to ranges with non-identical seqlengths 4: In gr.findoverlaps(events, ov, max.chunk = max.chunk, mc.cores = mc.cores, : findOverlaps applied to ranges with non-identical seqlengths 5: In gr.findoverlaps(query, subject, ...) : findOverlaps applied to ranges with non-identical seqlengths 6: In gr.findoverlaps(hypotheses, covered, verbose = verbose > 1, max.chunk = max.chunk, : findOverlaps applied to ranges with non-identical seqlengths

mskilab commented 4 years ago

These warnings are fine - they just are letting you know that your ranges have different seqlengths. If the ranges are from the same genome build then no worries. This can happen often when you are using slightly different versions of the same genome build (eg with different alternate contigs), or you didn't initialize your granges with a seqlengths. To get rid of the warnings just make sure that events, eligible, and genes have the same seqlengths (see GenomicRanges package).

Is this happening with the tutorial data? Shouldn't be

On Fri, Dec 6, 2019 at 9:16 AM a00101 notifications@github.com wrote:

my warning message

fish = Fish(hypotheses = genes[, 'gene_name'], events = events, eligible = eligible, idcol = 'Tumor_Sample_Barcode') FishHook 2019-12-06 23:15:20: Tallying events over eligible subsets of hypotheses FishHook 2019-12-06 23:15:20: Overlapping with covered intervals FishHook 2019-12-06 23:15:20: Finished overlapping with covered intervals FishHook 2019-12-06 23:15:21: Applying idcap of 1 on idcol Tumor_Sample_Barcode. FishHook 2019-12-06 23:15:21: Computing event counts FishHook 2019-12-06 23:15:21: Finished counting events FishHook 2019-12-06 23:15:21: Marking eligible vs. non-eligible events FishHook 2019-12-06 23:15:21: Aggregating covariates over eligible subset of hypotheses FishHook 2019-12-06 23:15:21: Overlapping with covered intervals FishHook 2019-12-06 23:15:22: Finished overlapping with covered intervals Warning messages: 1: In gr.findoverlaps(hypotheses, covered, verbose = verbose > 1, max.chunk = max.chunk, : findOverlaps applied to ranges with non-identical seqlengths 2: In gr.findoverlaps(query, subject, ...) : findOverlaps applied to ranges with non-identical seqlengths 3: In gr.findoverlaps(query, subject, ...) : findOverlaps applied to ranges with non-identical seqlengths 4: In gr.findoverlaps(events, ov, max.chunk = max.chunk, mc.cores = mc.cores, : findOverlaps applied to ranges with non-identical seqlengths 5: In gr.findoverlaps(query, subject, ...) : findOverlaps applied to ranges with non-identical seqlengths 6: In gr.findoverlaps(hypotheses, covered, verbose = verbose > 1, max.chunk = max.chunk, : findOverlaps applied to ranges with non-identical seqlengths

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mskilab/fishHook/issues/8?email_source=notifications&email_token=AC73JEJSSLCPW4DGTF4FROTQXJNCRA5CNFSM4JWYAXG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGEGQRQ#issuecomment-562587718, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC73JEJXOII7YSEBNNQ47FDQXJNCRANCNFSM4JWYAXGQ .

mskilab commented 4 years ago

What happens when you run fish$qqp()?

You can make a coverage file from a bam file using bedtools or deeptools or Rsamtools any number of tools / packages that output a bedgraph, .wig, bigwig, GRanges, tsv, csv, etc. This is outside of fishHook.

On Fri, Dec 6, 2019 at 9:00 AM a00101 notifications@github.com wrote:

1.

I ran test data (luad.maf and others). Results are the same in the manual. but 'fish$qqp()' not works.

how should I do that?

  1. I have paired sequencing data. how to make coverage file? one way is extracting common and another is extracting all. which is right?

Thanks you for your reply in advance. Thanks.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mskilab/fishHook/issues/8?email_source=notifications&email_token=AC73JEOJZE6C42JW67JXWM3QXJLJFA5CNFSM4JWYAXG2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4H6UJM6A, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC73JEP64IJRPJLTOJ65RZDQXJLJFANCNFSM4JWYAXGQ .

a00101 commented 4 years ago

No. But thanks. My point is "should I make pooled normal coverage?"

mskilab commented 4 years ago

You can use normal coverage to determine the "eligible" territory eg for a custom exome assay or panel.

The other approach is to use the assay manufacturer provided map of the covered territory eg a bed file.

If you are using a custom targeted panel with unknown genomic distribution then you have to compute the eligible yourself via some coverage calculations eg generating per sample base level coverage via deeptools, bedtools, samtools etc and then deciding on a cutoff eg 40X that you require at a base to be in order for a mutation to be detectable in that sample, and then finding all bases that are detectable on average in for example >=90% of samples. These operations can be done via GenomicRanges / rtracklayer packages, and you can pipe the output to fishHook.

On Sun, Dec 8, 2019 at 8:54 AM a00101 notifications@github.com wrote:

No. But thanks. My point is "should I make pooled normal coverage?"

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/mskilab/fishHook/issues/8?email_source=notifications&email_token=AC73JEI45CMAOUV5WFZNCL3QXT4AHA5CNFSM4JWYAXG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGG7FZI#issuecomment-562950885, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC73JEN7GMI2DSRCB3DUCHTQXT4AHANCNFSM4JWYAXGQ .

a00101 commented 4 years ago

You mean 'convert Normal bam to coverage called eligible' ?

Thanks.

mskilab commented 4 years ago

The eligible territory is just a set of non-overlapping intervals i.e. a mask, a bed file. You can compute it from coverage, e.g. by finding all the regions above a certain coverage in your assay, or all bases that are covered at least 40X in >90% of patients. Whatever you do, in the end, the eligible region should be a set of intervals

On Thu, Dec 12, 2019 at 1:55 AM a00101 notifications@github.com wrote:

You mean 'convert Normal bam to coverage called eligible' ?

Thanks.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/mskilab/fishHook/issues/8?email_source=notifications&email_token=AC73JEIG4BXXP6L7UXEQZWDQYHN4JA5CNFSM4JWYAXG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGVVH6Y#issuecomment-564876283, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC73JEJCNNIZGR7RWEC5HDTQYHN4JANCNFSM4JWYAXGQ .

mskilab commented 4 years ago

For WGS the eligible territory can in theory be the whole genome, however there are "bad areas" that are defined by regions of low mappability or erroneous mapping / repetitive sequence where you will routinely see recurrent false positive variants because of alignment errors. We mask these out in hg19 by defining the eligible territory based around Heng Li's mask, which throws out ~650Mbp of genome. Unfortunately no such mask exists for hg38 - though we are working on some alternatives. In the meantime, you can use mappability track from UCSC and define WGS eligible around uniquely mapping regions eg 50mers.

For WES and panels you want to base your eligible territory on the part of the genome that the assay reliably covers - that is more empirical and assay specific / data driven. i.e. you can use the coverage data from your WES / panel to identify bases that are reliably captured (eg in 95% of experiments) .

Hope that makes sense.

On Thu, Dec 19, 2019 at 12:36 AM a00101 notifications@github.com wrote:

Thanks. But still i don't understand exactly for making coverage.

First of all, I know which tool to use to create the coverage.

But what I still don't know is whether I can define the GRCh38 genome file as eligible territory. In other words, should GRCh38 gtf be converted to bed to bigwig?

Or, read coverage varies from sample to sample, so I have to use different coverage files for each sample.

I'm very thank you for your kindness.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/mskilab/fishHook/issues/8?email_source=notifications&email_token=AC73JEN2IS77L6XRLM5D543QZMB4NA5CNFSM4JWYAXG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHIPD3A#issuecomment-567341548, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC73JEM4VORFWBYOHCJMCJDQZMB4NANCNFSM4JWYAXGQ .