lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/
Other
480 stars 133 forks source link

error overlapping PCR intervals - pcrclipreads #44

Open avilella opened 8 years ago

avilella commented 8 years ago

Verify

Describe your issue here.

Your environment

Even though I don't see the overlap between the two regions printed, it seems it's adding some padding (slop) between the two. See below. Any ideas?

java.lang.IllegalStateException: Overlapping PCR intervals : chr1:740716-740765 +       . chr1:740812-740861    +       .
        at com.github.lindenb.jvarkit.tools.pcr.PcrClipReads.findInterval(PcrClipReads.java:69)
        at com.github.lindenb.jvarkit.tools.pcr.PcrClipReads.findInterval(PcrClipReads.java:59)
        at com.github.lindenb.jvarkit.tools.pcr.PcrClipReads.run(PcrClipReads.java:104)
        at com.github.lindenb.jvarkit.tools.pcr.PcrClipReads.call(PcrClipReads.java:191)
        at com.github.lindenb.jvarkit.tools.pcr.AbstractPcrClipReads.call(AbstractPcrClipReads.java:450)
        at com.github.lindenb.jvarkit.tools.pcr.AbstractPcrClipReads.call(AbstractPcrClipReads.java:32)
        at com.github.lindenb.jvarkit.util.command.Command.instanceMainWithExceptions(Command.java:533)
        at com.github.lindenb.jvarkit.util.command.Command.instanceMain(Command.java:570)
        at com.github.lindenb.jvarkit.tools.pcr.PcrClipReads.main(PcrClipReads.java:206)
[main] ERROR jvarkit - Overlapping PCR intervals : chr1:740716-740765   +       . chr1:740812-740861    +       .
[main] ERROR jvarkit - Command failed 
lindenb commented 8 years ago

In fact, it means that a SAMrecord (chrom:start-end) overlaps two regions in your bed > the program doesn't know which bed record, it should use. I don't use this program much, so if you feel that the reads overlapping two regions should be discarded or should use a random interval, just tell me.

P.

avilella commented 8 years ago

I think adding two extra options would cover most of the possibilities:

--5primemost and/or --3primemost

If both flags are given, clip from both sides.

Then another option like so:

--strandmost same or --strandmost opposite

Examples:

 region 1                 region 2
|----------------->|     |-------------->|

              <AAAAA=====BBBBBB| read

--strandmost opposite

AAAAA would be trimmed.

########################################

 region 1                 region 2
|----------------->|     |<--------------|

              <AAAAA=====BBBBBB| read

--strandmost opposite

AAAAA would be trimmed.

########################################

 region 1                 region 2
|<-----------------|     |<--------------|

              <AAAAA=====BBBBBB| read

--strandmost same

AAAAA would be trimmed.

########################################

 region 1                 region 2
|<-----------------|     |-------------->|

              <AAAAA=====BBBBBB| read

--strandmost same

AAAAA would be trimmed.

########################################
lindenb commented 8 years ago

5primemost 3primemost is 'difficult' to implement if a bed region overlap a whole read. As a temporary(?) solution I created the option '-largest' which choose the BED record having the largest overlap with the current read. I don't want to look at the BED strand for now. https://github.com/lindenb/jvarkit/commit/cf1bf46196fda4e95ded4d3bbfcf71df00164ebc

avilella commented 8 years ago

One could limit it to input bed file of non-overlapping regions, which would make the 5primemost and 3primemost possibly easier to implement. Maybe not...

Thanks a lot for all the work, Pierre!

On Thu, Feb 25, 2016 at 1:53 PM, Pierre Lindenbaum <notifications@github.com

wrote:

5primemost 3primemost is 'difficult' to implement if a bed region overlap a whole read. As a temporary(?) solution I created the option '-largest' which choose the BED record having the largest overlap with the current read. I don't want to look at the BED strand for now. cf1bf46 https://github.com/lindenb/jvarkit/commit/cf1bf46196fda4e95ded4d3bbfcf71df00164ebc

— Reply to this email directly or view it on GitHub https://github.com/lindenb/jvarkit/issues/44#issuecomment-188791862.