Closed ionox0 closed 4 years ago
There is also the case where the tag is YO:Z:N/A:12345
, with one instance of N/A
.
Does this mean that the previous contig was N/A and the previous position of the read was 12345?
Hi, the YO tag stores the original alignment information and N/A stands for Not Applicable. This means that the read was originally unaligned. If you're using bwa, when one read in a pair is unmapped it is assigned to the mate's locus. The number after N/A is set to the read's original position (i.e mate locus) and is used downstream for sorting.
Thank you @mozack for the information.
Now that I understand the reason behind the N/A for the YO tag, it seems there may be an issue with the YO tag parsing on this line:
https://github.com/mozack/abra2/blob/master/src/main/java/abra/SortedSAMWriter.java#L107
Here is the stack trace from our error:
INFO Thu Apr 04 15:26:51 EDT 2019 PROCESS_REGION_MSECS: 2_158655846_158655966 3676 63 11 0
INFO Thu Apr 04 15:26:55 EDT 2019 PROCESS_REGION_MSECS: 1_11187651_11187891 5148 13 22 0
INFO Thu Apr 04 15:26:55 EDT 2019 PROCESS_REGION_MSECS: 2_61725997_61726117 113 63 14 0
INFO Thu Apr 04 15:26:55 EDT 2019 PROCESS_REGION_MSECS: 1_11188001_11188241 167 10 18 0
INFO Thu Apr 04 15:26:55 EDT 2019 PROCESS_REGION_MSECS: 2_95849218_95849383 115 23 13 0
INFO Thu Apr 04 15:26:55 EDT 2019 PROCESS_REGION_MSECS: 1_156846157_156846397 311 21 20 0
INFO Thu Apr 04 15:26:56 EDT 2019 Clock time in Chromosome: 2_125000001_150000000: 35
INFO Thu Apr 04 15:26:56 EDT 2019 Processing chromosome chunk: 2_175000001_200000000
INFO Thu Apr 04 15:26:56 EDT 2019 Clock time in Chromosome: 1_100000001_125000000: 57
INFO Thu Apr 04 15:26:56 EDT 2019 Processing chromosome chunk: 2_200000001_225000000
java.lang.ArrayIndexOutOfBoundsException: 1
at abra.SortedSAMWriter.addAlignment(SortedSAMWriter.java:107)
at abra.ReAligner.remapReads(ReAligner.java:757)
at abra.ReAligner.processChromosomeChunk(ReAligner.java:404)
at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.lang.ArrayIndexOutOfBoundsException: 1
at abra.SortedSAMWriter.addAlignment(SortedSAMWriter.java:107)
at abra.ReAligner.remapReads(ReAligner.java:757)
at abra.ReAligner.processChromosomeChunk(ReAligner.java:404)
at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
INFO Thu Apr 04 15:26:56 EDT 2019 PROCESS_REGION_MSECS: 2_178098685_178099045 239 10 21 0
My guess is that there should be a check for N/A
before trying to parse this field, similar to how it is handled here:
Let me know if I can provide more information.
I should note that we are attempting to run Abra two times, in order to find out whether an additional realignment step might give better results. This is possibly not an intended use case
Two runs with the current version of ABRA2 should be OK, although I doubt the results will improve much.
If you're trying to run ABRA2 against output from an older ABRA version, you may run into trouble as the format has changed in the YO tag.
The exception above appears to be during the sort phase. You could try disabling the sort using option:
--nosort
and sort the BAM file after ABRA2 completes.
However, please keep in mind that I have not explicitly tested running ABRA2 against older versions of ABRA.
Thank you very much for the info, we'll be sticking with ABRA2 to keep the YO tag format consistent.
Thank you for your work in developing this tool,
We are wondering about samples which have
YO:Z:N/A
for the YO tag. Is this expected behavior and if so would you be able to provide any detail on the reason for anN/A
value?Thank you!