Hi,

I am finding that some SDs are reported multiple times, but with slightly different alignments, so they get past the simple duplicate filter. For example:
```
$ wc -l final.bed; sort final.bed | uniq | wc -l; cut -f 1-6 final.bed | sort | uniq | wc -l
124580 final.bed   # number of lines in the file
124580             # number of unique lines in the file
123909             # number of unique pairs in the file
```
As you can see, the number of lines exceeds the number of unique pairs.
Is this intended (I expect not)? If not, is there a good way to filter these?
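As a possible workaround, I could keep only the first line seen for each unique coordinate key (columns 1-6). A minimal sketch, assuming `final.bed` is tab-separated and that the first six columns alone should identify an SD pair:

```shell
# Keep the first occurrence of each unique pair key (columns 1-6),
# dropping later lines that repeat the same coordinates with a
# slightly different alignment. Assumes tab-separated input.
awk -F'\t' '!seen[$1 FS $2 FS $3 FS $4 FS $5 FS $6]++' final.bed > final.dedup.bed
```

This keeps whichever alignment happens to appear first, though, so it may not retain the best-scoring one.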
You can find the results of my run here, if that is helpful: https://eichlerlab.gs.washington.edu/help/mvollger/share/sedef/

Thanks!
Mitchell