tk2 / RetroSeq

RetroSeq is a bioinformatics tool that searches for mobile element insertions using aligned reads in a BAM file and a library of reference transposable elements. Please read the wiki page for usage instructions. The wiki also has a page describing how the 1000 Genomes CEU trio analysis was carried out, with the files and parameters used for each step.

Error when trying to run bedtools intersect #18

Closed: kutsukos closed this issue 3 years ago

kutsukos commented 6 years ago

When I run RetroSeq on some BAM files from the 1000 Genomes Project, I get the following error after the chromosomes are read. I use the exclude, refTEs and eref (plus align) options.

.....
Reading chromosome: hs37d5
***** ERROR: too many digits/characters for integer conversion in string 1. Exiting...
Failed to run bedtools intersect to filter at retroseq.pl line 367.

I am using:
- Samtools 0.1.19
- Bedtools 2.27 (I also tried Bedtools 2.26)
- Exonerate 2.2.0 with glib 2.47.3

I don't get this error when I run RetroSeq with Bedtools 2.17. Do you have any idea why this is happening?

pmille commented 5 years ago

I'm seeing the same error. If you came to a conclusion, would you mind posting it?

kutsukos commented 5 years ago

I didn't find a solution; I stopped trying :D One possible workaround might be to remove the hs37d5 chromosome from your data.
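A minimal sketch of that workaround, assuming the data to filter is a tab-separated BED file (for example the exclude file passed to RetroSeq) and that dropping every record on the hs37d5 decoy contig is acceptable for your analysis. `drop_contig` is a hypothetical helper, not part of RetroSeq or bedtools.

```python
from typing import Iterable, Iterator


def drop_contig(lines: Iterable[str], contig: str = "hs37d5") -> Iterator[str]:
    """Yield BED lines whose first column (chromosome) is not `contig`.

    Hypothetical helper: lines whose first tab-separated field differs from
    `contig` (including header lines) are passed through unchanged.
    """
    for line in lines:
        # The chromosome is everything before the first tab.
        chrom = line.split("\t", 1)[0].rstrip("\n")
        if chrom != contig:
            yield line
```

For example, read the file with `open("exclude.bed").readlines()`, pass the lines through `drop_contig`, and write the result to a new file with `writelines`.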

pmille commented 5 years ago

I couldn't find one either. I just ended up writing a couple of Python scripts instead of using samtools.

tk2 commented 5 years ago

Hi - sincere apologies for my silence on this. I re-ran one of the 1000G samples over the weekend, and couldn't replicate the error. I am running these commands:

perl ~/code/RetroSeq/bin/retroseq.pl -discover -bam CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.20120117.bam -output NA12878.candidates.tab -refTEs ref_types.tab -eref probes.tab -align

and

perl ~/code/RetroSeq/bin/retroseq.pl -call -bam CEUTrio.HiSeq.WGS.b37_decoy.NA12878.clean.dedup.recal.20120117.bam -input NA12878.candidates.tab -output region.vcf -region 1:10000000-11000000 -ref hs37d5.fa

and using Samtools v1.5, exonerate 2.2.0, and bedtools 2.27.1.

Could you post the command you ran?

MatteoSchiavinato commented 5 years ago

The solution is sorting the files properly. If you manually process your files before running bedtools, then you need to sort them again to be sure that no line is out of place.

The "working" sort command would be: sort -dsk1,1 -k2n,2 -k3nr,3

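Explanation:
- -d: compare column 1 in dictionary order (only blanks and alphanumeric characters are considered)
- -s: stable sort, so lines with equal keys keep their input order
- -k1,1: sort first by column 1 (chromosome)
- -k2n,2: then numerically by column 2 (start)
- -k3nr,3: then numerically in reverse by column 3 (end), so for the same start the longer interval comes first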

Worked repeatedly for me.
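For anyone post-processing BED files in Python, here is a rough sketch of the same ordering: column 1 compared as text, column 2 numerically ascending, column 3 numerically descending, with Python's stable sort keeping fully tied lines in input order (like `-s`). It is only an approximation of the `sort -dsk1,1 -k2n,2 -k3nr,3` command: it does not reproduce the `-d` dictionary-order rule or locale-dependent collation of GNU `sort`. `sort_bed_lines` is a hypothetical name, and the sketch assumes every line already has three tab-separated columns with integer coordinates.

```python
from typing import List


def sort_bed_lines(lines: List[str]) -> List[str]:
    """Order BED lines by chrom (text), start (ascending), end (descending).

    Approximates `sort -dsk1,1 -k2n,2 -k3nr,3`; assumes each line has at least
    three tab-separated columns with integer start/end values.
    """
    def key(line: str):
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        # Negating the end coordinate gives descending order on column 3.
        return (chrom, int(start), -int(end))

    # sorted() is stable, so fully tied lines keep their input order (like -s).
    return sorted(lines, key=key)
```

The key function raises `ValueError` on a line with a non-integer coordinate, which is itself a quick way to spot the kind of malformed line discussed later in this thread.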

kutsukos commented 5 years ago

Right now I can't confirm that this solves my problem, because I don't remember exactly what I was running. I had the results from a paper and wanted to validate them by re-running the pipeline that was provided.

sainadfensi commented 3 years ago

Hi, I've just had the same problem. Running bedtools sort -i $file showed me that something was wrong with the last line of my BED file. After I deleted the last line, the problem was solved, even without sorting the file. Still, bedtools sort helps you see where the problem is.
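A small checker can point at suspicious lines without sorting anything. This is a minimal sketch that assumes a tab-separated BED file in which columns 2 and 3 must be non-negative integers with start not greater than end; blank lines and lines starting with `#`, `track` or `browser` are treated as headers and skipped. `find_bad_bed_lines` is a hypothetical helper, and its rules are a simplification, not bedtools' actual parser.

```python
from typing import Iterable, List, Tuple


def find_bad_bed_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, reason) for lines that look malformed.

    Assumes tab-separated BED; blank, '#', 'track' and 'browser' lines are skipped.
    """
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text or text.startswith(("#", "track", "browser")):
            continue
        cols = text.split("\t")
        if len(cols) < 3:
            problems.append((lineno, f"only {len(cols)} column(s)"))
            continue
        start, end = cols[1], cols[2]
        # Coordinates must be plain non-negative integers.
        if not (start.isdigit() and end.isdigit()):
            problems.append((lineno, f"non-integer coordinates: {start!r}, {end!r}"))
        elif int(start) > int(end):
            problems.append((lineno, "start is greater than end"))
    return problems
```

Passing `open("my.bed").readlines()` to the function lists each problem line with its line number, which you can then fix or delete.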

JohnJCole commented 2 years ago

I had this problem and then realised it was because my BED file had a header row that had been sorted to the end of the file. Once I deleted it, it worked.
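One way to keep a header from being sorted into the data is to separate header-like lines first, sort only the data lines, and then write the header back on top (or drop it). A minimal sketch, assuming a hypothetical rule: a line is a header if it starts with `#`, `track` or `browser`, or if it has at least three tab-separated columns and neither column 2 nor column 3 is an integer (such as `chrom\tstart\tend`). `split_header` is a hypothetical helper.

```python
from typing import List, Tuple

# Hypothetical prefixes treated as header lines.
HEADER_PREFIXES = ("#", "track", "browser")


def split_header(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Separate header-like lines from data lines, wherever they appear.

    A line counts as a header if it starts with one of HEADER_PREFIXES, or if
    it has at least three columns and neither column 2 nor 3 is an integer.
    """
    header: List[str] = []
    body: List[str] = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        named_columns = len(cols) >= 3 and not cols[1].isdigit() and not cols[2].isdigit()
        if line.startswith(HEADER_PREFIXES) or named_columns:
            header.append(line)
        else:
            body.append(line)
    return header, body
```

After splitting, sort `body` as needed and write `header + body` (or just `body`) to the output file, so the header can no longer end up at the bottom.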

baiwenhuim commented 1 year ago

Yes, I also hit this problem. When I checked my BED file, I found that the last line was incomplete. I deleted it, ran again, and it worked.
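Because a truncated last line can still contain valid-looking integers (for example a cut-off end coordinate), a quick heuristic check on the final line may help. This minimal sketch only reports warnings and never deletes anything, since a file can legitimately lack a trailing newline. The heuristics (missing trailing newline, fewer tab-separated columns than the previous line) and the name `last_line_warnings` are assumptions for illustration, not bedtools rules.

```python
from typing import List


def last_line_warnings(lines: List[str]) -> List[str]:
    """Return warnings suggesting the final BED line may be incomplete.

    Heuristics only (assumptions, not bedtools rules): a missing trailing
    newline, or fewer tab-separated columns than the line before it.
    """
    warnings: List[str] = []
    if not lines:
        return warnings
    last = lines[-1]
    if not last.endswith("\n"):
        warnings.append("last line has no trailing newline")
    if len(lines) > 1 and last.count("\t") < lines[-2].count("\t"):
        warnings.append("last line has fewer columns than the line before it")
    return warnings
```

Read the file with `open("my.bed").readlines()` so line endings are preserved; if any warning is printed, inspect the last line by hand before deleting it.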