smangul1 / rop

The Read Origin Protocol (ROP) is a computational protocol that aims to discover the source of all reads, including those originating from repeat sequences, recombinant B and T cell receptors, and microbial communities.
https://github.com/smangul1/rop/wiki
GNU General Public License v3.0

ValueError: invalid literal for int() with base 10: '' #123

Closed sizunj closed 6 years ago

sizunj commented 7 years ago

Not really a problem with ROP (awesome stuff!), but with its dependency cdbfasta. One thing I have noticed when running ROP with a large number of reads is that cdbfasta chokes and gives an error when indexing the unmapped FASTA file, e.g.: Error adding cdb record with key 'NS500127:60:H22HNBGX3:2:12101:19949:1865/1'

It seems that cdbfasta has a 4 GB memory limit (http://seqanswers.com/forums/showthread.php?t=15331).

I would advise downsampling your reads if there are too many of them. One simple way to do so is to use samtools view -s.
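As a rough illustration of the downsampling step, here is a minimal Python sketch that builds the samtools call. The function names, file names, default seed and the 10% fraction are hypothetical and not part of ROP. The only samtools behaviour relied on is that the -s argument has the form SEED.FRACTION, where the integer part seeds the sampler and the decimal part is the fraction of reads to keep.

```python
import subprocess


def build_downsample_cmd(in_bam, out_bam, fraction, seed=42):
    """Build a `samtools view -s` command that keeps roughly `fraction` of the reads.

    All names and defaults here are illustrative assumptions, not ROP code.
    """
    frac = round(fraction, 4)
    if not 0 < frac < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    # samtools expects SEED.FRACTION: e.g. 42.1000 means seed 42, keep 10%.
    frac_digits = f"{frac:.4f}"[1:]  # ".1000"
    subsample = f"{seed}{frac_digits}"
    # -b writes BAM output, -o names the output file.
    return ["samtools", "view", "-b", "-s", subsample, "-o", out_bam, in_bam]


def downsample(in_bam, out_bam, fraction, seed=42):
    """Run the command; requires samtools to be installed and on PATH."""
    subprocess.run(build_downsample_cmd(in_bam, out_bam, fraction, seed), check=True)
```

Calling downsample("sample.bam", "sample.10pct.bam", 0.1) would then write a BAM with about a tenth of the reads, which can be passed to ROP in place of the original file.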

sizunj commented 7 years ago

I just noticed the previous closed issue #103

I believe those errors are caused by cdbfasta running out of memory (the problem is very reproducible for large BAM files, and it was resolved once I down-sampled). This is an issue with cdbfasta and not ROP itself.

smangul1 commented 7 years ago

Thanks for your feedback. Does allocating more memory solve the problem? I am not sure that down-sampling is a good idea, as you will then consider only a portion of the reads. Thanks, Serghei

sizunj commented 7 years ago

Hi Serghei, I think it is an intrinsic limitation of cdbfasta (so that it stays compatible with 32-bit OSes). It would be great if you had another way around it. From eyeballing (totally non-quantitative), it appears that the downsampled unmapped reads have a similar distribution of found reads.
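One possible way around it that keeps all reads, sketched here purely as an assumption and not as anything ROP does, would be to split the unmapped FASTA file into smaller chunks and index each chunk separately. The function name, the chunk naming scheme and the record limit below are hypothetical, and whether ROP could consume several indexed chunks is not something this thread establishes.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir, max_records):
    """Split a FASTA file into chunks holding at most `max_records` records each.

    Returns the list of chunk paths in order. Illustrative sketch only.
    """
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    out = None
    count = 0
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # Open a new chunk when none is open or the current one is full.
                if out is None or count == max_records:
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk_{len(chunk_paths):04d}.fasta"
                    out = open(path, "w")
                    chunk_paths.append(path)
                    count = 0
                count += 1
            # Lines before the first header (if any) are skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return chunk_paths
```

Each resulting chunk could then be indexed on its own, with the record limit chosen so that every chunk stays well below the size at which cdbfasta starts failing.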

smangul1 commented 6 years ago

We have a new release with major changes. Please let me know if it works for you so I can close the issue! Thanks, Serghei

smangul1 commented 6 years ago

I am closing the issue. Please let me know if the new release works for you! Serghei