Open eocampbe opened 4 years ago
Hi @eocampbe ,
This is likely an issue with bloom filter size. I have just now merged a Pull Request submitted by @rsharris which lets you specify the bloom filter size, and this might be useful to you.
To do so, first run git pull to get the latest version of DiscoverY. Then see lines 18-20 of discoverY.py, which show how to specify the bloom filter size using the command line argument "--female_bloom_capacity".
@eocampbe IIRC, you'll want to specify a bloom filter size that is about the expected length of your genome, minus repeats; that is, roughly the number of distinct kmers you expect in your input data. The only downside of setting it too high is that it will use more memory.
I think the default value was about 3G, which relates to the human genome size (but doesn't adjust downward for repeat content). And the corresponding bloom filter data structure was something like 5G bytes.
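To make the memory trade-off concrete, here is a rough sketch using the textbook sizing formula for a standard Bloom filter. It does not reproduce DiscoverY's own code: the function names are hypothetical, and the false-positive rate of 0.001 is an assumed value chosen only for illustration, since the thread does not say which rate DiscoverY uses.

```python
import math


def bloom_filter_bits(capacity: int, error_rate: float) -> int:
    """Optimal bit count for a standard Bloom filter: m = -n * ln(p) / (ln 2)^2."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    return math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))


def bloom_filter_hashes(capacity: int, bits: int) -> int:
    """Optimal number of hash functions: k = (m / n) * ln 2, at least 1."""
    return max(1, round(bits / capacity * math.log(2)))


def bloom_filter_bytes(capacity: int, error_rate: float) -> int:
    """Approximate size in bytes of the underlying bit array."""
    return math.ceil(bloom_filter_bits(capacity, error_rate) / 8)


# Assumed false-positive rate; not taken from DiscoverY.
ASSUMED_ERROR_RATE = 0.001

for label, capacity in [("214 Mb genome", 214_000_000), ("3 Gb genome", 3_000_000_000)]:
    size_gb = bloom_filter_bytes(capacity, ASSUMED_ERROR_RATE) / 1e9
    print(f"{label}: about {size_gb:.2f} GB at p={ASSUMED_ERROR_RATE}")
```

Memory grows linearly with the capacity, which is why a capacity sized for a human genome is wasteful for a much smaller one.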
Thank you @md5sam and @rsharris, this is very helpful!
The female genome I'm working with is ~214 Mb, so I set that value using the --female_bloom_capacity argument, and it seems to be running now.
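For anyone choosing a capacity the same way, a small sketch like the one below gives an upper bound on the number of kmers in an assembly. It is not part of DiscoverY; the function name is hypothetical, and it assumes a plain FASTA layout (header lines starting with ">"). It ignores repeats, reverse complements, and ambiguous bases, so the true number of distinct kmers will usually be lower.

```python
def kmer_upper_bound(fasta_lines, k=25):
    """Upper bound on distinct kmers: sum of max(0, length - k + 1) per sequence."""
    total = 0
    seq_len = 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: account for the sequence just finished.
            total += max(0, seq_len - k + 1)
            seq_len = 0
        elif line:
            seq_len += len(line)
    # Account for the final sequence in the input.
    total += max(0, seq_len - k + 1)
    return total


# Tiny in-memory example; for a real file, pass an open file handle instead.
example = [">contig1", "ACGTACGTAC", ">contig2", "ACG"]
print(kmer_upper_bound(example, k=4))
```

With a real assembly, the same function can be called as kmer_upper_bound(open("female.fasta"), k=25), and the result used as a starting value for --female_bloom_capacity.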
Hi again @md5sam and @rsharris,
I am now getting another error when I try to run discoverY.py. It happens when I run the basic command with either a female bloom filter I created or the example data provided, like this: python3 ./discoverY.py --mode female+male --female_bloom
I get the following error:
File "./discoverY.py", line 69, in
Any ideas as to what might be causing this?
I'm sorry, that was my mistake.
I'll make a correction to my fork and issue a pull request.
I'm not the owner of this repo, though. So, if you want to get up and running right away, the change will be to add "bf_capacity = None" after line 43 in discoverY.py, so that it looks like this:
    if not args['kmer_size']:
        k_size = 25
        bf_capacity = None
    else:
You'd need to be sure to use 8 spaces in front of "bf_capacity", not tab characters.
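The general idea behind this kind of fix is to give every optional setting a value on all code paths before it is used. A minimal sketch of that pattern follows. It is not the actual code in discoverY.py: the function resolve_settings and the dictionary key "female_bloom_capacity" are hypothetical, and only the args['kmer_size'] lookup and the default of 25 come from the snippet above.

```python
def resolve_settings(args):
    """Return (k_size, bf_capacity), with defaults assigned before any branching."""
    # Defaults are set up front, so both names exist on every code path.
    k_size = 25
    bf_capacity = None

    if args.get("kmer_size"):
        k_size = int(args["kmer_size"])
    # Hypothetical key name, used here for illustration only.
    if args.get("female_bloom_capacity"):
        bf_capacity = int(args["female_bloom_capacity"])

    return k_size, bf_capacity


print(resolve_settings({}))
print(resolve_settings({"kmer_size": "31", "female_bloom_capacity": 214000000}))
```

Setting the defaults first avoids having to repeat an assignment in each branch of the if/else.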
Great, thanks! I've added that line and it seems to be working now.
Thanks @rsharris, I've now merged your PR.
Hi there,
I am relatively new to Python and am trying to run discoverY.py in female+male mode using the male_contigs.fasta, kmers_from_male_reads, and female reference assembly (female.fasta) files. I am running Python 3.7.4, and all the dependencies are installed properly. I created the kmers_from_male_reads file using DSK as per the readme file, and the command I used to run discoverY.py is:
python discoverY.py --mode female+male --kmer_size 25
When I run this, I get this output:
I'm finding it difficult to determine how I might fix this issue. For instance, is the line "Please set bloom filter size before running this program" the source of this error? I can't figure out how I would specify bloom filter size, as there appears to be no option to do so and I can't find any documentation about this in the readme file. Or, is this primarily a memory issue, indicated by the OverflowError? Any help you could give me would be much appreciated!